Letter pubs.acs.org/ac
Race Differentiation by Raman Spectroscopy of a Bloodstain for Forensic Purposes
Ewelina Mistek, Lenka Halámková, Kyle C. Doty, Claire K. Muro, and Igor K. Lednev*
Department of Chemistry, University at Albany, State University of New York, 1400 Washington Avenue, Albany, New York 12222, United States
ABSTRACT: A nondestructive and rapid method was developed for race differentiation of peripheral blood donors for forensic purposes. Blood is an extremely valuable form of evidence in forensic investigations, so proper analysis is critical. Because potentially minuscule amounts of blood traces can be found at a crime scene, a method that is nondestructive and provides a substantial amount of information about the sample is ideal. In this study, Raman spectroscopy was applied with advanced statistical analysis to discriminate between Caucasian (CA) and African American (AA) donors based on dried peripheral blood traces. Spectra were collected from 20 donors varying in gender and age. Support vector machines-discriminant analysis (SVM-DA) was used for differentiation of the two races. An outer loop subject-wise cross-validation (CV) method evaluated the performance of the SVM classifier for each individual donor from the training data set. The performance of SVM-DA, evaluated by the area under the curve (AUC) metric, showed 83% probability of correct classification for both races, and a specificity and sensitivity of 80%. This preliminary study shows promise for distinguishing between blood from donors of different races. The method has great potential for real crime scene investigation, providing rapid and reliable results, with no sample preparation, destruction, or consumption.
Body fluids found at a crime scene can be some of the most valuable forms of evidence in forensic investigations. They can provide complex information about a potential suspect or victim. 
Therefore, a crucial step of forensic casework is the identification of biological traces such as blood, semen, saliva, or sweat.1 Human blood is the most common body fluid found at scenes of violent crimes. Also, the amount of sample available for a forensic investigation could be extremely small. In these instances even more care should be taken to preserve the evidence for further analysis. There are presumptive assays, such as the Kastle-Meyer test, Hemastix, and Leucomalachite Green, as well as luminol and fluorescein tests,1,2 and confirmatory tests (microcrystal assays) for detecting and identifying blood.1 Nevertheless, many of these tests require the use of hazardous chemicals, and all consume part of the sample. Furthermore, the current tests can only identify the presence of blood but do not provide investigators with any additional information about the donor. The person’s race can be inferred through cranial and dental analyses3,4 and through DNA analysis.5 Therefore, the application of a nondestructive and rapid method for reliable identification of human blood as well as providing identifiable information, such as race, would be highly advantageous in forensic casework. Raman spectroscopy is a sensitive method for obtaining information about the chemical and biochemical composition of a sample.6 This analytical technique is based on molecular vibrations and requires a change in polarizability. Raman spectroscopy uses monochromatic light to irradiate a sample and inelastically scatter photons, which are collected to generate a spectrum.6 Raman spectroscopy has already been
used for the analysis of various types of forensic evidence including fibers,7 ink,8 paints,9 gunshot residue,10 and bones,11 to name a few. Our lab in particular has published studies on different biological traces including blood, semen, saliva, sweat, vaginal fluid, and body fluid mixtures.12−17 We have also investigated the interference of common substrates with the Raman signal of deposited bloodstains18 and contaminated blood traces.19 A wide study on blood traces was also conducted to understand the heterogeneous chemical composition of blood20 and to distinguish between peripheral and menstrual blood.21 We have also reported on the effect of bloodstain aging on its Raman characteristics22 and on successful species differentiation based on Raman23−25 and FT-IR26 blood spectra; the identification of donors’ gender based on human blood is currently being investigated. Variances in the biochemical composition of blood from donors of different races, genders, and ages have been reported by Koh et al.27 They found higher concentrations of albumin, hemoglobin, hematocrit, serum iron, and serum triglycerides in Caucasian (CA) donors’ blood than in African American (AA) donors’, while AA donors had significantly higher glucose and total protein concentrations. Hemoglobin concentration has been widely studied over the last few decades,27−32 and these investigations have confirmed that there is a higher amount of hemoglobin in the blood of CA subjects than AA subjects. Kramer et al. showed that CA and AA racial groups can be
Received: March 24, 2016 Accepted: June 22, 2016
DOI: 10.1021/acs.analchem.6b01173 Anal. Chem. XXXX, XXX, XXX−XXX
Letter
Analytical Chemistry
different spots for each sample. The instrument was calibrated using a silicon standard (peak at 520.6 cm−1) before collecting spectra from a bloodstain. Data Treatment and Validation. Data treatment and advanced statistical analysis were performed using MATLAB R2013b (Mathworks, Inc.) and R-project software, ver. 3.1.3. Recorded blood spectra were divided into two data sets based on race. Raman spectra were baseline corrected using the automatic weighted least-squares baseline algorithm, normalized by the standard normal variate method, and mean centered. After these preprocessing steps, further analysis was performed using the PLS Toolbox (Eigenvector Research, Inc.). Informative spectral regions were identified using GA analysis. Multivariate outlier removal was carried out using PCA prior to all statistical analyses, which resulted in the removal of 20 spectra from the 180 total spectra originally collected. To distinguish between blood spectra from CA and AA donors, SVM-DA models were built. The method was validated by a subject-wise outer loop CV where all spectra from one donor were taken out, one at a time, from the training data set and used for validation. The remaining spectra of n − 1 donors were used as training data to build a new SVM-DA model and predictions were performed for the validation data (excluded donor’s spectra). For evaluation purposes, ROC and AUC analyses were applied. ROC analysis was carried out with the open source package pROC.41 The AUC analysis indicated how well the model ranks subjects according to the probability of assignment to the correct class.
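The normalization and centering steps named above can be illustrated with a minimal sketch. It assumes standard normal variate (SNV) scaling is applied per spectrum with the sample standard deviation and that mean centering is applied per wavenumber across spectra; the function names `snv` and `mean_center` and the toy intensities are hypothetical, and the weighted least-squares baseline correction is not reproduced here.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it to unit SD."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample SD (n - 1 denominator); an assumption
    return [(x - m) / s for x in spectrum]

def mean_center(spectra):
    """Subtract the per-wavenumber (column) mean computed across all spectra."""
    n_vars = len(spectra[0])
    col_means = [mean(row[j] for row in spectra) for j in range(n_vars)]
    return [[row[j] - col_means[j] for j in range(n_vars)] for row in spectra]

# Toy intensities (hypothetical values, not data from the study).
# The second spectrum is the first one scaled by 2, so SNV maps both
# onto the same normalized profile.
raw = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
normalized = [snv(s) for s in raw]
centered = mean_center(normalized)
```

SNV removes multiplicative intensity differences between spectra (for example from focus or sample thickness), which is why two spectra that differ only by a scale factor become identical after normalization.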
distinguished based on the concentration of certain enzymes (creatine kinase and lactate dehydrogenase) in blood serum.33 Differences between races in plasma lipid and lipoprotein concentrations have also been shown.34 Gaining knowledge from these studies, we have applied the highly selective technique of Raman spectroscopy to detect chemical and biochemical differences in dry blood traces from two different racial groups. It has already been reported for different species that, even when visual differentiation of Raman blood spectra is impossible, advanced statistics allows for classification.23,24,25 Therefore, in this study, an advanced statistical approach was utilized for discrimination processes. This study included the use of genetic algorithm (GA) analysis, which helped to select the spectral regions with the largest diversity between CA and AA peripheral blood donors. GA analysis is a heuristic search algorithm developed to select variables with the lowest prediction error using simulated natural processes necessary for evolution.36 For statistical analysis, principal component analysis (PCA) was used to remove outliers37 and support vector machine-discriminant analysis (SVM-DA) to build classification models. SVM-DA is a supervised machine learning technique that has been widely used in pattern classification problems.21,38 To validate the accuracy of the SVM-DA models built for this study, a subject-wise outer loop cross-validation (CV) was performed. 
The receiver operating characteristic (ROC) and area under the curve (AUC) analyses are commonly used in diagnostic and screening tests.39 The trapezoidal method of integration was used to estimate AUCs of ROC curves, with corresponding 95% confidence intervals (CIs) estimated with the method described by DeLong et al.40 The curve in a ROC diagram plots sensitivity (true positive rate) against specificity (true negative rate) for varying thresholds of the generated class prediction probabilities, as a way to gauge the prediction efficiency of the SVM-DA models built. Here, we demonstrate a proof-of-concept that Raman spectroscopic analysis of bloodstains is able to successfully differentiate between CA and AA racial groups. Further studies are necessary for examining other factors and conditions, which can potentially affect the biochemical composition and corresponding Raman signature of a bloodstain.
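The trapezoidal AUC estimate described above can be sketched in a few lines. This is an illustration only, not the pROC implementation used in the study: the helper names `roc_points` and `auc_trapezoid` and the toy labels and scores are hypothetical, the x-axis is expressed as the false positive rate (1 − specificity), and the DeLong confidence interval is not computed.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability of the positive class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Sweep thresholds from the highest score down to the lowest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy example (hypothetical values, not data from the study)
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_trapezoid(roc_points(labels, scores))
```

In this toy case 8 of the 9 positive/negative pairs are ranked correctly, so the trapezoidal area is 8/9, which matches the interpretation of AUC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one.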
■
RESULTS AND DISCUSSION
As previously mentioned, other studies have shown that visual distinction between Raman spectra of blood from different classes is not possible.23,24,35 This is because spectra generated by Raman analysis of dried blood, using 785 nm excitation, are composed of peaks originating exclusively from vibrational modes of hemoglobin, which is present in all human blood samples.42 The averaged preprocessed spectrum of all CA and AA donors analyzed in this study is shown in Figure 1A. It was not surprising that Raman spectra for both classes were similar since human blood consists of the same components, with only quantitative variations between them for different races. The number of peaks for both races was equivalent and no spectral shifts were evident (data not shown). However, some slight intensity variations were detected in the regions 250−400 cm−1 and 1230−1268 cm−1, which were also illustrated by the difference spectrum for these two classes (Figure 1B). Additionally, visual differences in peak intensities appeared at 1000 cm−1 (phenylalanine), 1575 cm−1 (proteins), and 1620 cm−1 (heme).21 This slightly higher intensity of heme for CA donors is supported by a previous study which showed higher hemoglobin concentration for the CA race in comparison to the AA race.27 The average difference (Figure 1B, black line) between Raman spectra in CA and AA data sets is smaller than 1 standard deviation between individual spectra in each data set (blue and green lines). This limits the opportunity to use the appearance of individual bands in a Raman spectrum for race identification and indicates the need for advanced statistical analysis using the entire spectral range. GA analysis was carried out on the 160 spectra used to build the SVM-DA models for optimization purposes and to better understand and identify the origin of differences between classes. The analysis considered all possible variables (wave-
■
MATERIALS AND METHODS
Blood Samples. A total of 20 human peripheral blood samples, purchased from Bioreclamation, Inc., were used for this experiment. Donors were chosen with consideration to gender and age diversity. The average age of CA and AA donors was 45.0 ± 8.4 and 43.8 ± 7.2 years, respectively, with male donors making up 40% and 50% of the donor pool, respectively. All blood samples were kept frozen until sample preparation. After defrosting, tubes of blood were vortexed and 10 μL of blood was deposited onto an aluminum foil covered microscope slide. Prepared samples were allowed to dry overnight prior to spectral collection. Instrumentation and Spectral Collection. A Renishaw inVia Raman spectrometer was used for sample analysis. The instrument was equipped with a Leica optical microscope with a 20× objective and PRIOR automatic stage. A 785 nm laser (power = 4.0 mW) was used for excitation; twenty 10-s accumulations were recorded from each spot on the sample. Spectra were recorded in the range of 250−1800 cm−1. A total of 180 spectra were collected using Raman mapping with nine
Figure 1. (A) Baseline corrected and normalized mean Raman spectrum of all blood samples from the training data set, with red highlighted regions showing the most significant areas for distinction between classes in the data set based on GA analysis. (B) Difference mean spectrum (black line) and the standard deviation (SD) of mean blood spectra for Raman data sets of Caucasian (blue lines) and African American (green lines) donors. (C) The ROC curves for the SVM classifiers for classification of Caucasian and African American races based on probabilities for each spectrum (upper part) and for each subject (lower part). Area under the curve (AUC) values give the efficacy of the SVM classifiers, that is, the probability that the race will be classified accurately as Caucasian or African American according to Raman spectra, which is 71% based on a single spectrum and 83% based on a single donor.
out CV at the donor level. For additional information, see ref 43. All spectra from one subject at a time were excluded from the initial training set and used as the validation set to test the model built using spectra from the remaining (n − 1) donors. This process was repeated until all subjects were separately used for validation. For each donor, the final classification results were calculated as prediction probabilities that each spectrum will be correctly classified and also that each subject belongs to the correct class based on the classification of all donors’ spectra. Among the subsets from all 20 subjects, the predicted group membership and probabilities, for each spectrum and for each subject, were recorded. Using ROC analysis, the best thresholds were identified (above which the spectrum/donor probability estimate was assigned to the correct class) to rank the SVM classifier’s ability to separate the races. The results of the AUC analysis can range from 0 to 1. An AUC value of 0.5 represents a random classifier and an AUC value of 1.0 indicates a perfect test. This analysis allowed for discrimination of CA and AA races with an AUC value of 0.71 (95% CI, 0.63−0.79) based on a single spectrum, and 0.83 (95% CI, 0.64−1.00) based on each subject (Figure 1C). These values represent the probability that the classifier can correctly distinguish between the CA and AA blood samples. The discriminatory power of the SVM-DA model was lower for a single spectrum as compared to the subject-wise results. This can be explained by the fact that not all spectra have noticeable contributions from biomarkers with high discriminatory power. This preliminary study shows promise for race differentiation based on human blood traces analyzed by Raman spectroscopy. 
Further investigation is warranted based on these results, specifically for testing blood of other races.
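The subject-wise leave-one-out scheme discussed in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names and toy values are hypothetical, and averaging the per-spectrum probabilities to obtain a donor-level probability is an assumed aggregation rule, since the text does not specify how the subject-level probabilities were combined.

```python
from collections import defaultdict

def leave_one_donor_out(donor_ids):
    """Yield (held_out_donor, train_indices, test_indices), one fold per donor."""
    for donor in sorted(set(donor_ids)):
        test = [i for i, d in enumerate(donor_ids) if d == donor]
        train = [i for i, d in enumerate(donor_ids) if d != donor]
        yield donor, train, test

def donor_level_probability(donor_ids, spectrum_probs):
    """Average per-spectrum probabilities within each donor (assumed rule)."""
    grouped = defaultdict(list)
    for d, p in zip(donor_ids, spectrum_probs):
        grouped[d].append(p)
    return {d: sum(ps) / len(ps) for d, ps in grouped.items()}

# Toy example: three donors with two spectra each (hypothetical values)
donor_ids = ["d1", "d1", "d2", "d2", "d3", "d3"]
folds = list(leave_one_donor_out(donor_ids))

# Hypothetical per-spectrum probabilities of belonging to the correct class,
# as would be predicted by the model trained without the held-out donor.
probs = [0.9, 0.5, 0.2, 0.4, 0.6, 0.8]
per_donor = donor_level_probability(donor_ids, probs)
```

Holding out all spectra of a donor together keeps spectra from the same bloodstain out of both the training and validation sets at once, which is the point of validating at the donor level rather than at the spectrum level.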
numbers) within the Raman spectral data set and their significance for the discrimination between classes (races). This allowed for the reduction of the original Raman spectra to subsets of unique wavenumbers in order to achieve better prediction performance.36 The GA analysis only selected variables that gave the most valuable information for discrimination within the entire training data set of donors from both races. The spectral regions selected by the GA operation are shown in Figure 1A. The two regions 281−318 cm−1 and 1231−1268 cm−1 (selected by GA analysis) are included in those that were observed to vary in intensity by visual comparison as shown in the difference spectrum (Figure 1B). An SVM-DA classification model was built based on 160 spectra from 20 donors (10 for each race). The model was used to differentiate races based on the spectral features selected by GA analysis from the original Raman spectra. The SVM-DA model was automatically trained with a data set of labeled spectra and by tuning parameters via modification of the underlying kernel function. For this study, pattern recognition SVM-DA was used with the radial basis function as a kernel function, and it was optimized by a combined approach of 5-fold CV and a systematic grid search of the parameters. The internal CV executed by the model showed 71% accuracy (data not shown). The prediction performance of the subsequently built SVM-DA models was estimated by subject-wise leave-one-
CONCLUSIONS
For the first time, Raman spectroscopy, combined with chemometrics, has been used to differentiate between dry blood traces from CA and AA donors. To validate the internal CV results, which achieved 71% correct classification of donors based on all spectra included in a training data set, outer CV was performed. The summary of predictions from the subject-wise outer loop CV for 20 different SVM-DA models demonstrated 83% (AUC) probability of correct race classification of individual donors after ROC analysis. These results show promise for discrimination of the race of human peripheral blood found at a crime scene. Since blood composition quantitatively varies for different races, these changes for the two races considered here may be detected by Raman spectroscopy. More importantly, chemometrics was applied to support and strengthen the classification. This approach allowed for nondestructive detection of minor differences that were present in blood spectra between two races (CA and AA). By using Raman spectroscopy as the method of analysis, the bloodstain’s integrity is preserved, and it can be further examined or used for subsequent tests (e.g., DNA profiling) with no change to the sample. Therefore, this technique could extract information about an unknown blood sample without damaging or consuming it, unlike most tests currently used for blood identification and/or analysis in forensic casework. The application of Raman spectroscopy in real crime scene
■
(22) Doty, K. C.; McLaughlin, G.; Lednev, I. K. Anal. Bioanal. Chem. 2016, 408, 3993−4001. (23) Virkler, K.; Lednev, I. K. Anal. Chem. 2009, 81, 7773−7777. (24) McLaughlin, G.; Doty, K. C.; Lednev, I. K. Forensic Sci. Int. 2014, 238, 91−95. (25) McLaughlin, G.; Doty, K. C.; Lednev, I. K. Anal. Chem. 2014, 86, 11628−11633. (26) Mistek, E.; Lednev, I. K. Anal. Bioanal. Chem. 2015, 407, 7435−7442. (27) Koh, E. T.; Chi, M. S.; Lowenstein, F. W. Am. J. Clin. Nutr. 1980, 33, 1828−1835. (28) Garn, S. M.; Smith, N. J.; Clark, D. C. J. Natl. Med. Assoc. 1975, 67, 91−96. (29) Johnson, C. L.; Abraham, S. Advancedata From Vital and Health Statistics, The National Center for Health Statistics, U.S. Department of Health, Education, and Welfare, Public Health Service, Office of Health Research, Statistics, and Technology, 1979, No. 46, pp 1−12. (30) Meyers, L. D.; Habicht, J. P.; Johnson, C. L. Am. J. Epidemiol. 1979, 109, 539−549. (31) Reeves, J. D.; Driggers, D. A.; Lo, E. Y.; Dallman, P. R. Am. J. Clin. Nutr. 1981, 34, 2154−2157. (32) Garn, S. M.; Smith, N. J.; Clark, D. C. Am. J. Clin. Nutr. 1975, 28, 563−568. (33) Kramer, F.; Halámková, L.; Poghossian, A.; Schöning, M. J.; Katz, E.; Halámek, J. Analyst 2013, 138, 6251−6257. (34) Morrison, J. A.; deGroot, I.; Kelly, K. A.; Mellies, M. J.; Khoury, P.; Edwards, B. K.; Lewis, D.; Lewis, A.; Fiorelli, M.; Heiss, G.; Tyroler, H. A.; Glueck, C. J. Prev. Med. 1979, 8, 34−39. (35) De Wael, K.; Lepot, L.; Gason, F.; Gilbert, B. Forensic Sci. Int. 2008, 180, 37−42. (36) Niazi, A.; Leardi, R. J. Chemom. 2012, 26, 345−351. (37) Pascoal, C.; Oliveira, M. R.; Pacheco, A.; Valadas, R. In Combining Soft Computing and Statistical Methods in Data Analysis; Borgelt, C., Rodríguez, G. G., Trutschnig, W., Lubiano, M. A., Gil, M. A., Grzegorzewski, P., Hryniewicz, O., Eds.; Springer: Berlin Heidelberg, Germany, 2010; Vol. 77, pp 499−507. (38) Marcelo, M. C. A.; Mariotti, K. C.; Ferrão, M. F.; Ortiz, R. S. Forensic Sci. Int. 2015, 246, 65−71. 
(39) Hajian-Tilaki, K. Caspian J. Int. Med. 2013, 4, 627−635. (40) DeLong, E. R.; DeLong, D. M.; Clarke-Pearson, D. L. Biometrics 1988, 44, 837−845. (41) Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J. C.; Müller, M. BMC Bioinf. 2011, 12, 77−84. (42) Premasiri, W. R.; Lee, J. C.; Ziegler, L. D. J. Phys. Chem. B 2012, 116, 9376−9386. (43) Varma, S.; Simon, R. BMC Bioinf. 2006, 7, 91.
investigations is highly probable due to commercially available portable instruments, which allow for nondestructive and rapid examination at the scene of a crime. Furthermore, not only can a stain be identified as blood using our technology but, by incorporating statistical analysis, more information about the donor can be obtained, all in a reliable and statistically confident manner.
■
AUTHOR INFORMATION
Corresponding Author
*E-mail: ilednev@albany.edu.
Author Contributions
The manuscript was written through contributions of all authors.
Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS
This project was supported by Awards No. 2011-DN-BX-K551 and 2014-DN-BX-K016 awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice (I.K.L.). The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect those of the U.S. Department of Justice.
■
REFERENCES
(1) Kobilinsky, L. F. In Forensic Chemistry Handbook; John Wiley & Sons: Hoboken, NJ, 2012; pp 269−282. (2) Johnston, E.; Ames, C. E.; Dagnall, K. E.; Foster, J.; Daniel, B. E. J. Forensic Sci. 2008, 53, 687−689. (3) Rosas, A.; Bastir, M.; Alarcon, J. A.; Kuroe, K. Arch. Oral Biol. 2008, 53, 826−834. (4) Blumenfeld, J. Totem: The University of Western Ontario Journal of Anthropology 2000, 8, 20−23. (5) Elkins, K. M. Forensic DNA Biology: A Laboratory Manual, 1st ed.; Academic Press: Oxford, U.K., 2012. (6) Skoog, D. A.; Holler, F. J.; Nieman, T. A. In Principles of Instrumental Analysis, 5th ed.; Saunders College Publishing: Orlando, FL, 1998; pp 429−444. (7) Miller, J. V.; Bartick, E. G. Appl. Spectrosc. 2001, 55, 1729−1732. (8) Zięba-Palus, J.; Kunicki, M. Forensic Sci. Int. 2006, 158, 164−172. (9) Zięba-Palus, J.; Borusiewicz, R. J. Mol. Struct. 2006, 792−793, 286−292. (10) Bueno, J.; Sikirzhytski, V.; Lednev, I. K. Anal. Chem. 2012, 84, 4334−4339. (11) McLaughlin, G.; Lednev, I. K. Am. J. Anal. Chem. 2012, 3, 161−167. (12) Virkler, K.; Lednev, I. K. Forensic Sci. Int. 2008, 181, e1−e5. (13) Virkler, K.; Lednev, I. K. Forensic Sci. Int. 2009, 193, 56−62. (14) Virkler, K.; Lednev, I. K. Analyst 2010, 135, 512−517. (15) Sikirzhytskaya, A.; Sikirzhytski, V.; Lednev, I. K. Forensic Sci. Int. 2012, 216, 44−48. (16) Sikirzhytski, V.; Sikirzhytskaya, A.; Lednev, I. K. Forensic Sci. Int. 2012, 222, 259−265. (17) Sikirzhytski, V.; Virkler, K.; Lednev, I. K. Sensors 2010, 10, 2869−2884. (18) McLaughlin, G.; Sikirzhytski, V.; Lednev, I. K. Forensic Sci. Int. 2013, 231, 157−166. (19) Sikirzhytskaya, A.; Sikirzhytski, V.; McLaughlin, G.; Lednev, I. K. J. Forensic Sci. 2013, 58, 1141−1148. (20) Virkler, K.; Lednev, I. K. Anal. Bioanal. Chem. 2010, 396, 525−534. (21) Sikirzhytskaya, A.; Sikirzhytski, V.; Lednev, I. K. J. Biophotonics 2014, 7, 59−67.