Letter
Race differentiation based on Raman spectroscopy of semen traces for forensic purposes Claire K. Muro, and Igor K. Lednev Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b00106 • Publication Date (Web): 30 Mar 2017 Downloaded from http://pubs.acs.org on March 31, 2017
Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Analytical Chemistry
Race differentiation based on Raman spectroscopy of semen traces for forensic purposes
Claire K. Muro and Igor K. Lednev*
Chemistry Department, University at Albany, 1400 Washington Avenue, Albany, NY 12222 USA
* Email: [email protected], Phone: 518-591-8863, Fax: 518-442-3462
ABSTRACT: Several novel methods to determine externally visible characteristics of body fluid donors have been developed in recent years. These tests can help forensic investigators make predictions about the appearance of a suspect or victim, such as their sex, race, hair color, or age. While their potential benefit is undeniable, these methods destroy the physical evidence in the process. Raman spectroscopy has recently been used as a nondestructive technique to test for many of these characteristics. Here we present the results from a study to determine the race of semen donors. Using Raman spectroscopy and multivariate data analysis, we were able to build a statistical model that accurately identified the race of all 18 semen donors in the calibration dataset, as well as seven additional external validation donors. These results demonstrate Raman spectroscopy’s potential to differentiate Caucasian and Black semen donors using chemometrics.
Key words: Raman Spectroscopy, forensics, chemometrics, body fluids
In 2009, the National Research Council released a statement summarizing the current state of forensic science in the U.S. and highlighting specific areas in need of improvement.1 More recently, the President’s Council of Advisors on Science and Technology published a report outlining the gaps that still exist in forensic practice, focusing on feature-comparison methods.2 Both reports stress the importance of validated and objective methods in forensic science and criticize the use of subjective methods that are prone to user bias. Based on these considerations, analytical chemistry techniques are attractive options for forensic laboratories.
Analytical chemistry provides a wealth of opportunities to forensic investigators. Its quantitative nature makes results objective, and they are often accompanied by confidence intervals. Consequently, many new techniques have been developed in research labs. Several of these techniques focus on the analysis and characterization of body fluids.3,4 Body fluids are unique in that they can be found at a variety of crime scenes, but their complex biochemical composition can make them difficult to analyze.5 One of the most common forensic techniques applied to body fluids is DNA profiling. A complete profile (13 loci) obtained from a sample of evidence can be compared to a database to either include or exclude potential donors. However, if there is no matching profile in the database, an unknown DNA profile is of little use to detectives. In these instances, it would be incredibly helpful if investigators could acquire a “phenotype profile” from the evidence in question. Such a profile could predict the donor’s sex, race, age, and other characteristics. These descriptors, unlike the list of alleles that come with a genotype profile, would allow investigators to narrow their search for a suspect or victim.
In recent years, researchers have developed methods to generate these “phenotype profiles” through DNA analysis, biocatalytic cascades, and immunoassays. DNA can be comprehensively analyzed to predict sex, race, and eye and hair color, all with a single microchip.6 Donor sex and race can be predicted from blood traces by biocatalytic cascades.7,8 Additionally, biocatalytic cascades can be used to determine the age of a donor and the time since deposition of a blood stain.9 The same research group was able to use the sweat left behind in fingerprints to determine donor sex.10 While these developments show great promise, they all involve sample consumption, meaning they are destructive to the evidence. A nondestructive technique would be ideal in forensics, so that evidence can be preserved for future analyses.
Our lab has been working on a new spectroscopic technique that could be utilized at crime scenes to derive information from body fluids. By combining Raman spectroscopy with multivariate data analysis, we have developed multidimensional spectroscopic signatures for blood, saliva, semen, sweat, and vaginal fluid.11-15 We have also developed a statistical model to differentiate and identify these body fluids5 and determined its limit of detection for peripheral blood to be a single red blood cell.16 After identifying blood, we are able to differentiate between samples of blood from different animal species17-19 as well as between peripheral and menstrual blood.20 Our most recent studies have shown that we are able to predict time since deposition21 and race22 from traces of blood. Sex determination has also been achieved for saliva donors.23 Here, we present a proof-of-concept study showing that donor race can be predicted based on Raman spectroscopy and multivariate data analysis of dried semen traces. Twenty-five semen samples were mapped with a Raman microscope.
Their Raman spectra were used to train and test statistical models. The internally cross-validated model presented here accurately identified the race of 100% of the donors in the calibration dataset, and 100% of the donors in the external validation dataset. This study is the first successful attempt to differentiate between Black and Caucasian semen donors using a spectroscopic technique. It complements similar studies with other body fluids, and emphasizes the value of Raman spectroscopy in forensics.
EXPERIMENTAL SECTION
Sample analysis. Semen samples were purchased from Bioreclamation IVT, Inc. (Westbury, NY). Twenty-five samples (from 12 Black and 13 Caucasian donors) were used for this proof-of-concept study. Each dried trace was prepared by depositing 10 μL of liquid semen onto an aluminum foil-covered microscope slide, allowing it to dry overnight, and then analyzing the dried sample with Raman microspectroscopy. A Renishaw inVia Raman spectrometer (Hoffman Estates, IL) equipped with a Leica microscope (Buffalo Grove, IL) and PRIOR automatic mapping stage (Rockland, MA) was used for sample analysis. Samples were excited with a 785 nm laser operating at 50% power (65 mW), focused with a 50× objective (3 μm² laser spot size). Samples were mapped to collect 64 spectra from an area of 1400 μm by 1400 μm in the range of 300-1800 cm⁻¹, with each spectrum acquired via seven 10 s accumulations.
Data analysis. The experimental spectra were imported into the MATLAB version 2012b workspace (MathWorks, Inc., Natick, MA). Datasets were created with the PLS Toolbox extension (Eigenvector Research, Inc., Wenatchee, WA). The spectra were baseline corrected with an adaptive iteratively reweighted penalized least squares (airPLS) algorithm.24 The spectra were then truncated to 600-1750 cm⁻¹ to remove any artifacts from baseline correction. Seven of the donors were randomly chosen for external validation. The remaining 18 donors were used for variable selection and model calibration. A genetic algorithm (GA) was used to select the most informative features in the calibration dataset. The population size was set to 64, with the maximum number of generations for breeding set to 100. The breeding was set to double crossover, and the default mutation rate (0.005) was used. The window width was 50 variables, and 30% of the windows were initially included. The final GA calibration dataset was used to train a support vector machine discriminant analysis (SVMDA) model.
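The GA settings listed above (population of 64, windows of 50 variables, 30% of windows initially included, double crossover, mutation rate 0.005) can be illustrated with a minimal Python sketch. It is not the PLS Toolbox implementation used in the study. It only shows one plausible window-based encoding and the two breeding operators; the fitness evaluation and the generation loop are omitted. All function names are hypothetical, "double crossover" is assumed to mean two-point crossover, and "30% initially included" is assumed to be a per-window inclusion probability.

```python
import random


def init_population(n_windows, pop_size=64, initial_fraction=0.30, rng=None):
    """Create pop_size chromosomes; each gene marks one spectral window as kept.

    Assumption: every window is included independently with
    probability initial_fraction.
    """
    rng = rng or random.Random()
    return [[rng.random() < initial_fraction for _ in range(n_windows)]
            for _ in range(pop_size)]


def double_crossover(parent_a, parent_b, rng):
    """Two-point crossover: swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def mutate(chromosome, rate=0.005, rng=None):
    """Flip each gene independently with probability rate."""
    rng = rng or random.Random()
    return [(not gene) if rng.random() < rate else gene for gene in chromosome]


def windows_to_variables(chromosome, window_width=50, n_vars=None):
    """Expand kept windows into the indices of the spectral variables they cover."""
    selected = []
    for w, keep in enumerate(chromosome):
        if keep:
            start = w * window_width
            stop = start + window_width
            if n_vars is not None:
                stop = min(stop, n_vars)  # the last window may be shorter
            selected.extend(range(start, stop))
    return selected
```

In a full GA, each chromosome would be scored by the cross-validated performance of a model built on `windows_to_variables(chromosome)`, and the better-scoring chromosomes would be bred for up to 100 generations.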
The model preprocessed the data further by Savitzky-Golay smoothing (order: 3, window: 15 pt), normalization by total area, and mean centering. The data were then compressed by partial least squares discriminant analysis (PLSDA) with 3 components. The model was internally cross-validated by Venetian blinds with 10 splits. After the SVMDA model was successfully trained and cross-validated, it was externally validated with the spectra from the seven donors set aside at the beginning.
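Two of the preprocessing steps and the cross-validation scheme named above are simple enough to sketch in plain Python. The functions below are illustrative only and are not the PLS Toolbox routines: total-area normalization is assumed to mean dividing by the sum of absolute intensities, and Venetian blinds is assumed to use a blind thickness of one sample, so that split k holds out samples k, k + 10, k + 20, and so on. Savitzky-Golay smoothing, PLSDA compression, and the SVMDA model itself are left out.

```python
from typing import List, Tuple


def normalize_by_area(spectrum: List[float]) -> List[float]:
    """Scale a spectrum so the sum of its absolute intensities equals 1."""
    total = sum(abs(v) for v in spectrum)
    if total == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / total for v in spectrum]


def mean_center(spectra: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """Subtract the per-variable mean of the dataset from every spectrum."""
    n = len(spectra)
    means = [sum(column) / n for column in zip(*spectra)]
    centered = [[v - m for v, m in zip(row, means)] for row in spectra]
    return centered, means


def venetian_blinds(n_samples: int, n_splits: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) for each split.

    Split k holds out every n_splits-th sample starting at index k.
    """
    folds = []
    for k in range(n_splits):
        test = list(range(k, n_samples, n_splits))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        folds.append((train, test))
    return folds
```

The means returned by `mean_center` on the calibration set would be reused to center any validation spectra. Because each sample contributes many mapped spectra, interleaved splits like these can place spectra from one donor in both the training and test portions; the text does not say how this was handled.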
RESULTS AND DISCUSSION
Raman Spectra of Semen. Semen is an organic fluid containing two main fractions: cellular and noncellular.25 The cellular portion contains the sperm cells, and occasionally leucocytes and epithelial cells. The noncellular portion, often referred to as seminal plasma, provides a source of nutrition, energy, and protection for the sperm cells.26 Seminal plasma contains fructose, choline, and spermine phosphate hexahydrate (SPH), in addition to a host of enzymes, amino acids, and proteins. The preprocessed mean Raman spectra from the two classes are shown in Figure 1. The spectral profiles of semen contain peaks associated with several different biochemical components. The first strong peak, at 715 cm⁻¹, has been assigned to the CN stretching mode of choline (Table 1).13 The sharp peak at 830 cm⁻¹ results from the COC symmetric stretching mode of fructose as well as tyrosine’s ring breathing mode.13,27 Tyrosine also contributes to the peaks at 851 cm⁻¹ (CC aliphatic stretching), 1179 cm⁻¹ (CH₂/NH₃ rocking), 1200 cm⁻¹ (CC stretching), and 1327 cm⁻¹ (ring stretching).13,27 The peaks at 888 cm⁻¹ (phosphate mode) and 958 cm⁻¹ (phosphate symmetric stretching) have been attributed to the SPH found in semen.13 Phenylalanine’s characteristic peak can be seen at 1003 cm⁻¹.13,27 Lastly, the broad peaks at 1416 and 1448 cm⁻¹ have been credited to the CH₂ scissoring modes of lipids and the CH₂CH₃ bending mode of tryptophan.13,26
Figure 1. Preprocessed (baseline corrected and normalized) mean spectra from the Black (blue) and Caucasian (red) semen donors. Regions selected by the GA are in bold; regions discarded by the GA are faded.
Figure 1 shows the mean spectra for the Black and Caucasian classes. There are some regions of the spectra that appear to show differences between the two classes, such as the peaks at 715, 851, 888, 1416, and 1448 cm⁻¹. However, these differences are actually quite small. Figure 2 shows the mean difference spectrum for the two classes, as well as each class’s standard deviation. Here, any observable peaks in the difference spectrum are much smaller than the variation observed within each class. This is not unexpected, as it has been shown in the literature that semen is a very heterogeneous sample and can result in a vast array of Raman spectra.13 Because there is so much intra-class heterogeneity relative to inter-class differences, it is impossible to discriminate between the spectra from the two groups visually. Instead, we need to use multivariate data analysis to probe for and elucidate complex relationships in the spectra.
Table 1. Major peaks observed in the Raman spectra of semen.
Raman peak (cm⁻¹)   Component       Mode                                       Ref.
715                 Choline         CN stretching                              13
830                 Tyrosine        COC symmetric stretching; ring breathing   27; 13
851                 Tyrosine        CC aliphatic stretching                    27
888                 SPH             Phosphate mode                             13
958                 SPH             Phosphate symmetric stretching             13
1003                Phenylalanine   Ring breathing                             13,27
1179                Tyrosine        CH₂/NH₃ rocking                            13
1200                Tyrosine        CC stretching                              13
1327                Tyrosine        Ring stretching                            13
1418                Lipids          CH₂ scissoring                             26
1448                Tryptophan      CH₂CH₃ bending                             13,26
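The comparison behind Figure 2, where the difference between the class mean spectra is set against the spread within each class, can be expressed in a few lines. The Python sketch below is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the spread is taken as the sample standard deviation at each wavenumber, and a difference is flagged only when it exceeds the larger of the two class standard deviations.

```python
from statistics import mean, stdev
from typing import List, Tuple


def class_summary(spectra: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-wavenumber mean and sample standard deviation of one class."""
    columns = list(zip(*spectra))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def difference_vs_spread(class_a: List[List[float]],
                         class_b: List[List[float]]) -> Tuple[List[float], List[bool]]:
    """Mean difference spectrum (a minus b) and, for each wavenumber,
    whether its magnitude exceeds the larger within-class deviation."""
    mean_a, sd_a = class_summary(class_a)
    mean_b, sd_b = class_summary(class_b)
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    exceeds = [abs(d) > max(sa, sb) for d, sa, sb in zip(diff, sd_a, sd_b)]
    return diff, exceeds
```

If `exceeds` is false at nearly every wavenumber, as the text reports for semen, no single band separates the classes on its own, which is the motivation for a multivariate model.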
Figure 2. Summary spectra from Black and Caucasian semen donors showing the difference between the mean spectra (black), and the positive and negative standard deviations of the Black (blue) and Caucasian (red) donors’ spectral datasets.
Model Calibration. Chemometrics was employed to differentiate the experimental semen spectra according to donor race. After preprocessing, the spectra were split so that 18 donors were used for model calibration while the remaining seven donors were saved for external validation. The calibration spectra were run through a GA to identify the most informative variables in the dataset. In other words, the GA looks for the regions in the spectra that aid in the differentiation. These variables were selected and preserved in the dataset for further analysis, while the rest were discarded. Figure 1 shows the results from the GA. The spectral regions shown in bold represent those variables that the GA selected for analysis. The spectral regions that have been faded were determined to be uninformative by the GA.
The SVMDA model predicts which class each individual spectrum belongs to. In essence, it is reporting race differentiation on the spectral level. Since each sample was analyzed by Raman mapping, each sample is represented by multiple spectra in the final datasets. These spectra all vary to some extent from one to the next due to semen’s heterogeneity. One can expect that the majority of semen components, and the corresponding regions of the heterogeneous sample, could be the same for both races. Consequently, Raman spectra measured from these regions could be assigned to either race class and, as such, could be misclassified. In other words, it is very likely that some of the spectra from a Caucasian donor’s semen sample will be substantially similar to spectra collected from a Black donor’s semen. Furthermore, because each spectrum is classified independently, it is possible that a donor could have some spectra assigned to one class, while others are assigned to a different class. These spectral level results can be evaluated with a receiver operating characteristic (ROC) curve, which can also determine the best threshold for donor level classifications. A ROC curve plots the performance of a binary classifier, like the SVMDA model built here. Each point in the ROC curve represents a potential discrimination threshold, and the data points are plotted as a function of their corresponding false positive and true positive rates. The ideal threshold would be plotted at (0, 1) to represent a 0% false positive rate and 100% true positive rate. The ROC curve for the SVMDA model built on the GA spectra is shown in Figure 3. Noted in Figure 3 is a data point plotted at (0, 1), corresponding to a 60% threshold. When this threshold was applied to the cross-validated spectral level predictions produced by the SVMDA model, all 18 of the donors were classified correctly.
Figure 3. Receiver operating characteristic (ROC) curve for the internally cross-validated SVMDA model trained to differentiate semen donors according to race. The true positive rate (sensitivity) of each potential discrimination threshold is plotted as a function of its corresponding false positive rate (1 – specificity).
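A ROC analysis of this kind can be sketched in Python as follows. The sketch rests on assumptions that the text does not spell out: each donor is scored by the fraction of its spectra assigned to the positive class (Caucasian, matching the axis of Figure 4), a donor is called positive when that fraction exceeds the threshold, and the preferred threshold is the one whose point lies closest to the ideal corner (0, 1). The function names and the toy scores used to exercise them are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[bool],
               thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false_positive_rate, true_positive_rate) triples.

    labels[i] is True when donor i truly belongs to the positive class.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s > t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s > t)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def best_threshold(points: List[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Pick the point nearest the ideal corner (FPR = 0, TPR = 1)."""
    return min(points, key=lambda p: p[1] ** 2 + (1.0 - p[2]) ** 2)
```

Several thresholds may reach (0, 1) when the classes are well separated; `min` returns the first of them, so the order of `thresholds` determines which one is reported.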
External Validation. The Raman spectra from the seven external validation donors were loaded into the cross-validated SVMDA model for identification. The previously established threshold of 60% was applied to the model’s spectral level predictions in order to classify them at the donor level. Figure 4 plots the spectral level results for the external donors. As can be seen in the figure, all four Caucasian donors exceed the 60% threshold, while all three Black donors are well below it. The final results show that all seven donors were classified correctly. This indicates that both the model and the ROC threshold are robust enough to be applied successfully to new, unknown spectra and are not overfit to the calibration spectra.
Figure 4. Histogram plotting the results from external validation of the SVMDA model. The percent of spectra classified as Caucasian is plotted as the bar height for each donor. The 60% threshold established by ROC analysis during calibration is plotted as a dashed line.
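The donor-level decision summarized in Figure 4 amounts to counting, for each donor, the share of spectra that the model assigned to the Caucasian class and comparing it with the 60% threshold. A minimal Python sketch follows; the function name, the input layout, and the donor identifiers are hypothetical, and treating a share exactly equal to the threshold as "Black" is an assumption, since the text only says that Caucasian donors exceed the threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_donors(spectrum_calls: Iterable[Tuple[str, str]],
                    threshold: float = 0.60) -> Dict[str, Tuple[float, str]]:
    """Aggregate per-spectrum class calls into one call per donor.

    spectrum_calls yields (donor_id, predicted_class) for every spectrum,
    where predicted_class is "Caucasian" or "Black".
    Returns donor_id -> (fraction of spectra called Caucasian, donor call).
    """
    counts = defaultdict(lambda: [0, 0])  # donor -> [n_caucasian, n_total]
    for donor, call in spectrum_calls:
        counts[donor][1] += 1
        if call == "Caucasian":
            counts[donor][0] += 1
    result = {}
    for donor, (n_caucasian, n_total) in counts.items():
        fraction = n_caucasian / n_total
        label = "Caucasian" if fraction > threshold else "Black"
        result[donor] = (fraction, label)
    return result
```

In the study each donor contributes 64 mapped spectra, so a donor would need at least 39 spectra called Caucasian (39/64 ≈ 61%) to exceed the threshold under this reading.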
CONCLUSIONS
DNA profiling is one of the two techniques available to forensic scientists to identify individuals based on physical evidence. In the absence of a match in a DNA database, the original profile can still be used to elucidate information about the donor, such as their sex or race. While this information is inarguably helpful to the investigation, the methodology is time consuming, destructive, and must be performed in a laboratory setting. Conversely, Raman spectroscopy is nondestructive, acquisition times are usually less than a minute, and portable instruments can be used to collect spectra right at the scene. This study demonstrates for the first time that Raman spectroscopy can be used to determine the race of a semen donor. Raman spectroscopy probes the biochemical composition of a sample. Previous studies have shown that there are differences in the biochemical composition of peripheral blood from Caucasian and Black donors. Specifically, the concentrations of creatine kinase and lactate dehydrogenase are higher in the peripheral blood of Black donors.7 Currently, there is no information in the literature comparing the biochemical composition of semen samples from different races. For this study, semen samples from 25 Caucasian and Black donors were analyzed by Raman spectroscopy. The sample set was split so that 18 donors were used for model calibration while the remaining seven were set aside for external validation. The spectra from the 18 calibration donors were first submitted to a genetic algorithm (GA), which selected the most informative variables in the experimental spectra. This filtered dataset was then used to train a support vector machine discriminant analysis (SVMDA) model. The model’s strict donor predictions were used to perform receiver operating characteristic (ROC) analysis and determine a classification threshold. The final results revealed that all 18 calibration donors were classified correctly. The Raman spectra from the remaining seven donors were submitted to the pre-calibrated SVMDA model and established threshold, and all seven were assigned to the correct race. Based on the spectral regions selected by the GA, it appears that the source of differentiation may be the relative concentrations of choline, tyrosine, and spermine phosphate hexahydrate. This study demonstrates that Raman spectroscopy could be an invaluable tool for forensic investigators. Further research is underway in our lab to expand on this preliminary study. Future studies will include other racial groups beyond the two explored here and enlarge the donor population to increase the variability between samples. Potential limitations of this technique could include unanticipated effects from donor age or medical conditions, such as hypercholesterolemia or azoospermia. The final chemometric model will be added to our current suite of models, which as a whole will revolutionize forensic body fluid analysis. Once complete, our models will be able to use spectra collected at a crime scene to (1) identify body fluids, (2) differentiate traces from human and nonhuman animals, (3) determine the time since deposition (TSD), and (4) provide phenotypic information about the donor, such as their race, sex, and age. We have already published on all four of these objectives, and our future work will continue to enhance and refine the method.
AUTHOR INFORMATION
Corresponding Author
* Email: [email protected], Phone: 518-591-8863, Fax: 518-442-3462.
Author Contributions The manuscript was written through contributions of all authors.
Notes The authors declare no competing financial interests.
ACKNOWLEDGMENTS This project was supported by Awards No. 2014-DN-BX-K016 and 2015-R2-CX-0019, awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect those of the Department of Justice.
REFERENCES
(1) Edwards, H.; Gotsonis, C. Strengthening forensic science in the United States: A path forward; Statement before the United States Senate Committee on the Judiciary; National Research Council: Washington, D.C., 2009.
(2) Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods; President’s Council of Advisors on Science and Technology, 2016.
(3) Virkler, K.; Lednev, I. K. Forensic Sci Int 2009, 188, 1-17.
(4) Zou, Y.; Xia, P.; Yang, F.; Cao, F.; Ma, K.; Mi, Z.; Huang, X.; Cai, N.; Jiang, B.; Zhao, X. Anal Methods-Uk 2016, 8, 3763-3767.
(5) Muro, C. K.; Doty, K. C.; de Souza Fernandes, L.; Lednev, I. K. Forensic Chemistry 2016, 1, 31-38.
(6) Keating, B.; Bansal, A. T.; Walsh, S.; Millman, J.; Newman, J.; Kidd, K.; Budowle, B.; Eisenberg, A.; Donfack, J.; Gasparini, P. International Journal of Legal Medicine 2013, 127, 559-572.
(7) Kramer, F.; Halámková, L.; Poghossian, A.; Schöning, M. J.; Katz, E.; Halámek, J. Analyst 2013, 138, 6251-6257.
(8) Bakshi, S.; Halámková, L.; Halámek, J.; Katz, E. Analyst 2014, 139, 559-563.
(9) Agudelo, J.; Halámková, L.; Brunelle, E.; Rodrigues, R.; Huynh, C.; Halámek, J. Anal Chem 2016.
(10) Huynh, C.; Brunelle, E.; Halámková, L.; Agudelo, J.; Halámek, J. Anal Chem 2015, 87, 11531-11536.
(11) Virkler, K.; Lednev, I. K. Anal Bioanal Chem 2010, 396, 525-534.
(12) Virkler, K.; Lednev, I. K. Analyst 2010, 135, 512-517.
(13) Virkler, K.; Lednev, I. K. Forensic Sci Int 2009, 193, 56-62.
(14) Sikirzhytski, V.; Sikirzhytskaya, A.; Lednev, I. K. Anal Chim Acta 2012, 718, 78-83.
(15) Sikirzhytskaya, A.; Sikirzhytski, V.; Lednev, I. K. Forensic Sci Int 2012, 216, 44-48.
(16) Muro, C. K.; Lednev, I. K. Anal Bioanal Chem 2017, 409, 287-293.
(17) Virkler, K.; Lednev, I. K. Anal Chem 2009, 81, 7773-7777.
(18) McLaughlin, G.; Doty, K. C.; Lednev, I. K. Forensic Sci Int 2014, 238, 91-95.
(19) McLaughlin, G.; Doty, K. C.; Lednev, I. K. Anal Chem 2014, 86, 11628-11633.
(20) Sikirzhytskaya, A.; Sikirzhytski, V.; Lednev, I. K. J Biophotonics 2014, 7, 59-67.
(21) Doty, K. C.; McLaughlin, G.; Lednev, I. K. Anal Bioanal Chem 2016, 408, 3993-4001.
(22) Mistek, E.; Halámková, L.; Doty, K. C.; Muro, C. K.; Lednev, I. K. Anal Chem 2016, 88, 7453-7456.
(23) Muro, C. K.; de Souza Fernandes, L.; Lednev, I. K. Anal Chem 2016, 88, 12489-12493.
(24) Zhang, Z.-M.; Chen, S.; Liang, Y.-Z. Analyst 2010, 135, 1138-1146.
(25) Barčot, O.; Balarin, M.; Gamulin, O.; Ježek, D.; Romac, P.; Brnjas-Kraljević, J. Appl Spectrosc 2007, 61, 309-313.
(26) Huang, Z.; Chen, X.; Chen, Y.; Chen, J.; Dou, M.; Feng, S.; Zeng, H.; Chen, R. Journal of Biomedical Optics 2011, 16, 1105011105013.
(27) Virkler, K.; Lednev, I. K. Forensic Sci Int 2008, 181, E1-E5.