
Article

Identification of pulmonary edema in forensic autopsy cases of sudden cardiac death using Fourier transform infrared microspectroscopy: a pilot study Hancheng Lin, Yiwen Luo, Qiran Sun, Ji Zhang, Ya Tuo, Zhong Zhang, Lei Wang, Kaifei Deng, Yijiu Chen, Ping Huang, and Zhenyuan Wang Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b04642 • Publication Date (Web): 24 Jan 2018 Downloaded from http://pubs.acs.org on January 25, 2018


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Identification of Pulmonary Edema in Forensic Autopsy Cases of Sudden Cardiac Death Using Fourier Transform Infrared Microspectroscopy: A Pilot Study

Hancheng Lin1,2, Yiwen Luo2, Qiran Sun2, Ji Zhang2, Ya Tuo3, Zhong Zhang2, Lei Wang2, Kaifei Deng2, Yijiu Chen2, Ping Huang2*, Zhenyuan Wang1*

1 Department of Forensic Pathology, Xi’an Jiaotong University, Xi’an, 710061, China
2 Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Science, Shanghai, 200063, China
3 Department of Biochemistry and Physiology, Shanghai University of Medicine and Health Sciences, Shanghai, 201318, China

ABSTRACT: Many studies have demonstrated the usefulness of biofluid-based infrared spectroscopy in the clinical domain for diagnosing diseases and monitoring their progression. Here, we present a study in the forensic field that employed Fourier transform infrared (FTIR) microspectroscopy for post-mortem diagnosis of sudden cardiac death (SCD) by in situ biochemical investigation of alveolar edema fluid in lung tissue sections. Analysis of amide-related spectral absorbance demonstrated that the pulmonary edema fluid of the SCD group was richer in protein components than that of the neurologic catastrophe (NC) and lethal multiple injuries (LMI) groups. The complementary results of unsupervised principal component analysis (PCA) and genetic algorithm-guided partial least squares discriminant analysis (GA-PLS-DA) further indicated different global spectral band patterns of pulmonary edema fluid among these three groups. Ultimately, a random forest (RF) classification model for post-mortem diagnosis of SCD was built and achieved a sensitivity of 97.3% and a specificity of 95.5%. The model was also used to classify unknown pulmonary edema fluid collected from 16 cases, resulting in 100% correct discrimination. This pilot study demonstrates that FTIR microspectroscopy in combination with chemometrics has the potential to be an effective aid for post-mortem diagnosis of SCD.

Sudden cardiac death (SCD) is a major public health problem that has a devastating psychosocial impact on society and the families of victims1,2. According to a population-based study, SCD claims approximately 10 times as many lives as do traffic accidents in the USA and EU combined2. The determination of SCD can provide a key clue for the forensic investigators responsible for addressing sudden death cases3. A better understanding of the post-mortem diagnosis of SCD could also help develop preventive strategies, such as the use of antiarrhythmic agents or implantable cardioverter-defibrillators, in future clinical settings2,4. Currently, the investigation of suspected SCD cases is typically based on the following steps: evaluation of medical history, systemic forensic autopsy, macroscopic and microscopic cardiac examination, tissue examination, and, lastly, toxicological analyses3,5. A variety of novel methods have been investigated to determine the cause of SCD, such as post-mortem radiological imaging techniques6-9, biochemical marker analysis10-12, and post-mortem genetic testing13-17. However, none of these approaches is widely accepted, because SCD may present in numerous ways, which compounds the difficulty of post-mortem diagnosis. For example, post-mortem radiological imaging has the advantage of revealing structural heart abnormalities, whereas genetic testing, a molecular autopsy technique, is much more sensitive to functional heart abnormalities. Combining multiple methods would benefit the post-mortem diagnosis of SCD, and the development of new complementary diagnostic methods would also be highly valuable.

Pulmonary edema fluid is one of the most common findings during necropsy of SCD cases. However, this finding is usually regarded as non-specific for determining the cause of death because it occurs so widely in forensic pathology. In fact, several studies have demonstrated different biochemical contents within pulmonary edema fluid of different etiologies18-20. Furthermore, Wang et al.21-23 employed a real-time qPCR technique to investigate the molecular pathology of post-mortem pulmonary edema associated with various causes of death. The differences in mRNA expression indicate that different molecular pathologies of alveolar damage arise from different causes of death. Taken together, these studies imply that the pulmonary edema fluids associated with different etiologies or causes of death could contain a specific pathological biochemical pattern and could be potential specific biofluids for diagnosing the etiology or cause of death. Nevertheless, this hypothesis requires further investigation by other techniques.

Fourier transform infrared (FTIR) spectroscopy is an analytical technique that has become increasingly popular for biofluid investigation over the last several years24. The greatest benefit of this technique lies in its high molecular sensitivity, which enables researchers to potentially detect specific spectral biomarkers within cells, tissues and biofluids under various pathological conditions, thus enabling earlier diagnosis of diseases25. Several studies have demonstrated the usefulness of this technique for diagnosing diseases and monitoring their progression through the investigation of plasma26-32. Therefore, it is reasonable to expect that pulmonary edema fluid, which originates from plasma, could show biochemical alterations similar to those of plasma in pathological states or diseases and thus be another specific biofluid for diagnostic purposes by means of FTIR spectroscopy. To the best of our knowledge, FTIR studies on SCD-related pulmonary edema fluid have not been reported until now.

The main goal of this study was to seek a specific spectral fingerprint within pulmonary edema fluid associated with SCD by means of FTIR spectroscopy. Additionally, IR spectra of pulmonary edema fluid from two other representative causes of death were collected and served as the control groups: neurologic catastrophe, which usually causes sudden death33, and lethal multiple injuries, which easily induce hypostatic-pneumonia pulmonary edema. Formalin-fixed paraffin-embedded lung tissue specimens were used for the infrared spectroscopy investigation in this study, as such samples are available in large numbers from tissue banks of forensic pathology institutes, are stable in biochemical composition during long-term conservation, and are valuable for retrospective studies. The present work demonstrated for the first time that FTIR microspectroscopy combined with chemometric methods was capable of characterizing the biochemical changes in edema fluid, which could have potential post-mortem diagnostic significance in SCD.

EXPERIMENTAL SECTION

Sample Preparation. This study was conducted with the approval of the ethics committee of Xi’an Jiaotong University. Ninety-eight formalin-fixed, paraffin-embedded lung hilum tissue samples from 49 cases (one left and one right lung hilum tissue sample per case) were obtained from the tissue bank of the Department of Forensic Pathology, Xi’an Jiaotong University. The SCD group comprised 24 cases (coronary artery disease, n = 18; right ventricular cardiomyopathy, n = 3; myocarditis, n = 2; dilated cardiomyopathy, n = 1), the neurologic catastrophe (NC) group comprised 17 cases (lethal intracerebral hemorrhage, n = 11; traumatic brain injuries, n = 3; lethal subarachnoid hemorrhage, n = 2; lethal subdural hemorrhage, n = 1), and the lethal multiple injuries (LMI) group, associated with hypostatic pneumonia, comprised 8 cases. The post-mortem diagnosis of all cases was determined by professional forensic pathologists through a forensic autopsy examination (including macromorphological, histological, and toxicological examinations) and consideration of the circumstances of death and medical history. The experimental workflow is summarized in Figure 1. For each lung hilum tissue sample, two adjacent tissue sections were cut using a microtome. One section, 7 µm thick, was deposited on an infrared-transparent calcium fluoride (CaF2) slide, and the other, 2.5 µm thick, was mounted on a conventional glass slide. Subsequently, both sections were deparaffinized using standard protocols. After dewaxing, the section mounted on the CaF2 slide remained stain-free prior to infrared spectral acquisition, and the section mounted on glass was stained with hematoxylin/eosin (H&E). The stained slides were used as templates to identify areas of alveolar edema fluid, whose spectral data were subsequently collected from the corresponding unstained tissue on the CaF2 slide.

Instrument and Spectra Collection. Spectroscopic chemical imaging data acquisition was performed in transmission mode using a Nicolet iN10 MX infrared microscope (Thermo Fisher Scientific, Waltham, MA, USA) coupled with a sensitive liquid nitrogen-cooled 16-element MCT linear array detector. In our study, the spatial and spectral resolutions were set to 25 × 25 µm² and 4 cm⁻¹, respectively. Absorbance spectra, averaged over 64 scans in the mid-infrared range of 4000 to 900 cm⁻¹, were collected for each pixel. The background spectra were collected on clean blank areas of the CaF2 slides each time prior to tissue image acquisition using the same parameters and were automatically subtracted from each sample spectrum. Under these conditions, approximately 5 minutes were needed to collect an infrared image containing over 208 spectra and corresponding to a typical area of 400 × 325 µm². A total of 98 images were finally recorded. A row spectral data set containing 4118 spectra specifically corresponding to alveolar edema fluid (2117 for the SCD, 1451 for the NC, and 550 for the LMI group) was selected from these 98 images by manual pixel selection (see Figure 1F and G) and then subjected to data preprocessing.

Figure 1. Illustration of the experimental workflow. For each formalin-fixed, paraffin-embedded lung tissue sample (A), 2 adjacent tissue sections were cut using a microtome (B). The first section, 2.5 µm, was H&E stained (D), while the second section, 7 µm, remained unstained (C). IR spectra were recorded from the unstained sections (E). The H&E stained section imaging (G) was used as a template to identify areas of alveolar edema fluid, whose spectral data were subsequently collected from the corresponding unstained tissue. The IR spectra of alveolar edema fluid were selected by manual pixel selection (F). Black squares were added to the unstained section imaging to indicate the IR spectra selected. The selected spectra (H) were gathered in a single matrix for data preprocessing and analysis.
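The acquisition figures quoted in the Experimental Section can be cross-checked with simple arithmetic. The short Python sketch below is only a consistency check on numbers stated in the text (pixel size, image area, case counts, selected spectra); the variable names are ours and are not part of the original workflow.

```python
# Consistency check on the acquisition numbers quoted in the Experimental Section.
PIXEL_UM = 25            # edge length of one pixel (spatial resolution), in µm
IMAGE_UM = (400, 325)    # typical image area, in µm

pixels_x = IMAGE_UM[0] // PIXEL_UM          # pixels along the 400 µm edge
pixels_y = IMAGE_UM[1] // PIXEL_UM          # pixels along the 325 µm edge
spectra_per_image = pixels_x * pixels_y     # one spectrum per pixel

cases = {"SCD": 24, "NC": 17, "LMI": 8}
selected_spectra = {"SCD": 2117, "NC": 1451, "LMI": 550}

total_cases = sum(cases.values())
total_samples = 2 * total_cases             # one left and one right hilum sample per case
total_selected = sum(selected_spectra.values())

print(spectra_per_image, total_cases, total_samples, total_selected)
# prints: 208 49 98 4118
```

The 16 × 13 pixel grid gives the 208 spectra per image mentioned in the text, and the group counts add up to the reported 49 cases, 98 samples, and 4118 selected spectra.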

ACS Paragon Plus Environment


Data Preprocessing. First, the spectral data set was subjected to a quality test that discarded spectra from areas with little or no tissue. The test was based on the maximum absorbance within the 1724-1475 cm⁻¹ region: spectra with a maximum below 0.1 or above 1.2 were discarded. Next, to enhance the visual identification of overlapping spectral bands (especially in the 1724-1475 cm⁻¹ region, which mainly corresponds to protein) and to provide better discriminatory power for classification, the qualified spectra were converted to second derivatives using the Savitzky-Golay method (15 smoothing points) and then normalized using extended multiplicative signal correction (EMSC). Finally, the spectral data set was restricted to the biological fingerprint region (1800-900 cm⁻¹). This fully preprocessed spectral data set, containing 1720, 1120, and 440 row spectra corresponding to the SCD, NC, and LMI groups, respectively, was used for principal component analysis (PCA), genetic algorithm-guided partial least squares discriminant analysis (GA-PLS-DA), and random forest (RF) analysis.

Figure 2. FTIR average second-derivative spectra for pulmonary edema fluids of SCD, NC and LMI in the 1800–900 cm⁻¹ region (A). Box-and-whisker plot (B) showing the sum of the absolute absorbance values of the absorption bands at 1657 and 1547 cm⁻¹ for the three groups. An asterisk indicates a significant difference (p < 0.001) by Kruskal-Wallis analysis and a multiple-comparison test.

Data Analysis. PCA was employed to differentiate the FTIR-based profiles of edema fluid samples between the SCD, NC, and LMI groups. In the first step, sets of 10 spectra randomly extracted from the spectral data set of one group were averaged to give one mean spectrum. A spectrum that had been selected once for computing a mean spectrum was not allowed to be selected again for another mean spectrum. Consequently, 172, 112, and 44 mean spectra, corresponding to the SCD, NC, and LMI groups, respectively, were calculated and then mean-centred for PCA. PLS-DA is a supervised linear classification method that was implemented in this study to construct discriminant models between SCD-induced pulmonary edema fluid and pulmonary edema fluids from NC and LMI. To extract the discriminating spectral variables and thus improve the classification model performance, genetic algorithm (GA) analysis was integrated into PLS-DA.
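The quality test and the disjoint averaging of 10 spectra described above can be sketched in a few lines. This is a minimal illustration in standard-library Python, not the authors' MATLAB code: the function names, the fixed random seed, the dropping of any remainder smaller than 10, and the toy 5-point "spectra" are all assumptions made for the example. The Savitzky-Golay and EMSC steps are omitted.

```python
import random

def passes_quality_test(amide_max, low=0.1, high=1.2):
    """Keep a spectrum only if its maximum absorbance in the
    1724-1475 cm^-1 window lies within [low, high]."""
    return low <= amide_max <= high

def mean_spectra(spectra, group_size=10, seed=0):
    """Average disjoint random groups of `group_size` spectra.

    Each spectrum is used at most once. A trailing remainder smaller
    than `group_size` is dropped (an assumption of this sketch).
    """
    order = list(range(len(spectra)))
    random.Random(seed).shuffle(order)
    means = []
    for start in range(0, len(order) - group_size + 1, group_size):
        group = [spectra[i] for i in order[start:start + group_size]]
        # Average the group point by point along the wavenumber axis.
        means.append([sum(column) / group_size for column in zip(*group)])
    return means

# Toy data: 1720 fake spectra of 5 points each (the real spectra have 467 points).
rng = random.Random(1)
toy_scd = [[rng.random() for _ in range(5)] for _ in range(1720)]
print(len(mean_spectra(toy_scd)))   # prints: 172
```

With 1720, 1120, and 440 input spectra, this grouping yields the 172, 112, and 44 mean spectra reported for the three groups.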
Another popular classification method, RF, was also used to construct a classification model for post-mortem diagnosis of SCD. An RF is an ensemble of decision trees. While each decision tree by itself is relatively weak, the majority vote over the individual decision trees yields a highly accurate classification. Compared with the PLS-DA algorithm, RFs can handle outliers in the input space more easily. The complete preprocessed row spectral data set was split into a training set (70% of the row spectra) and a test set (30% of the row spectra): the training set was used to train the GA-PLS-DA and RF classification models, and the test set was used to evaluate the performance of the models.

The preprocessing of the row spectral data set was performed using MATLAB 2014a (MathWorks Inc., Natick, MA, USA) equipped with the PLS toolbox 8.1.1 (Eigenvector Research Inc., Manson, WA, USA). PCA was carried out using the Statistics Toolbox built into MATLAB. GA-PLS-DA was performed using the classification toolbox 4.2 (Milano Chemometrics and QSAR Research Group, University of Milano Bicocca, Italy) in combination with the PLS-genetic algorithm toolbox (University of Genoa, Italy). RF analysis was conducted using the “randomForest” package implemented in R.
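The 70/30 split can be illustrated as follows. The text does not say whether the split was made per class; the sketch below assumes a per-class (stratified) split, which happens to reproduce the set sizes reported later in the paper (2296 training spectra; 516 SCD and 468 non-SCD test spectra). The function name and seed are hypothetical, and standard-library Python is used in place of the authors' MATLAB and R tooling.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices into train and test sets, class by class.

    Assumption of this sketch: the 70/30 split is applied within each class.
    """
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

# Group sizes of the fully preprocessed data set, as given in the text.
labels = ["SCD"] * 1720 + ["NC"] * 1120 + ["LMI"] * 440
train_idx, test_idx = stratified_split(labels)

test_scd = sum(labels[i] == "SCD" for i in test_idx)
test_non_scd = len(test_idx) - test_scd
print(len(train_idx), len(test_idx), test_scd, test_non_scd)
# prints: 2296 984 516 468
```

Note that spectra from the same case can still end up in both sets with a spectrum-level split of this kind, which is the concern the authors later address with external validation on new cases.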

RESULTS AND DISCUSSION

A comparison of the average second-derivative spectra for pulmonary edema fluid of the SCD, NC, and LMI groups is shown in Figure 2A. At first glance, the average spectrum of the LMI group differs markedly from those of the other two groups across the entire spectral region, whereas spectral changes between SCD and NC are not easily identified. Nevertheless, coarse spectral differences between these three groups can still be observed in the amide I and II regions, which are the most prominent and most sensitive vibrational bands of the protein backbone in the mid-infrared spectral region. In fact, apart from water, the dominant components of pulmonary edema fluid, which originates mainly from plasma, are proteins such as albumin and globulin. Hence, these two spectral regions merit closer investigation. The box-and-whisker plot in Figure 2B shows the sum of the absolute absorbance values of the amide I and II peaks, which reflects the protein level. As can be observed, the absorbance decreases significantly from the SCD group to the NC group and then to the LMI group, reflecting different protein concentrations in the pulmonary edema fluid associated with these three causes of death and thus implying different pathological biochemical alterations in pulmonary edema fluid in response to pre-mortem diseases.

PCA analysis. Next, we employed PCA, whose score and loading plots can reveal subtle spectral variances, to further investigate the biochemical alterations in pulmonary edema fluid associated with these three causes of death. Figure 3A shows that the mean second-derivative spectra (calculated as described in the Data Analysis section) from the SCD, NC, and LMI groups cluster by group in the PC-1 versus PC-3 score plot. As can be observed, LMI edema fluid spectra are completely separated from SCD and NC edema fluid spectra along PC-1, which explains most of the variability (57.9%). Figure 3B shows the PC-1 loadings. Spectra corresponding to the LMI edema fluid are characterized by a high intensity of bands at 1635 cm⁻¹ (β-sheet secondary structure of amide I)34 and 1514 cm⁻¹ (side chain residues of tyrosine)35, whereas spectra corresponding to the SCD and NC edema fluid are characterized by a high intensity of the band at 1657 cm⁻¹ (α-helical structure of amide I)36. Apart from a slight overlap of scattered points, a general discrimination between the SCD and NC groups is still observed along PC-3. Figure 3C shows the corresponding loading plot, which indicates that the largest changes in the composition of edema fluids between SCD and NC are also associated with proteins. For PC-3, the discriminating positively correlated loadings are found for SCD at 1641 cm⁻¹ (random coil structure of amide I)37 and 1552 cm⁻¹ (amide II)38, whereas negatively correlated loadings characteristic of NC are found at 1624 cm⁻¹ (β-sheet structure of amide I)39 and 1535 cm⁻¹ (amide II)38. Taken together, the PCA score and loading plots show that the protein structure profiles of pulmonary edema fluid differ between the SCD, NC, and LMI groups. This result, together with the amide absorbance results above, further indicates specific biochemical responses to different diseases or pre-mortem pathological states.

Figure 3. The results of PCA applied to 3 groups of mean second-derivative spectra originating from SCD, NC and LMI tissues, showing a PC-1 versus PC-3 score plot (A), a PC-1 correlation loading plot (B), and a PC-3 correlation loading plot (C).

GA-PLS-DA analysis. Given the only rough distinction between SCD and NC edema fluid spectra in the PCA result, our next step was to perform GA-PLS-DA. Unlike advanced machine learning analyses such as support vector machines (SVM) and artificial neural networks (ANN), whose modeling is more like a black box40, GA-PLS-DA gives two important outcomes, the regression coefficients and the GA-selected variable subset, which make it possible to understand the driving force underlying PLS-DA and to gain new insights into the biochemical differences between biological sample categories41,42. In this study, the parameters of the GA analysis used for spectral variable optimization were as follows: population size, 30 chromosomes; mutation rate, 0.01; crossover rate, 0.5; fitness function, root mean square error of cross validation (RMSECV); and number of runs, 100. A total of 96 wavelengths, accounting for 20.6% of the full spectral data of 467 wavelengths, were ultimately selected by the GA and are displayed in Figure 4A as discontinuous gray shading. The darker the gray shade, the more frequently the covered wavelengths were selected across the GA runs, and thus the more specific those wavelengths were for discrimination analysis. The selected spectral variables were mainly located in the 1700-1300 cm⁻¹ region, which is associated with protein conformations and amino acids43. Next, 10-fold cross-validation (CV) was performed to select the optimal number of latent variables (LVs) for construction of the PLS-DA models. The plot of the CV error rate as a function of the number of LVs is displayed in Figure 4B. Ultimately, the calibrated PLS-DA model (see Figure 4C) was built with six LVs, as this was the smallest number of LVs that attained the lowest CV classification error rate.
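The rule used above to fix the number of LVs (the smallest number that reaches the minimum CV error) can be written down explicitly. In the Python sketch below the CV error rates are hypothetical values invented for illustration, not read from Figure 4B, and the function name is ours; only the 96-of-467 ratio comes from the text.

```python
def select_n_latent_variables(cv_error_by_lv):
    """Return the smallest number of LVs that attains the minimum CV error."""
    best_error = min(cv_error_by_lv.values())
    return min(n for n, err in cv_error_by_lv.items() if err == best_error)

# Hypothetical 10-fold CV classification error rates (illustrative only).
cv_errors = {1: 0.30, 2: 0.22, 3: 0.18, 4: 0.15, 5: 0.13, 6: 0.12, 7: 0.12, 8: 0.13}
print(select_n_latent_variables(cv_errors))   # prints: 6

# Share of the 467 spectral variables retained by the GA, as stated in the text.
print(round(100 * 96 / 467, 1))               # prints: 20.6
```

Preferring the smallest LV count among ties keeps the model as simple as the CV error allows, which is one common guard against overfitting in PLS-based models.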
The classification results for the SCD class were 0.91 (sensitivity) and 0.85 (specificity), indicating a stable and reliable PLS-DA classification model. The regression coefficients are another outcome of the PLS-DA model, and the wavelengths whose coefficients have high absolute values are important for class discrimination42. Figure 4D shows the regression coefficient plot for our binary PLS-DA model. Numerous wavelengths with positive and negative coefficients are found in the amide I and II regions.


This result, consistent with the PCA loading plot, further demonstrates different protein secondary structures between the SCD and NC groups. However, the fact that the discriminating amide bands of the PLS-DA (built on individual spectra) differ from those of the PCA (built on mean spectra) implies that there are significant inter-individual differences. The coefficient plot also reveals discriminating wavelengths (1489, 1389, 1367, and 1309 cm⁻¹) in the 1500-1300 cm⁻¹ spectral region, which contains various absorption peaks of plasma biomolecules such as fibrinogen, haptoglobin, IgG1, IgA, and IgM44. It is reasonable to expect that the pulmonary edema fluids of SCD and NC, whose protein components are similar to those of plasma, contain these biomolecules and show quantitative differences in them in response to pre-mortem pathological states, which manifest as the above spectral differences. Another feature contributing to the segregation of spectra from the SCD and NC pulmonary edema fluids was two positively correlated coefficients at 1038 and 1020 cm⁻¹ (originating from glucose)45, indicative of a lower concentration of glucose in the pulmonary edema fluid from the SCD group than in that from the NC group. Additionally, the most pronounced feature is a shift of the O–P–O asymmetric stretching band from 1223 cm⁻¹ for the NC group to 1232 cm⁻¹ for the SCD group. Such a shift has been associated with the conformational change of the B-DNA to A-DNA transition46. To summarize, although the absorption in the amide and amino acid regions proved to differ between SCD- and NC-group pulmonary edema fluid spectra, it is impossible to attribute these differences to particular changes within individual proteins in the pulmonary edema fluid, as IR spectroscopy provides averaged information on all biochemical components present within the edema fluid. Nevertheless, our approach has demonstrated distinguishable spectral fingerprints between SCD and NC pulmonary edema fluids, which offers potential for post-mortem diagnostic purposes.

Figure 4. Regions of the infrared spectrum selected by the genetic algorithm (gray shades) (A). Classification error rate in cross-validation as a function of the number of latent variables included in the binary PLS-DA classification model (B). Calibrated prediction results for the sudden cardiac death class in the binary PLS-DA classification model (C). Regression coefficients for the PLS-DA model (D).

Construction of classification models. For the forensic application of diagnosing or excluding SCD post-mortem, precise information on the biochemical determinants that differentiate pulmonary edema fluids arising from different causes of death is not required. Instead, what matters more is the accuracy of the classification model. In this study, we sought to apply a machine learning method, the random forest algorithm, which is accurate and robust against overfitting47, to our spectral data set. To this end, the training data sets of the NC and LMI groups were first gathered into a single data set representing the non-SCD group. Next, the RF classifier was constructed using the following parameter set: ntree, 500; mtry, the square root of 467; and nodesize, 5. It took less than one minute to train the random forest on the 2296 spectra. The classifier was tested on the test data set, which consisted of 516 SCD and 468 non-SCD spectra. Additionally, a GA-PLS-DA model for binary classification of SCD and non-SCD, with 225 selected variables and 8 LVs, was constructed. This model was validated using the same test data set as the RF model. The classification parameters of these two models for predicting SCD spectra are given in Table 1. The SCD classification performance of the RF model is much better than that of the GA-PLS-DA model.

Table 1. Classification parameters obtained on the test set

Classification model    Non-error rate    Specificity    Sensitivity
RF                      96.5%             95.5%          97.3%
GA-PLS-DA               87.1%             83.5%          90.3%
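For readers less familiar with the quantities in Table 1, the sketch below shows how sensitivity and specificity are computed from predicted labels, with SCD as the positive class. The non-error rate is taken here as the mean of sensitivity and specificity; this definition is an assumption of the sketch, since the paper does not define the term. The labels are toy data, not the study's predictions, and the function name is ours.

```python
def binary_metrics(y_true, y_pred, positive="SCD"):
    """Sensitivity, specificity and (assumed) non-error rate for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)      # fraction of true SCD spectra recognized
    specificity = tn / (tn + fp)      # fraction of non-SCD spectra recognized
    # Assumed definition: non-error rate as the average of the two class rates.
    non_error_rate = (sensitivity + specificity) / 2
    return sensitivity, specificity, non_error_rate

# Toy labels for illustration only.
y_true = ["SCD"] * 5 + ["non-SCD"] * 4
y_pred = ["SCD", "SCD", "SCD", "SCD", "non-SCD",
          "non-SCD", "non-SCD", "non-SCD", "SCD"]
sens, spec, ner = binary_metrics(y_true, y_pred)
print(sens, spec)   # prints: 0.8 0.75
```

Averaging the two class rates makes the summary figure insensitive to class imbalance, which is relevant here because the SCD and non-SCD test sets differ in size.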

To further assess the classification accuracy of the RF and GA-PLS-DA models, receiver operating characteristic (ROC) analyses were conducted, and the areas under the ROC curves (AUCs), which summarize the performance of classification models, were also calculated48 (see Figure 5). The AUC takes values in the interval [0, 1], where a value of 0.5 corresponds to random classification and 1 to perfect performance. For our two models developed to differentiate between SCD and non-SCD edema fluid, the AUC values were 0.99 for the RF model and 0.94 for the GA-PLS-DA model, which confirmed the better classification capability of the RF model.
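The AUC has a simple probabilistic reading: it is the probability that a randomly chosen SCD spectrum receives a higher classifier score than a randomly chosen non-SCD spectrum, with ties counted as one half. The sketch below computes it directly from that definition. The scores are toy values, and the function name is ours; this is not the software used in the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy classifier scores (e.g. fraction of trees voting "SCD"); illustrative only.
scd_scores = [0.9, 0.8, 0.6, 0.4]
non_scd_scores = [0.7, 0.3, 0.2, 0.1]
print(roc_auc(scd_scores, non_scd_scores))   # prints: 0.875
```

The pairwise form is quadratic in the number of spectra, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large test sets.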


While the high classification accuracy of the RF model is evident, it is important to consider that a classifier trained and tested on spectra of pulmonary edema fluid from the same cases could provide over-optimistic results. The question of whether spectra originating from new cases, not available during training, would yield the same high classification accuracy needs to be addressed. For this purpose, we collected 60 IR images of the lung tissues from 16 different cases (SCD, cases Nos. 1-7; NC, cases Nos. 8-11; LMI, cases Nos. 12-16). The spectra were collected and preprocessed in the same way as the samples used for training, as described in the Experimental Section. As a consequence, 16 row spectral data sets were obtained (see Figure 6A), each representing one case, and these data sets were then loaded into the RF and GA-PLS-DA models. The prediction results are illustrated in Figure 6B and C. The RF classification model, which showed better prediction performance than the GA-PLS-DA model, correctly predicted the overall SCD/non-SCD classification of the pulmonary edema fluid for all cases when a threshold of 50% was used for the global assignment. Compared with the prediction performance on the test set (see Table 1), the values of the classification parameters obtained on the external validation set (see Table 2) are slightly lower, indicative of slight overfitting of the classification models. However, considering that the RF algorithm is robust against overfitting47 and that the LVs were strictly selected when performing GA-PLS-DA, we tend to attribute this overfitting to the small training sample size. Nevertheless, compared with the GA-PLS-DA algorithm, RF is easily scalable, which means that the same (or very similar) parameters can be used in the future with larger data sets49. Further development of the RF classifier with an expanded number of cases is in progress in our laboratory.

Table 2. Classification parameters obtained on the external validation set

Classification model    Non-error rate    Specificity    Sensitivity
RF                      91.3%             93.6%          88.5%
GA-PLS-DA               82.1%             78.6%          86.3%
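The case-level ("global") assignment with a 50% threshold can be sketched as a vote over the spectra of one case. The paper states only the threshold; the function name, the handling of an exact 50% tie (assigned to non-SCD here), and the toy predictions below are assumptions of this Python sketch.

```python
def assign_case(spectrum_predictions, positive="SCD", threshold=0.5):
    """Assign one label to a case from its per-spectrum predictions.

    The case is called `positive` when more than `threshold` of its spectra
    are predicted positive. Treating an exact tie as negative is an
    assumption of this sketch.
    """
    n_positive = sum(p == positive for p in spectrum_predictions)
    fraction = n_positive / len(spectrum_predictions)
    label = positive if fraction > threshold else "non-" + positive
    return label, fraction

# Toy per-spectrum predictions for one hypothetical case.
toy_case = ["SCD"] * 7 + ["non-SCD"] * 3
print(assign_case(toy_case))   # prints: ('SCD', 0.7)
```

Because each case contributes many spectra, a case can be assigned correctly even when a sizeable minority of its spectra are misclassified, which is how per-spectrum rates below 100% in Table 2 can coexist with correct assignment of every case by the RF model.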

Figure 5. ROC curves for the RF and GA-PLS-DA models.

Figure 6. Bar charts showing the number of collected spectra for each case (A) and the percentages of correct prediction for the cause of death using the RF (B) and GA-PLS-DA (C) classification models.

CONCLUSIONS

In this preliminary study, FTIR microspectroscopy combined with chemometrics was shown to be a potential tool for in situ biochemical investigation of alveolar edema fluid in lung tissue sections associated with various causes of death, including SCD, NC, and LMI. FTIR-based assessment of the sum of amide I and II absorptions demonstrated different total protein levels between these three groups, with pulmonary edema fluid from SCD samples being richest in protein components. The results of unsupervised PCA and supervised GA-PLS-DA further indicated that the global spectral band patterns of these three groups were different. An SCD/non-SCD binary RF classifier was ultimately constructed and validated. On the test set, the model reached a sensitivity of 97.3% and a specificity of 95.5%. To account for inter-individual differences, another validation test was carried out using pulmonary edema fluid spectra collected from sixteen different cases, and this test also showed good results. To the best of our knowledge, this is the first study to show a potential forensic application of infrared spectroscopy for post-mortem diagnosis of SCD. However, before this new method can be employed in forensic practice, much more work needs to be done. For example, the sample sizes for the SCD and control groups must be expanded. Additional causes of death should be included in the control group, such as fatal anaphylactic shock, drowning, fatal hyperthermia, fatal hypothermia, and intoxication-related death. Additionally, autolysis is another confounding factor. Whether contamination by decomposition or putrefaction products could affect the outcome of FTIR measurements needs further investigation. All of these goals are currently being pursued in our laboratory.

AUTHOR INFORMATION

Corresponding Author
* Zhenyuan Wang Email: [email protected], Tel: +86 029-82655472.
* Ping Huang Email: [email protected], Tel: +86 020-52367986.

Notes
Hancheng Lin and Yiwen Luo contributed equally to this work. The authors declare no competing financial interest.

ACKNOWLEDGMENT

This project was supported by grants from the National Key R&D Program of China (2016YFC0800702), the National Natural Science Foundation of China (81730056, 81722027, 81471819, 81601645 and 81671869), and the Science and Technology Committee of Shanghai Municipality (17DZ2273200 and 16DZ2290900).


REFERENCES

(1) Kandala, J.; Oommen, C.; Kern, K. B. Br. Med. Bull. 2017, 122, 5-15.
(2) Wellens, H. J. J.; Schwartz, P. J.; Lindemans, F. W.; Buxton, A. E.; Goldberger, J. J.; Hohnloser, S. H.; Huikuri, H. V.; Kääb, S.; Rovere, M. T. L.; Malik, M. Eur. Heart J. 2014, 35, 1642-1651.
(3) Grandmaison, G. L. D. L. Forensic Sci. Int. 2006, 156, 138-144.
(4) Basso, C.; Calabrese, F.; Corrado, D.; Thiene, G. Cardiovasc. Res. 2001, 50, 290-300.
(5) Basso, C.; Burke, M.; Fornes, P.; Gallagher, P. J.; Gouveia, R. H. D.; Sheppard, M.; Thiene, G.; Wal, A. V. D. Virchows Arch. 2010, 102, 391-404.
(6) Michaud, K.; Grabherr, S.; Jackowski, C.; Bollmann, M. D.; Doenz, F.; Mangin, P. Int. J. Legal Med. 2014, 128, 127-137.
(7) Michaud, K.; Grabherr, S.; Faouzi, M.; Grimm, J.; Doenz, F.; Mangin, P. Int. J. Legal Med. 2015, 129, 1067-1077.
(8) Polacco, M.; Sedati, P.; Arena, V.; Pascali, V. L.; Zobel, B. B.; Oliva, A.; Rossi, R. Int. J. Legal Med. 2015, 129, 517-524.
(9) Michiue, T.; Ishikawa, T.; Oritani, S.; Kamikodai, Y.; Tsuda, K.; Okazaki, S.; Maeda, H. Forensic Sci. Int. 2013, 232, 199-205.
(10) Chen, J. H.; Michiue, T.; Ishikawa, T.; Maeda, H. Forensic Sci. Int. 2012, 223, 342-348.
(11) Palmiere, C.; Mangin, P. Int. J. Legal Med. 2012, 126, 199-215.
(12) Sabatasso, S.; Mangin, P.; Fracasso, T.; Moretti, M.; Docquier, M.; Djonov, V. Int. J. Legal Med. 2016, 130, 1265-1280.
(13) Michaud, K.; Mangin, P.; Elger, B. S. Int. J. Legal Med. 2011, 125, 359-366.
(14) Campuzano, O.; Sanchez-Molero, O.; Allegue, C.; Coll, M.; Mademont-Soler, I.; Selga, E.; Ferrer-Costa, C.; Mates, J.; Iglesias, A.; Sarquella-Brugada, G. Forensic Sci. Int. 2014, 245, 30-37.
(15) Campuzano, O.; Allegue, C.; Partemi, S.; Iglesias, A.; Oliva, A.; Brugada, R. Int. J. Legal Med. 2014, 128, 599-606.
(16) Xue, Y.; Zhao, R.; Du, S. H.; Zhao, D.; Li, D. R.; Xu, J. T.; Xie, X. L.; Wang, Q. Int. J. Legal Med. 2016, 130, 915-922.
(17) Eva-Lena, S.; Maria, W. I.; Kristina, C.; Jenni, J.; Björn-Anders, J.; Stellan, M.; Anna, N.; Peter, K.; Aase, W. Int. J. Legal Med. 2016, 130, 59-66.
(18) Hashim, S. W.; Kay, H. R.; Hammond, G. L.; Kopf, G. S.; Geha, A. S. Am. J. Surg. 1984, 147, 560-564.
(19) Fein, A.; Grossman, R. F.; Jones, J. G.; Overland, E.; Pitts, L.; Murray, J. F.; Staub, N. C. Am. J. Med. 1979, 67, 32-38.
(20) Ware, L. B.; Fremont, R. D.; Bastarache, J. A.; Calfee, C. S.; Matthay, M. A. Eur. Respir. J. 2010, 35, 331-337.
(21) Wang, Q.; Ishikawa, T.; Michiue, T.; Zhu, B. L.; Guan, D. W.; Maeda, H. Int. J. Legal Med. 2012, 126, 875-882.
(22) Wang, Q.; Ishikawa, T.; Michiue, T.; Zhu, B. L.; Guan, D. W.; Maeda, H. Forensic Sci. Int. 2013, 228, 137-141.
(23) Du, Y.; Jin, H. N.; Zhao, R.; Zhao, D.; Xue, Y.; Zhu, B. L.; Guan, D. W.; Xie, X. L.; Wang, Q. J. Forensic Sci. 2016, 61, 1531-1537.
(24) Baker, M. J.; Hussain, S. R.; Lovergne, L.; Untereiner, V.; Hughes, C.; Lukaszewski, R. A.; Thiéfin, G.; Sockalingum, G. D. Chem. Soc. Rev. 2015, 45, 1803-1818.
(25) Kendall, C.; Isabelle, M.; Bazanthegemark, F.; Hutchings, J.; Orr, L.; Babrah, J.; Baker, R.; Stone, N. Analyst 2009, 134, 1029-1045.
(26) Erukhimovitch, V.; Talyshinsky, M.; Souprun, Y.; Huleihel, M. Vib. Spectrosc. 2006, 40, 40-46.
(27) Ahmed, S. S. S. J.; Santosh, W.; Kumar, S.; Christlet, T. H. T. Vib. Spectrosc. 2010, 53, 181-188.
(28) Gajjar, K.; Trevisan, J.; Owens, G.; Keating, P. J.; Wood, N. J.; Stringfellow, H. F.; Martin-Hirsch, P. L.; Martin, F. L. Analyst 2013, 138, 3917-3926.
(29) Staniszewska-Slezak, E.; Fedorowicz, A.; Kramkowski, K.; Leszczynska, A.; Chlopicki, S.; Baranska, M.; Malek, K. Analyst 2015, 140, 2273-2279.
(30) Zelig, U.; Barlev, E.; Bar, O.; Gross, I.; Flomen, F.; Mordechai, S.; Kapelushnik, J.; Nathan, I.; Kashtan, H.; Wasserberg, N. BMC Cancer 2015, 15, 408.
(31) Staniszewska-Slezak, E.; Mateuszuk, L.; Chlopicki, S.; Baranska, M.; Malek, K. J. Biophotonics 2016, 9, 1098-1108.
(32) Staniszewska-Slezak, E.; Wiercigroch, E.; Fedorowicz, A.; Buczek, E.; Mateuszuk, L.; Baranska, M.; Chlopicki, S.; Malek, K. J. Biophotonics 2017, DOI: 10.1002/jbio.201700044.
(33) Baumann, A.; Audibert, G.; Mcdonnell, J.; Mertes, P. M. Acta Anaesthesiol. Scand. 2007, 51, 447-455.
(34) Baltacıoğlu, H.; Bayındırlı, A.; Severcan, M.; Severcan, F. Food Chem. 2015, 187, 263-269.
(35) Zhang, J.; Li, B.; Wang, Q.; Li, C.; Zhang, Y.; Lin, H.; Wang, Z. Spectrochim. Acta, Part A 2017, 173, 733-739.
(36) Ami, D.; Natalello, A.; Taylor, G.; Tonon, G.; Maria, D. S. Biochim. Biophys. Acta 2006, 1764, 793-799.
(37) Paluszkiewicz, C.; Piergies, N.; Sozańska, A.; Chaniecki, P.; Rękas, M.; Miszczyk, J.; Gajda, M.; Kwiatek, W. M. Spectrochim. Acta, Part A 2018, 188, 332-337.
(38) Baker, M. J.; Trevisan, J.; Bassan, P.; Bhargava, R.; Butler, H. J.; Dorling, K. M.; Fielden, P. R.; Fogarty, S. W.; Fullwood, N. J.; Heys, K. A. Nat. Protoc. 2014, 9, 1771-1791.
(39) Kumar, S. T.; Leppert, J.; Bellstedt, P.; Wiedemann, C.; Fändrich, M.; Görlach, M. J. Mol. Biol. 2016, 428, 268-273.
(40) Kallenbach-Thieltges, A.; Großerüschkamp, F.; Mosig, A.; Diem, M.; Tannapfel, A.; Gerwert, K. J. Biophotonics 2013, 6, 88-100.
(41) Duraipandian, S.; Zheng, W.; Ng, J.; Low, J. J. H.; Ilancheran, A.; Huang, Z. Analyst 2011, 136, 4328-4336.
(42) Ballabio, D.; Consonni, V. Anal. Methods 2013, 5, 3790-3798.
(43) Wang, Q.; He, H.; Li, B.; Lin, H.; Zhang, Y.; Zhang, J.; Wang, Z. PLoS One 2017, 12, e0182161.
(44) Petibois, C.; Cazorla, G.; Cassaigne, A.; Déléris, G. Clin. Chem. 2001, 47, 730-738.
(45) Kim, Y. J.; Hahn, S.; Yoon, G. Appl. Opt. 2003, 42, 745-749.
(46) Hackl, E. V.; Kornilova, S. V.; Blagoi, Y. P. Int. J. Biol. Macromol. 2005, 35, 175-191.
(47) Breiman, L. Mach. Learn. 2001, 45, 5-32.
(48) Zweig, M. H.; Campbell, G. Clin. Chem. 1993, 39, 561-577.
(49) Smith, B. R.; Ashton, K. M.; Brodbelt, A.; Dawson, T.; Jenkinson, M. D.; Hunt, N. T.; Palmer, D. S.; Baker, M. J. Analyst 2016, 141, 3668-3678.
