High-Resolution 1H NMR Spectroscopy Discriminates Amniotic Fluid


Article

High resolution 1H NMR spectroscopy discriminates amniotic fluid of fetuses with congenital diaphragmatic hernia from healthy controls. Anca Croitor-Sava, Veronika Beck, Inga Sandaite, Sabine Van Huffel, Tom Dresselaers, Filip Claus, Uwe Himmelreich, and Jan Deprest. J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.5b00131 • Publication Date (Web): 08 Sep 2015. Downloaded from http://pubs.acs.org on September 14, 2015


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

High resolution 1H NMR spectroscopy discriminates amniotic fluid of fetuses with congenital diaphragmatic hernia from healthy controls

Anca Croitor-Sava1,2,#, Veronika Beck3,4,#, Inga Sandaite3,5, Sabine Van Huffel1,2, Tom Dresselaers6,7, Filip Claus3,5, Uwe Himmelreich6,7,* and Jan Deprest3,4

# Both authors contributed equally.

1 Department of Electrical Engineering (ESAT) - STADIUS, University of Leuven, Leuven, Belgium

2 iMinds, Medical Information Technologies Department, Leuven, Belgium

3 Department of Development and Regeneration, Faculty of Medicine, University of Leuven, Herestraat 49, 3000 Leuven, Belgium

4 Department of Obstetrics and Gynecology, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium

5 Division of Medical Imaging, University Hospital Gasthuisberg, Leuven, Belgium

6 Department of Imaging and Pathology, Biomedical MRI Unit, Herestraat 49, 3000 Leuven, Belgium

7 MoSAIC, University of Leuven, Herestraat 49, 3000 Leuven, Belgium

*Corresponding author Uwe Himmelreich, PhD, Department of Imaging and Pathology, Biomedical MRI Unit, University of Leuven, Herestraat 49, bus 505, 3000 Leuven, Belgium, email: [email protected], phone: +32 16 330925, fax: +32 16 330901

Conflict of Interest Statement The authors declare no competing financial interest.

1 ACS Paragon Plus Environment


Abstract

Lung hypoplasia in congenital diaphragmatic hernia (CDH) is a life-threatening birth defect. Severe cases can be offered tracheal occlusion to boost prenatal lung development, though defining those to benefit remains challenging. Metabonomics of 1H NMR spectra collected from amniotic fluid (AF) can identify general changes in diseased versus healthy fetuses. AF embodies lung secretions, hence, might contain pulmonary next to general markers of disease in CDH fetuses. AF from 81 healthy and 22 CDH fetuses was collected. NMR spectroscopy was performed at 400 MHz to compare AF from fetuses with CDH against controls. Several advanced feature extraction methods based on statistical tests which explore spectral variability, similarity and dissimilarity were applied and compared. This resulted in the identification of 30 spectral regions, which accounted for 80% variability between CDH and controls. Combination with automated classification discriminates AF from CDH versus healthy fetuses with up to 92% accuracy. Within the identified spectral regions isoleucine, leucine, valine, pyruvate, GABA, glutamate, glutamine, citrate, creatine, creatinine, taurine and glucose were the most concentrated metabolites. As the metabolite pattern of AF changes with fetal development, we have excluded metabolites with a high age-related variability and repeated the analysis with twelve spectral regions, which has resulted in similar classification accuracy. From this analysis, it was possible to distinguish between AF from CDH fetuses versus healthy controls independent of gestational age.

Keywords: Congenital diaphragmatic hernia, NMR spectroscopy, feature extraction, classification, amniotic fluid, lung, fetal development


Introduction

Congenital diaphragmatic hernia (CDH) occurs in 1-2 per 5,000 newborns1. While the name refers to a defect in the diaphragm, the associated abnormal lung development is clinically more relevant. Pulmonary underdevelopment is characterized by size and weight but also functional aspects. The number of airways as well as pulmonary arteries is decreased and their walls are thickened. In addition, altered elastin and collagen contents impact lung compliance. At birth, pulmonary insufficiency and hypertension complicate postnatal adaptation and are fatal in up to 30% of babies in whom the condition is isolated2. Severe pulmonary hypoplasia can be diagnosed prior to birth defining a subset of fetuses that may benefit from prenatal interventions to improve lung development. One strategy is tracheal occlusion (TO), which prevents the egress of pulmonary fluid, thereby increasing tissue stretch and accelerating lung growth. Clinically, TO is currently offered from 26 weeks of gestation onwards within experimental trials3. Although minimally invasive, it is complicated by preterm prelabor rupture of membranes in 50% of cases and in 31% delivery occurs prior to 34 weeks3. Current algorithms are mainly based on lung size as expressed by lung-to-body-weight ratio (LBWR). Fetal lung function is more challenging to assess. One example is the maternal hyperoxygenation test, which examines the responsiveness of fetal pulmonary arteries to oxygen and is an independent predictor of fetal outcome. Furthermore, there is a high and so far not well understood variability in lung response to TO. At this moment, non-responders cannot be identified in advance in order to receive alternative treatment. Fetal magnetic resonance imaging (MRI) is routinely used in the management of fetuses with CDH to mainly retrieve anatomical (size) information.
Although NMR spectroscopy has successfully been used to identify metabolites in fetal organs and amniotic fluid (AF) in vivo and ex vivo4, the former remains challenging due to movements of the fetus. Ex vivo NMR spectroscopy of tissue and body fluids is a reliable tool for the metabolic characterization of biofluids5-8. Inter- and intra-observer reproducibility is excellent and normative curves for fetal development were suggested for NMR spectra acquired at the 2nd and 3rd trimester9-10. Detailed information on changes in metabolic patterns related to fetal lung11-13 and kidney maturation14 has been reported. There has been a limited number of NMR studies on complicated pregnancies, e.g. fetuses with spina bifida14-15, Down syndrome14, cystic fibrosis16, malformations, genetic abnormalities and diabetes17-18. Interestingly, a general change in metabolic patterns of AF was described for various malformations compared to healthy controls. This includes lower glucose levels in combination with higher lactate scores, pointing to rather anaerobic pathways of energy production in fetuses with malformations. Diet, lifestyle or fetal gender show no or limited influence on NMR spectra17. Although NMR spectroscopy has been the subject of several original publications as well as reviews on lung anatomy and maturation19-20, no NMR study has focused on the metabolic profile of fetuses with CDH. Classic biochemical analysis of maternal serum samples from CDH cases was associated with decreased levels of e.g. vitamin A and retinol-binding protein21, both of which seem to be crucial in the pathogenesis of the disease and are present in AF. Hence, metabonomics of AF might provide additional information on fetal pulmonary characteristics and possibly identify new diagnostic and/or prognostic markers for CDH. Initial studies on human AF mainly report on few preselected metabolites10-11, 14. This approach might leave out potentially valuable information as it has been shown that up to 75 different compounds were identified in NMR spectra of human AF22. Given the lack of systematic prior studies regarding relevant metabolites for CDH pathogenesis and to minimize the risk of ignoring potential metabolic markers, we have analyzed NMR spectra based on statistical properties that automatically identify regions altered in CDH when compared to controls. Several automatic feature extraction methods that explore statistical properties such as the spectral variability, similarity or dissimilarity within groups were applied. We have compared two projection methods, principal component analysis (PCA)23 and principal coordinate analysis (PCO)24, that transform the data into a feature space that best reflects the above properties.
Three ranking models (T-test25, Kruskal-Wallis test26 and Fisher discriminant27) were used to identify those spectral regions most relevant for distinguishing between CDH and controls. The feature extraction results can be validated by using both a kernel and a linear classifier. The selected spectral regions can be further exploited to assign metabolites that are relevant for CDH. With the proposed data analysis, no prior knowledge regarding the metabolites present in the data is needed and only useful information is kept, while noise or artifacts are filtered out.

We have aimed to examine a) if the reported general metabolic changes of different malformations could also be found in CDH fetuses, and b) if there were any specific markers present in AF of fetuses with CDH compared to healthy controls.


Material and Methods

Patients and sample preparation

AF samples from 81 healthy and 22 CDH fetuses were collected at the Department of Obstetrics and Gynecology, University Hospital Gasthuisberg, Leuven, Belgium, from November 2009 to August 2012. Next to TO procedures, samples were acquired during amniocentesis and cesarean section. Along with the date and type of the intervention, the gestational age and, as far as available, further characteristics such as rhesus factor, body mass index (BMI), nicotine use, maternal diabetes and the obstetrical history including method of conception were recorded. Gestational age ranged from 15-39 weeks in the control and 21-35 weeks in the CDH group, respectively (Table 1). Only AF samples with normal fetal karyotype were used for MRS analysis and no known infectious material was collected. Samples were frozen immediately in liquid nitrogen and stored at -80°C. The study was approved by the local ethics committee. Informed consent was obtained for all cases.

NMR spectroscopy

1H NMR spectra of AF were acquired at 25°C without spinning using an Avance 400 MHz NMR spectrometer equipped with a 5mm [1H, 13C] inverse-detection dual-frequency probe (Bruker Biospin, Rheinstetten, Germany). Samples were thawed, 0.1ml D2O (containing sodium 3-(trimethylsilyl) propane sulfonate as a chemical shift reference) was added to 0.4ml amniotic fluid and the mixture was transferred to a 5mm NMR tube. One-dimensional 1H NMR spectra were acquired with a spectral width of 12ppm and using 16k data points. Free induction decays were averaged over 8k accumulations. A relaxation delay of 2s was allowed. Residual water was suppressed using a 1D NOESY pre-saturation sequence. An exponential function was applied prior to Fourier transformation, resulting in a line broadening of 0.1Hz. NMR spectra were phase and baseline corrected using the Topspin software (Bruker Biospin). Spectral quality was assessed after each measurement using the line-width at half height of the choline resonance at 3.21ppm (25). In case of poor spectral quality, the measurement was repeated. This resulted in 112 NMR spectra. One CDH case was retrospectively excluded for generalized hydrops as were ten control samples, which showed an abnormal karyotype.
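The exponential apodization mentioned above can be illustrated with a minimal Python sketch. It is not the Topspin processing used in the study; it only shows the standard window exp(-pi * LB * t), which adds LB Hz of Lorentzian line broadening. The function name, the toy signal and the dwell time are assumptions: the dwell time is taken as 1/4800 s, which follows from 12ppm at a nominal 400 MHz only if complex sampling is assumed.

```python
from math import exp, pi

def exponential_window(fid, dwell_time, line_broadening_hz=0.1):
    """Multiply each FID point by exp(-pi * LB * t), with t = k * dwell_time.

    `fid` may hold real or complex samples. The helper name and its
    arguments are illustrative, not taken from the paper.
    """
    return [
        point * exp(-pi * line_broadening_hz * k * dwell_time)
        for k, point in enumerate(fid)
    ]

# Toy constant signal; 12 ppm * 400 MHz = 4800 Hz (assumed spectral width in Hz).
toy_fid = [1.0, 1.0, 1.0, 1.0]
dwell = 1.0 / 4800.0
apodized = exponential_window(toy_fid, dwell, line_broadening_hz=0.1)
```

With a line broadening as small as 0.1Hz the window decays very slowly, so it mainly suppresses noise in the tail of the free induction decay without visibly widening the peaks.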

Signal assignment

Primary NMR signal assignment was based on previously published data8, 22, 28-29. Assignment was confirmed by spiking an AF sample with the respective pure compound and/or by the acquisition of 2D NMR spectra. 2D homo- and heteronuclear correlation spectra were acquired similarly to previously described approaches7, 29. Five selected AF samples for each group (control and CDH) were used for 2D NMR spectroscopy. 1D NMR spectra were acquired before and after the 2D experiments to confirm integrity of metabolite composition. Standard [1H, 1H] COSY and [1H, 1H] TOCSY experiments were acquired with the following parameters: spectral width in t2 12ppm, t2 time domain 2K, 128 or 256 increments of 16 or 64 acquisitions each, relaxation delay 1s. TOCSY spectra with mixing times of 40ms were acquired with 256 increments of 2K data points and 64 acquisitions. Standard sensitivity-enhanced gradient inverse-detection HSQC spectra were acquired with the following parameters: optimization for one-bond coupling of 125 and 145Hz, total of 256 increments with 64 acquisitions, 4K complex data points, and 13C decoupling using GARP-1, relaxation delay 2s. HMBC spectra were optimized for one-bond coupling of 125Hz and long range coupling constants of 6Hz.

Data processing

The data were processed and statistically analyzed using the Matlab platform (MathWorks, version R2010). Fourier-transformed and phase-corrected NMR spectra were imported. The Bioinformatics toolbox msbackadj function was used to perform an additional baseline correction. Then, a frequency alignment with respect to external sodium 3-(trimethylsilyl) propane sulfonate at 0.0ppm was performed. The spectra were normalized to the total spectral area.
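The final normalization step can be sketched in a few lines of Python. This is an illustration only, not the Matlab code used in the study. It assumes uniformly spaced data points, so that the sum of intensities is proportional to the spectral area, and the function name and toy values are hypothetical.

```python
def normalize_to_total_area(spectrum):
    """Scale a spectrum so that its intensities sum to 1.

    Assumes uniform point spacing, so the plain sum stands in for the
    integral over the spectrum (an assumption of this sketch).
    """
    total = sum(spectrum)
    if total == 0:
        raise ValueError("spectrum has zero total area")
    return [value / total for value in spectrum]

# Two toy spectra with the same shape but different overall intensity.
toy_spectra = [[2.0, 6.0, 2.0], [1.0, 3.0, 1.0]]
normalized = [normalize_to_total_area(s) for s in toy_spectra]
```

After normalization both toy spectra become identical, which is the point of the step: differences in overall signal intensity, for example from sample dilution, no longer dominate the comparison between samples.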

Feature reduction and classification


First, a feature reduction step was performed. As it is not certain which feature space reduction method is most appropriate for our data, we have combined several feature extraction methods. Two projection techniques (PCA23 and PCO24) and three ranking models (T-test25, Kruskal-Wallis test26 and Fisher discriminant27) were separately applied to extract relevant features from the full spectra. To validate the proposed feature reduction methods, we compared their performance on both a linear and a non-linear classifier using linear discriminant analysis (LDA) and a kernel support vector machine (kSVM). The kSVM is based on the LS-SVM toolbox30 and uses a radial basis function (RBF) kernel. The optimal regularization parameter and the optimal kernel parameter were estimated using a grid search via 10-fold cross-validation (function tunelssvm). All classifiers were validated by 10-fold cross-validation with 10 runs, in each of which all samples were randomly split into training and test sets. The mean performance results are reported.
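The repeated cross-validation scheme can be sketched as follows. This is a hedged illustration in Python rather than the LS-SVM toolbox routine used in the study. The function name, the fixed seed and the plain (non-stratified) shuffling are assumptions; the sample count of 103 is simply 81 controls plus 22 CDH cases.

```python
import random

def repeated_kfold_indices(n_samples, n_folds=10, n_runs=10, seed=0):
    """Yield (run, fold, train_idx, test_idx) for repeated k-fold CV.

    Each run reshuffles all samples and deals them into n_folds folds.
    Folds are not stratified by class (an assumption of this sketch).
    """
    rng = random.Random(seed)
    for run in range(n_runs):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield run, fold, train_idx, sorted(test_idx)

splits = list(repeated_kfold_indices(n_samples=103))
```

A classifier would be fitted on `train_idx` and scored on `test_idx` for each of the 100 splits, and the scores averaged. With only 22 cases in the smaller class, a stratified split would keep the class ratio more stable across folds.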

For PCA, the data are linearly transformed to extract a set of uncorrelated variables (features), called principal components (PCs). Since in NMR spectroscopy it is assumed that most of the variance in the original dataset can be explained by a limited number of PCs, the method is commonly used as a feature extraction technique31.
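How many PCs count as "a limited number" is usually decided from the cumulative explained variance. The Python sketch below shows that bookkeeping only. It assumes the per-component variances (eigenvalues of the covariance matrix) are already available, and both the function name and the toy values are hypothetical rather than taken from the study.

```python
def components_needed(variances, threshold=0.90):
    """Smallest number of leading components whose cumulative
    explained-variance fraction reaches `threshold`."""
    total = sum(variances)
    cumulative = 0.0
    for count, variance in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return count
    return len(variances)

# Illustrative eigenvalues only; not values from the study.
toy_variances = [5.0, 3.0, 1.5, 0.5]
n_components = components_needed(toy_variances, threshold=0.90)
```

For the toy values the cumulative fractions are 0.50, 0.80 and 0.95, so three components are needed to pass the 90% mark.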

Multi-dimensional scaling (MDS), also known as principal coordinate analysis (PCO), is a more general projection technique than PCA since it can use any distance matrix24. PCO determines a set of synthetic variables (features) called principal coordinates (PCo(s)) that best represent the pairwise distances between the data. The correlation distance was used for computing the distance matrix. Since 90% of the variability of the data was accounted for by the first 10 PCs, the number of PCo(s) to be extracted was fixed a priori at 10 to allow comparison with the MDS method.
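The correlation distance underlying the distance matrix can be written out explicitly. The Python sketch below assumes the common definition, one minus the Pearson correlation between two spectra; the text does not spell out the exact formula, so this definition, the function names and the toy vectors are assumptions.

```python
from math import sqrt

def correlation_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    norm = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("constant vector has no defined correlation")
    return 1.0 - sum(a * b for a, b in zip(dx, dy)) / norm

def distance_matrix(spectra):
    """Pairwise correlation distances between all spectra."""
    return [[correlation_distance(a, b) for b in spectra] for a in spectra]

# Toy spectra: the second is a scaled copy of the first, the third is reversed.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
D = distance_matrix(toy)
```

The distance is 0 for perfectly correlated spectra and 2 for perfectly anti-correlated ones, so it compares spectral shape and ignores overall scale. A matrix like `D` is the input that a principal coordinate analysis would then project onto a small number of coordinates.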

The ranking methods apply a statistical measure to assign a score to each spectral feature. Thus, given n NMR spectra of size (1xM), we obtain a ranking vector R of size (1xM). Each element in the ranking vector represents the score of the corresponding spectral feature. The features are then ranked by score and kept or removed based on that score. The higher the rank, the more significant the feature is in separating between the groups. A feature with a low rank value does not contribute to the separation. A threshold can be imposed to keep only the first k features with the highest rank or only the features that have a rank higher than a certain value. For this study, we only kept features that accounted for >90% of variability between the data. The peaks found relevant in separating controls from disease were then integrated.
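As an illustration of score-based ranking, the Python sketch below scores every spectral feature with an absolute two-sample t-statistic and keeps the highest-scoring ones. It is an assumption-laden sketch: the text only says "T-test", so the unequal-variance (Welch) form is a choice made here, and the function names and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def t_scores(group_a, group_b):
    """Absolute Welch t-statistic for each feature (column).

    Each group is a list of spectra; each spectrum is a list of M features.
    """
    scores = []
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        diff = abs(mean(col_a) - mean(col_b))
        se = sqrt(variance(col_a) / len(col_a) + variance(col_b) / len(col_b))
        if se > 0:
            scores.append(diff / se)
        else:
            # Degenerate case: no spread within either group.
            scores.append(float("inf") if diff > 0 else 0.0)
    return scores

def top_features(scores, n_keep):
    """Indices of the n_keep highest-scoring features, in spectral order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep])

# Toy data: only the first feature differs clearly between the groups.
controls = [[1.0, 5.0, 2.0], [1.2, 5.1, 2.9], [0.9, 4.9, 2.4]]
cases = [[3.0, 5.0, 2.5], [3.1, 5.2, 2.2], [2.8, 4.8, 2.7]]
scores = t_scores(controls, cases)
selected = top_features(scores, n_keep=1)
```

The same pattern applies to the other ranking models: only the scoring function changes, while the sort-and-threshold step stays the same.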


Results

Feature extraction and classification

The results of the feature extraction are summarized in Table 2 for each of the applied methods and their classification performances. For this analysis, we considered the full spectral range, but excluded the water region from 4.2 to 5.2ppm (Figure 1).
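Excluding the water region amounts to masking data points by chemical shift. A minimal Python sketch, assuming each spectrum comes with a matching ppm axis; the function name, the inclusive treatment of the region boundaries and the toy values are assumptions.

```python
def exclude_region(ppm_axis, intensities, low=4.2, high=5.2):
    """Drop points whose chemical shift lies within [low, high] ppm."""
    kept = [
        (ppm, value)
        for ppm, value in zip(ppm_axis, intensities)
        if not (low <= ppm <= high)
    ]
    return [ppm for ppm, _ in kept], [value for _, value in kept]

# Toy spectrum with a large residual water signal near 4.7 ppm.
toy_ppm = [5.5, 5.0, 4.7, 4.0, 3.2]
toy_signal = [0.1, 9.0, 50.0, 0.4, 1.2]
ppm_kept, signal_kept = exclude_region(toy_ppm, toy_signal)
```

Removing this window before feature extraction prevents the residual water signal, which varies with suppression quality rather than with biology, from dominating the variance.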

With PCA, the first 10 PCs account for more than 90% variability, while the first PC and the second PC account for 35% and 23% of the total variation in the spectra, respectively. Among the projection techniques, the best classification was achieved with LDA in combination with MDS with an accuracy of 83%. When having the PCs as input, the accuracy of the best classifier was 74% (Table 2).

The results of the ranking methods applied in this study are presented in Figure 2. The highest classification performance for separating the control and CDH group was obtained using the combination of kSVM and T-test, where an accuracy of 92% was reached. The highest sensitivity was reached by using LDA and T-test (98%) and the highest specificity was reached by using LDA and Fisher. Using the 35 most frequently selected spectral regions in combination with LDA or kSVM classification, we have achieved a correct classification in up to 95% of the cases (see Table 2).
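For reference, accuracy, sensitivity and specificity follow from the counts of correct and incorrect predictions. The Python sketch below assumes CDH is treated as the positive class, which the text does not state explicitly; the function name and the toy labels are hypothetical and unrelated to Table 2.

```python
def classification_metrics(true_labels, predicted_labels, positive="CDH"):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of CDH cases detected
        "specificity": tn / (tn + fp),  # fraction of controls recognized
    }

# Toy labels for illustration only.
truth = ["CDH", "CDH", "control", "control", "control"]
guess = ["CDH", "control", "control", "control", "CDH"]
metrics = classification_metrics(truth, guess)
```

With groups as unbalanced as 81 controls and 22 CDH cases, accuracy alone can look good even when many cases are missed, which is why sensitivity and specificity are worth reading alongside it.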

Metabolite assignment

Among the five feature extraction methods tested in this study, only the results of the ranking methods can be explored in a straightforward fashion in order to identify those spectral regions that are most relevant for distinguishing between the groups. Figure 2 shows the spectral regions that account for more than 90% of variability between CDH and the control group according to the results obtained with T-test, Kruskal-Wallis and Fisher test. The most significant spectral differences were identified in the chemical shift range 0.5-4.2ppm. In the chemical shift range for aromatic, OH, SH and NH protons (5.2-9.0ppm), only very few regions were ranked. A total of 35 spectral regions that were ranked as significant by at least two out of the three considered methods were further characterized by assigning the regions to potentially relevant metabolites (Table 3). Due to overlapping resonances and the inability to assign contributions from low concentration metabolites, only major contributors have been assigned. Metabolites frequently present in the most significant chemical shift regions include amino acid residues and their precursors (in particular for pathways involving valine, leucine, isoleucine, glutamine/glutamate and lysine), citrate, succinate, glucose, inositols and choline containing compounds. To assess the statistical relevance of the respective resonances, the corresponding spectral regions were integrated and the mean relative integrals and standard deviations in CDH versus controls are reported in Table 3. The respective p-values indicate statistically significant differences for all metabolites (T-test).
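The "at least two out of three methods" rule is a simple vote over the region lists returned by each ranking method. A minimal Python sketch with hypothetical region labels; the labels and the function name are illustrative and do not correspond to the regions in Table 3.

```python
from collections import Counter

def consensus_regions(selections, min_votes=2):
    """Regions selected by at least `min_votes` of the ranking methods."""
    votes = Counter(
        region for selected in selections for region in set(selected)
    )
    return sorted(region for region, count in votes.items() if count >= min_votes)

# Hypothetical region labels; not the regions reported in the study.
t_test = ["r1", "r2", "r3"]
kruskal_wallis = ["r2", "r3", "r4"]
fisher = ["r3", "r5"]
consensus = consensus_regions([t_test, kruskal_wallis, fisher])
```

Requiring agreement between methods trades some sensitivity for robustness: a region flagged by only one statistic is more likely to reflect that statistic's assumptions than a genuine group difference.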

The mean gestational age of the CDH group was higher than that of the control group (see Table 1). Obtaining age-matched control samples is associated with risks and therefore ethically problematic. To rule out effects of gestational age rather than disease, we compared the variation in the identified spectral regions with respect to both gestational age and CDH. We observed that alterations in leucine, valine, glutamate, succinate, lysine, glucose and creatinine levels were comparable to the findings by Cohn et al. (2009)10 and therefore at least partially related to gestational age and not exclusively to CDH. This concerns in particular a decrease in the signal intensities of amino acid residues (see Table 3).

Other metabolites showed trends opposite to those expected from gestational-age-dependent changes, such as an increase in citrate and a decrease in betaine for AF from the CDH group compared to controls. Resonances that differed between the CDH and control group (over gestational age) can be attributed to acetate, lactate, histidine, α-oxoisovalerate, β-hydroxybutyrate and β-aminobutyrate.

Feature reduction and classification independent from gestational age

Using 35 spectral regions for classification results in a low sample-to-feature ratio, which makes the classifier prone to overtraining. In order to reduce the number of features in a biologically meaningful way, we have excluded contributions from regions of the NMR spectra that belong to metabolites that change with gestational age, as the mean gestational age of the CDH and control group were different (Table 1). This selection was based on literature data10. The trends in metabolic changes between the 2nd and 3rd trimester were confirmed by our data when comparing the low number of control subjects from the 3rd trimester (n=5) with those from the 2nd trimester (n=76, data not shown). The remaining 12 unaffected spectral regions (see Figure 3) have not previously been reported to contain metabolites that are affected by gestational age10. The re-classification of our samples resulted in a comparable overall sensitivity (no significant difference between 35 and 12 spectral regions). The accuracy and specificity for LDA were also comparable between 35 and 12 spectral regions. However, the sensitivity was higher (94 vs. 83%) and the specificity lower (80 vs. 90%) when using kSVM on 12 vs. 35 spectral regions (see Table 2).


Discussion

We were able to show that 1H NMR spectroscopy can successfully discriminate AF of fetuses with CDH from AF of healthy controls when combined with a complex feature selection and classification analysis. Previous studies on human AF mainly focused on metabolite assignment and reported the concentrations of preselected metabolites. Thus, Clifton and co-workers reported on changes of choline-containing compounds for assessing lung maturity11. Cohn and co-workers used 21 pre-selected, major metabolites to assess gestational age10. Fetal maturation but also fetomaternal complications were assessed by Bock et al.14. Although successful in assessing lung maturation, the focus on pre-selected metabolites might miss potentially valuable information, as it has been shown that with high field NMR spectroscopy close to 75 different compounds can be identified in human AF collected from apparently healthy individuals and those with malformations17-18, 22. Furthermore, although many visible compounds have not been assigned yet, they might be metabolic markers for fetal health conditions and maturity. Given the lack of systematic prior studies regarding relevant metabolites for CDH pathogenesis and outcome, we have analyzed the full NMR spectra and automatically identified those spectral regions altered in CDH. Hereby, we minimized the risk of ignoring potentially useful information. In this study, we propose and compare different methodologies for performing feature extraction: two projection methods (PCA and MDS) and three filtering methods where the feature selection is performed by ranking each spectral feature using the scores measured with T-test, Kruskal-Wallis and Fisher discriminant. The feature extraction methodologies presented here have in common that no prior knowledge regarding the metabolites present in the data is needed and only information useful for classification between CDH and controls is kept, while noise or artifacts are filtered out.
Moreover, the projection methods can be directly applied without having any prior knowledge of the (sub)groups present in the data. Although the extracted features reflected most of the variability and similarity or dissimilarities in the data, a drawback is that the biological meaning is lost and the results of these methods cannot be used to directly extract biologically relevant information. A disadvantage of the ranking methods is that they require prior information on the (sub)groups present in the data. On the other hand, the extracted features preserved their spectral identity and therefore a direct interpretation of the results for identifying chemical shift regions in terms of the disease is possible.

Although our results indicate that both the projection and the ranking techniques applied in this study provide reliable classification, our approach only proves that distinction between AF from CDH and apparently normal fetuses is possible based on NMR spectroscopy. While the combination of T-test and kSVM resulted in the highest overall accuracy, T-test in combination with LDA was most sensitive and Fisher in combination with LDA most specific, most other strategies also resulted in satisfactory classification (Table 2). An optimization of the most suitable strategy was beyond the scope of this study. Given the wide range of feature extraction methods available in the literature (for example32-33), we are aware that other methods might outperform our chosen strategy. Such further optimization is needed if NMR spectroscopy is used for clinical diagnosis.

In order to decrease the risk of overtraining, to increase the probability that our identified biomarkers are truly linked to the disease instead of only gestational age, and to avoid possible misinterpretation, we reduced the number of features by excluding those metabolites that have already been described to change over gestation10. This was necessary given the wide range of gestational age and especially the difference in mean gestational age in our two fetal groups. We are aware that this imbalance is a major weakness of our study and results need to be interpreted with caution. However, clinical and ethical requirements make it difficult if not impossible to collect AF from healthy controls at later gestational age. The final sample-to-feature ratio was still relatively low so that overtraining cannot be excluded34. Interestingly, the reduction from 35 to 12 spectral regions not only reduced the risk of overtraining and the use of regions that might be related to gestational age rather than CDH, but also left the performance of the classification largely unaffected. The utilization of the most frequently selected 35 regions compared to the not age-related 12 regions resulted in only marginal differences in accuracy, sensitivity and specificity when LDA was used. For kSVM, the sensitivity was higher and the specificity was lower for 12 regions compared to 35 regions. Overall, LDA performed better than kSVM in terms of accuracy and sensitivity.

With regard to the metabolites found to be relevant in discriminating CDH from healthy fetuses, acetate, lactate, histidine, α-oxoisovalerate, β-hydroxybutyrate and β-aminobutyrate were all relatively increased in CDH compared to normal controls. In addition, we did not find increased citrate concentrations in the late gestational age cases as found by Cohn et al.10, which might indicate a relative decrease in CDH.

The identified potential diagnostic markers seem to be linked to different metabolic pathways and are most likely rather unspecific. However, too little is known about the pathogenesis of CDH to rule out a more distinct role in the course of the disease at this point. We failed to identify vitamin A as a discriminatory metabolite, which might be due to its low concentration in AF.

Some of the metabolic changes found in our study have also been reported as more general metabolomic patterns of disease in previous studies. Graca and co-workers described various malformations that showed highly similar regulation of certain metabolites17-18. In a study containing 27 fetuses with cardiac, pulmonary, CNS, abdominal or urogenital malformations18, a general pattern for fetuses with birth defects included increased levels of glutamine, glycine, succinate and free lactate and decreased levels of alanine, glutamate, leucine, phenylalanine, tyrosine, valine and glucose in AF. Changes in acetate, histidine, lysine and pyruvate seemed to be less consistent. Subsequent studies described changes in energy (glucose, lactate) and amino acid metabolism based on AF collected from pregnancies with various complications like preterm labor, gestational diabetes or preeclampsia35-36. Despite their limited number of cases, the authors tried to link their data according to organ systems but did not find trends specific to particular diseases18. Our data resemble some of these metabolic patterns of malformed fetuses and differ in others. In particular, the fact that some of those changes coincide with metabolic changes due to gestational age requires careful interpretation. Increased concentrations of some amino acids and their precursors and indications of anaerobic metabolism (lactate) seem to be common to most complications during pregnancy. We did not find decreased glucose with CDH, which might have been confounded by variable maternal blood sugar levels. Potential confounders such as the risk of preterm delivery, prediagnosed diabetes or preexistent maternal health problems such as asthma had only a minor influence, compared to malformations, on the composition of AF at the gestational age of 14 to 25 weeks in the studies of Graca et al.18. Given that the gestational age was mostly more advanced in our CDH data set, the impact of maternal blood sugar levels might be higher in our fetuses. As gestational diabetes was only recorded if it had been diagnosed at the time of AF collection, we cannot clarify its impact on AF glucose levels. We also aimed to study changes in tracheal fluid of CDH fetuses before and after TO. Such a study might help to reduce the entire spectrum of markers to more lung-related ones. Given the nature of tracheal fluid, which contains pieces of very viscous mucus, this substudy did not prove feasible in our hands.

Conclusion

Our results show that dimensionality reduction methods in combination with automated classifiers reliably distinguish CDH pregnancies from healthy controls when analyzing 1H NMR spectra of human AF. Metabolites that contribute to the distinction between both groups indicate the involvement of general metabolic pathways like those for amino acid biosynthesis. Identified metabolic markers so far cannot be linked to a distinct role in the course of the disease but support the hypothesis of a general modification of metabolic pathways in fetal disease.


Figure Legends:

Figure 1: Representative 1H NMR spectra of amniotic fluid (chemical shift region 0.5-4.5 ppm) from the control group, 3rd trimester (a), control group, 2nd trimester (b), CDH, 3rd trimester (c) and CDH, 2nd trimester (d). Main metabolites of the respective regions are indicated above (AA: α resonances of amino acids, ac: acetate, αov: α-oxoisovalerate, arg: arginine, bet: betaine, βhb: β-hydroxybutyrate, chol: choline/phosphocholine/glycerophosphocholine, cit: citrate, cre: creatinine, glc: glucose, glx: glutamine/glutamate, his: histidine, ile: isoleucine, inos: scyllo- and myo-inositol, lac: lactate, leu: leucine, lys: lysine, oiv: α-oxoisovalerate, suc: succinate, tau: taurine, val: valine). For a complete assignment of relevant resonances see Table 3. 1H NMR spectra of amniotic fluid samples were acquired at 25 °C at 400 MHz using a 5 mm [1H, 13C] inverse-detection probe. After thawing the frozen samples, 0.1 ml D2O (containing sodium 3-(trimethylsilyl)propane sulfonate, at 0 ppm, not shown) was added to 0.4 ml amniotic fluid. 1D 1H NMR spectra were acquired with a spectral width of 12 ppm. Residual water was suppressed using a 1D NOESY pre-saturation sequence. An exponential function was applied prior to Fourier transformation, resulting in a line broadening of 0.1 Hz.

Figure 2: The spectral regions ranked by T-test, Kruskal-Wallis and Fisher test as most significant in differentiating CDH from controls are marked for (a) the region 5.2-9.0 ppm and (b) the region 0.5-4.2 ppm. Panels (c) and (d) zoom in on the regions 0.9-1.2 ppm and 3.0-3.3 ppm for better visualization. Blue spectra are averages from the control group. Red spectra are averages from the CDH group.

Figure 3: The mean and standard deviation for the 12 spectral regions that were found most relevant in discriminating CDH from controls. Labels under the spectral regions indicate the most prominent metabolites, which were assigned based on 2D NMR spectra and chemical shift reference data.
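The exponential weighting mentioned in the Figure 1 legend can be sketched as follows. This is a minimal illustration, not the authors' processing script: the 400 MHz frequency, 12 ppm spectral width and 0.1 Hz line broadening come from the legend, whereas the FID length, the assumption of complex (quadrature) sampling and the function name are hypothetical. Multiplying the FID by exp(-π·LB·t) adds LB Hz to the width of each Lorentzian line.

```python
import math


def exponential_window(n_points, dwell_time_s, line_broadening_hz):
    """Return the weights exp(-pi * LB * t) applied to an FID before Fourier transformation."""
    return [
        math.exp(-math.pi * line_broadening_hz * k * dwell_time_s)
        for k in range(n_points)
    ]


spectrometer_mhz = 400.0    # proton frequency, from the legend
spectral_width_ppm = 12.0   # from the legend
line_broadening_hz = 0.1    # from the legend

# At 400 MHz, 1 ppm corresponds to 400 Hz.
spectral_width_hz = spectral_width_ppm * spectrometer_mhz

# Assumption: complex sampling, so the dwell time is the inverse spectral width.
dwell_time_s = 1.0 / spectral_width_hz

n_points = 16384  # hypothetical FID length, not stated in the text
window = exponential_window(n_points, dwell_time_s, line_broadening_hz)

print(spectral_width_hz, window[0], round(window[-1], 3))
```

The window starts at 1 and decays monotonically, so late, noise-dominated points of the FID are attenuated at the cost of a slightly broader line.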


Table 1: Patient information.

| | CDH | Control | Total |
|---|---|---|---|
| Patients | 22 | 81 | 103 |
| Number of NMR spectra | 29 | 85 | 114 |
| Rejected NMR spectra due to the patient's fetal karyotype | 1 | 10 | 11 |
| Rejected NMR spectra due to low quality | 2 | 11 | 13 |
| Gestational age range in weeks | 21-35 | 15-39 | |
| Mean gestational age in weeks | 29.5 | 20.1 | |


Table 2: The mean performance, sensitivity, specificity and corresponding standard deviations of different pairwise classification and feature extraction methods.

| Feature extraction | Classifier | Performance (accuracy in %) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| CDH vs control | | | | |
| PCA | LDA | 73 ± 2.3 | 88 ± 3.2 | 67 ± 6.1 |
| PCA | kSVM | 74 ± 4.0 | 78 ± 6.1 | 72 ± 3.2 |
| MDS | LDA | 83 ± 2.8 | 92 ± 3.5 | 76 ± 3.8 |
| MDS | kSVM | 80 ± 3.3 | 82 ± 4.1 | 77 ± 6.2 |
| Kruskal-Wallis | LDA | 82 ± 3.0 | 88 ± 4.3 | 76 ± 6.0 |
| Kruskal-Wallis | kSVM | 79 ± 2.4 | 76 ± 8.2 | 82 ± 5.3 |
| T-test | LDA | 89 ± 2.2 | 98 ± 2.1 | 77 ± 4.2 |
| T-test | kSVM | 92 ± 1.5 | 93 ± 2.5 | 68 ± 9.6 |
| Fisher | LDA | 88 ± 3.9 | 93 ± 3.6 | 85 ± 13 |
| Fisher | kSVM | 89 ± 1.9 | 90 ± 4.5 | 78 ± 9.1 |
| Final classification using most frequently selected regions | | | | |
| 35 spectral regions | LDA | 95 ± 1.3 | 98 ± 1.8 | 89 ± 2.9 |
| 35 spectral regions | kSVM | 86 ± 1.6 | 83 ± 2.0 | 90 ± 0 |
| 12 spectral regions | LDA | 97 ± 1.2 | 98 ± 0.6 | 87 ± 4.8 |
| 12 spectral regions | kSVM | 90 ± 2.1 | 94 ± 1.1 | 80 ± 2.9 |
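For readers less familiar with the quantities reported in Table 2, the following minimal Python sketch shows how accuracy, sensitivity and specificity follow from the counts of one test split, and how a mean ± standard deviation can be formed over repeated splits. All counts and per-split accuracies are hypothetical, CDH is assumed to be the positive class, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity in percent from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # correctly classified CDH cases
        "specificity": 100.0 * tn / (tn + fp),  # correctly classified controls
    }


# Hypothetical counts for a single test split (CDH = positive class).
metrics = classification_metrics(tp=9, fn=1, tn=16, fp=4)

# Hypothetical accuracies from three repeated splits.
accuracies = [80.0, 85.0, 90.0]
mean_accuracy = statistics.mean(accuracies)
sd_accuracy = statistics.stdev(accuracies)  # sample standard deviation (assumption)

print(metrics)
print(f"{mean_accuracy:.1f} ± {sd_accuracy:.1f}")
```

Because sensitivity and specificity are computed on different classes, unbalanced group sizes such as those in Table 1 can yield a high accuracy even when one of the two is modest.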


Table 3: Chemical shift regions of the NMR spectra from AF samples that have been used to distinguish between the control and CDH group. NMR signals were quantified for both groups. Regions not affected by gestational age have been labeled (*). For all regions, differences were significant (** refers to p