Article pubs.acs.org/jpr

Diagnostic and Prognostic Metabolites Identified for Joint Symptoms in the KORA Population

Noha A. Yousri,†,‡ Gabi Kastenmüller,§,∥ Wessam G. AlHaq,⊥ Rolf Holle,# Stefan Kääb,▽,○ Robert P. Mohney,◆ Christian Gieger,¶ Annette Peters,∥,+ Jerzy Adamski,□,∥,▲ Karsten Suhre,†,§ and Thurayya Arayssi*,⊥

† Department of Physiology and Biophysics, Weill Cornell Medical College − Qatar, Doha 24144, Qatar
‡ Department of Computer and Systems Engineering, Alexandria University, Alexandria 21526, Egypt
§ Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
∥ German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany
⊥ Department of Medicine, Weill Cornell Medical College − Qatar, Doha 24144, Qatar
# Institute of Health Economics and Health Care Management, Helmholtz Zentrum, Munich, Germany
▽ Department of Medicine, University Hospital Munich, Campus Grosshadern, 80539 Munich, Germany
○ Innenstadt, Ludwig-Maximilians University & German Center for Cardiovascular Research (DZHK), Munich Heart Alliance, 80337 Munich, Germany
◆ Metabolon, Inc., Durham, North Carolina 27713, United States
¶ Institute of Genetic Epidemiology, Helmholtz Zentrum München − German Research Center for Environmental Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
+ Institute of Epidemiology II, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
□ Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
▲ Lehrstuhl für Experimentelle Genetik, Technische Universität München, 85354 Freising-Weihenstephan, Germany

Supporting Information

ABSTRACT: This study aims to identify metabolites that significantly associate with self-reported joint symptoms (diagnostic) and metabolites that can predict the change from a symptom-free status to the development of self-reported joint symptoms after a 7-year period (prognostic). More than 300 metabolites were analyzed for 2246 subjects from the longitudinal KORA study (Cooperative Health Research in the Region of Augsburg, Germany), specifically the fourth survey S4 and its 7-year follow-up study F4. Two types of self-reported symptoms, chronic joint inflammation and worn-out joints, were used for the analyses. Diagnostic analysis identified dysregulated metabolites in cases with symptoms compared with controls. Prognostic analysis identified metabolites that differentiate subjects in S4 who remained symptom-free after 7 years (F4) from those who developed any combination of symptoms. 48 metabolites were identified as nominally significantly (p < 0.05) associated with the self-reported symptoms in the diagnostic analysis, among which steroids showed Bonferroni significance. 45 metabolites were identified as nominally significantly associated with developing symptoms after 7 years, among which hippurate showed Bonferroni significance. We show that the metabolic profiles of self-reported joint symptoms are in line with metabolites known to associate with various forms of arthritis. We suggest that future studies investigate the use of self-reporting questionnaires together with metabolic markers for the early referral of patients for further diagnostic workup and treatment of arthritis.

KEYWORDS: metabolomics, joint inflammation, diagnosis, prognosis, arthritis, rheumatoid arthritis

Journal of Proteome Research
J. Proteome Res. 2016, 15, 554−562
DOI: 10.1021/acs.jproteome.5b00951
Received: October 12, 2015
Published: December 11, 2015
© 2015 American Chemical Society

1. INTRODUCTION

Musculoskeletal diseases are common worldwide, affecting millions of people, and are a major cause of disability.1 They include several disorders, the most common of which are osteoarthritis (OA), rheumatoid arthritis (RA), gout, and fibromyalgia.2 They are also a challenging subject of epidemiologic study due to inherent difficulties in defining cases based on symptoms and in distinguishing among different rheumatic disorders. Early diagnosis, however, is important to prevent or decrease future disability,2 yet defining the exact time of onset is rather difficult.3 Early joint symptoms may or may not be related to the patient’s subsequent diagnosis, and the presence of evidence of systemic autoantibodies does not always predict progression to a specific rheumatic disease.4 Thus, metabolic profiling is suggested as a tool for improving diagnosis, early referral to the appropriate physician, and consequently prognosis of patients with musculoskeletal disorders.4a

To date, studies that investigate metabolites associated with joint disease consider clinically diagnosed samples;4a,5 however, there are no studies that consider identifying metabolic changes in subjects with self-reported joint pain on a population level. Population studies may lack phenotype consistency but increase the probability of positive findings in omics studies.6 For instance, the recent population study of self-reported chronic widespread musculoskeletal pain (CWP) did no further clinical phenotyping6 and identified steroid hormone metabolism abnormalities in the subjects enrolled. Finding metabolites associated with self-reported joint pain can potentially be an attractive method for early referral of patients to a physician for further diagnosis.

The purpose of this study is to identify metabolites that significantly associate with self-reported joint symptoms (diagnostic analysis) and metabolites that can predict the change from symptom-free to the development of symptoms after a 7-year period (prognostic analysis). This study uses nontargeted metabolomics to investigate a large set of metabolites and the biochemical pathways associated with symptoms.

2. MATERIALS AND METHODS

2.1. Study Population

KORA (Cooperative Health Research in the Region of Augsburg) is an epidemiological research cohort with participants randomly selected from the general population in the region of Augsburg in Southern Germany.7 Here we use samples and data that were collected during the fourth survey (KORA S4) between 1999 and 2001 and from a follow-up study (KORA F4) that was conducted 7 years later between 2006 and 2008. Participants were between the ages of 54 and 75 years at baseline with an equal distribution of males and females. Details are provided elsewhere (ref 8 and references therein). All participants gave written informed consent. The studies were approved by the local ethics committees, the Bayerische Landesärztekammer.

This study considers two types of self-reported symptoms that are determined solely based on previous knowledge of the subjects, without further validation. The symptoms are classified according to the answers to the question “Did you have one of the following diseases during the last 12 months?”, which included the following two items regarding diseases of the joints: “Inflammatory joint disease, e.g., chronic polyarthritis” and “worn-out joints, arthritis in the joints of the hip, knee, shoulder or foot”. We refer to these items as “inflammatory joint disease” (abbreviated JI) and “worn-out joints” (abbreviated WO), respectively. The subjects answered those two questions with “Yes”, “No”, or “Don’t know”, where “Don’t know” was considered missing information.

2.2. Blood Sampling

Blood for KORA F4 was drawn between 8:00 a.m. and 10:30 a.m. after 10 h of fasting. Material was drawn into serum gel tubes, gently inverted twice, rested for 30 min at room temperature to obtain complete coagulation, and then centrifuged for 10 min at 2750g. Serum was divided into aliquots and kept for a maximum of 6 h at 4 °C, after which it was stored at −80 °C until analysis. A similar blood draw protocol was used for KORA S4.9

2.3. Metabolomics Measurements

Metabolic profiling was done on serum using ultra-high-performance liquid chromatography and gas chromatography separation, coupled to tandem mass spectrometry (UHPLC−MS/MS and GC−MS, respectively) at Metabolon, Inc. (Durham, NC) using established procedures and technology.10 In brief, Metabolon is a commercial supplier of metabolic analyses on a platform integrating chemical analysis, including the identification and relative quantification, data reduction, and quality-assurance components of the process. Samples were submitted to three types of analysis: positive-mode UHPLC−MS/MS, negative-mode UHPLC−MS/MS, and GC−MS. The UHPLC injections were optimized for basic and acidic species. The resulting MS and MS/MS data were searched against a proprietary library generated by Metabolon that included retention time, molecular mass-to-charge ratio (m/z), and preferred adducts and in-source fragments for all molecules in the library. The library allowed for the identification of the experimentally detected molecules on the basis of a multiparameter match without the need for additional analysis. RSD (relative standard deviation) was determined using repeated measurements of technical replicates derived from pooled samples.

Metabolomics measurements for KORA S4 and KORA F4 were performed in separate batches at Metabolon. Metabolites with >20% missing values or those that were detected only in either the S4 or the F4 samples were removed, resulting in data sets of 363 metabolites for S4 and 353 metabolites for F4, including unknowns (structurally unnamed metabolites identified by Metabolon). Missing values were imputed with the average over all valid observations of that metabolite. Metabolite concentrations were normalized by run day and log transformed. Some of the metabolites that were structurally unidentified (unknowns) at the time of measurement were later identified by Metabolon, as will be detailed later.
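The filtering, imputation, and log transformation described under Metabolomics Measurements can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the paper's analyses were run in R): the function name `preprocess`, the toy data, the use of the natural logarithm, and imputing before taking logs are all assumptions made for illustration, and the run-day normalization step is omitted.

```python
import math


def preprocess(samples, max_missing=0.20):
    """Filter, impute, and log-transform metabolite measurements.

    samples: list of dicts mapping a metabolite name to a positive float,
             or to None when the value is missing.
    Returns a dict mapping each retained metabolite to its list of
    log-transformed values, one per sample.
    """
    names = sorted({name for sample in samples for name in sample})
    result = {}
    for name in names:
        values = [sample.get(name) for sample in samples]
        observed = [v for v in values if v is not None]
        n_missing = len(values) - len(observed)
        # Drop metabolites with more than 20% missing values.
        if n_missing > max_missing * len(values):
            continue
        # Impute missing entries with the mean of the valid observations.
        mean = sum(observed) / len(observed)
        filled = [mean if v is None else v for v in values]
        # Natural log is an assumption; the text does not state the base.
        result[name] = [math.log(v) for v in filled]
    return result


# Hypothetical toy data: "glucose" has 1 of 5 values missing (kept),
# "X-1" has 2 of 5 missing (dropped).
toy = [
    {"glucose": 1.0, "X-1": 2.0},
    {"glucose": None, "X-1": None},
    {"glucose": 4.0, "X-1": None},
    {"glucose": 1.0, "X-1": 3.0},
    {"glucose": 2.0, "X-1": 5.0},
]
processed = preprocess(toy)
```

In this toy input the missing glucose value is replaced by the mean of the four observed values (2.0) before the log is taken.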

2.4. Statistical Analyses

A. Merged Cohort. After removing subjects with missing values or those who answered the questions with “don’t know”, the S4 (N = 1483) and F4 (N = 1314) cohorts were merged for higher statistical power (N = 2797) in the diagnostic analysis. 551 subjects were present in both cohorts; each was included only once, with half of them taken from S4 and half from F4, thus avoiding duplication. That resulted in a total of 2246 subjects, with and without symptoms (Table 1), each with 311 metabolites. Data were z-score normalized for each cohort (S4 and F4) and for the merged cohort.

B. Covariates Selection. We considered a set of 23 potential confounders (covariates), in addition to the batch effect (S4 and F4 as two different batches), for incorporation in the regression model (Table 2). For regressing metabolites against


3.2. Metabolites

Table 1. Number of Cases in Comparison Groups for Diagnostic and Prognostic Analysis

symptom group                   diagnostic analysis   prognostic analysis
overlap (JI, WO): JI ∩ WO       219                   10
JI                              286                   16
WO excluding JI: WO\JI          631                   57
JI + WO                         917                   73
symptom-free (controls)         1329                  204
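The comparison groups in Table 1 are plain set operations on the two self-reported answers, and the reported counts can be checked for internal consistency. The following Python sketch is illustrative only: the function `comparison_groups`, the ASCII group keys, and the toy subject IDs are hypothetical, while the counts are copied from Table 1 and from the merged-cohort description (1483 + 1314 subjects, 551 present in both surveys).

```python
def comparison_groups(ji, wo, everyone):
    """Build the four case groups and the control group from answer sets."""
    ji, wo, everyone = set(ji), set(wo), set(everyone)
    return {
        "JI_and_WO": ji & wo,            # overlap (JI, WO): JI ∩ WO
        "JI": ji,                        # JI, including the overlap with WO
        "WO_minus_JI": wo - ji,          # WO excluding JI: WO\JI
        "JI_or_WO": ji | wo,             # JI + WO
        "controls": everyone - (ji | wo),  # symptom-free subjects
    }


# Hypothetical toy subject IDs, only to show the set logic.
groups = comparison_groups(ji={1, 2, 3}, wo={2, 3, 4, 5}, everyone=range(1, 9))
sizes = {name: len(members) for name, members in groups.items()}

# Counts as reported in Table 1.
TABLE_1 = {
    "diagnostic": {"JI_and_WO": 219, "JI": 286, "WO_minus_JI": 631,
                   "JI_or_WO": 917, "controls": 1329},
    "prognostic": {"JI_and_WO": 10, "JI": 16, "WO_minus_JI": 57,
                   "JI_or_WO": 73, "controls": 204},
}


def is_consistent(counts):
    """JI and WO\\JI are disjoint, so their sizes must add up to JI + WO."""
    return (counts["JI"] + counts["WO_minus_JI"] == counts["JI_or_WO"]
            and counts["JI_and_WO"] <= counts["JI"])


# Merged cohort size: both surveys minus the subjects counted twice.
merged_total = 1483 + 1314 - 551
```

Under these definitions both columns of Table 1 are consistent, and the merged total equals the diagnostic cases plus controls (917 + 1329).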

A. 48 Metabolites Were Identified as Nominally Significantly Associated with Self-Reported Symptoms, 7 of Which Showed a Bonferroni Significance. Using linear regression on 311 metabolites after correcting for significant covariates, 48 metabolites (Table 3 and Supplemental Table S2) were identified as nominally significantly (p < 0.05) associated with case versus control status in at least one of the four comparison groups. Of these, seven metabolites showed Bonferroni significance (p < 0.05/311 = 1.6 × 10⁻⁴) and three metabolites showed a significant false discovery rate (FDR < 0.05) in at least one of the comparison groups. Metabolites with Bonferroni and FDR significance showed significant association with symptoms in all comparison groups, except for the group “WO excluding JI” (WO\JI). Sixteen of the 48 identified metabolites are unknowns (3 of them were later identified by Metabolon, as shown in Table 3), 20 are lipids (out of 98 lipids), 8 are amino acids (out of 59 amino acids), and 4 are in different pathways: a cofactors and vitamins metabolite, a xenobiotic, a peptide, and a carbohydrate. The subpathways associated with the Bonferroni and FDR significant metabolites are the sterol/steroid subpathway (4 steroids and 4 unknowns, which are putative steroids, as explained later, or as recently identified by Metabolon) and the tocopherol subpathway (one metabolite). The nominally significant metabolites are from different pathways: tryptophan (4 metabolites), fatty acids and carnitine (7 metabolites), lysolipids (3 metabolites and 1 unknown later identified as a lysolipid, X-12644), food components/plant (1 metabolite), and valine-leucine-isoleucine metabolism (1 metabolite).

B. Steroids Are Prominent among the Metabolites That Are Bonferroni Significantly Regressing with Group JI. Four of the seven metabolites that are identified as Bonferroni significantly regressing with the self-reported symptoms are already known to belong to the sterol/steroid subpathway, and the other three, unknown metabolites (X-12844, X-11444, and X-11470), are putative steroids or on the steroid pathway, as discussed later. Also, 70% of the steroids in the whole data set (i.e., 7 out of 10 steroids) appear in the list of 48 metabolites, showing that the identified metabolite set is rich in steroids compared with the other pathways. These steroids are dehydroepiandrosterone sulfate (DHEA-S), epiandrosterone sulfate, androsterone sulfate, and 4-androsten-3-beta,17-beta-diol disulfate 2, identified as Bonferroni significant, and another three metabolites, 4-androsten-3-beta,17-beta-diol disulfate 1, cortisol, and cortisone, identified as nominally significant. Moreover, the unknown metabolite X-12844 has been previously found to associate with SNP rs2035647 near AKR1D1,10c which encodes an NADPH-dependent delta 4-3-ketosteroid-5-beta-reductase that in turn catalyzes 5-beta-reduction of certain steroid molecules,12 indicating the possibility of it being a steroid as well. Also, the Bonferroni significant unknowns X-11444 and X-11470 as well as the ratio of X-11444/X-12844 have been previously found to be associated with SNP rs559555, which is near SRD5A2,13 the genetic locus that encodes the steroid enzyme 5-alpha-reductase type 2. The FDR significant unknown X-11440 has been previously found to be associated with the SNP rs296381, which is near SULT2A1 (sulfotransferase family, cytosolic, 2A, dehydroepiandrosterone (DHEA)-preferring, member 1),13 which catalyzes the sulfation of steroids, and was also identified later by Metabolon to be hydroxypregnen-diol disulfate or

cases versus controls, the set of covariates that are significant with respect to the outcome (case or control) was determined using stepwise regression in R. Selection was based on generalized linear regression models (the step and glm functions in the R statistical package). The stepwise procedure is given the full list of all covariates and performs forward and backward steps, adding and dropping covariates so as to minimize the AIC (Akaike Information Criterion11) measure.

C. Regression Analysis. We carried out both diagnostic and prognostic regression analyses while testing several symptom combinations, then limited our results to the symptom groups that had significant results or had enough samples for comparison with controls.

i. Diagnostic Analysis. Given the two groups JI and WO for the self-reported symptoms in the merged cohort, linear regression (using the lm function in R) was used to identify metabolites associated with symptoms in the following groups compared with symptom-free subjects (controls), after correcting for significant covariates as previously described (see Supplemental Table S4 for selected covariates in each case):
(a) cases with both self-reported symptoms, “overlap (JI, WO)”, or in short “JI ∩ WO”;
(b) cases of group “JI” (including the overlap with those in WO);
(c) cases of group WO excluding those in JI, “WO excluding JI”, or in short “WO\JI”;
(d) all cases in JI or WO, “JI + WO” (Table 1).

ii. Prognostic Analysis. After identifying the subjects who were symptom-free in S4 but developed either or both of the joint symptoms after 7 years, linear regression (using the lm function in R) was used to compare four groups with controls after correcting for significant covariates as previously described. (See Supplemental Table S5 for selected covariates in each case.) S4 metabolite measures (363 metabolites) were used for the regression, while F4 was used to label the subjects in S4 by their change in status after 7 years. The four groups of subjects compared with controls (those who stayed symptom-free in F4; 204 subjects) are those who changed from symptom-free in S4 to a state in F4 with:
(a) both self-reported symptoms, “overlap (JI, WO)”, or in short “JI ∩ WO”;
(b) “JI”;
(c) “WO excluding JI”, or in short “WO\JI”;
(d) “JI + WO” (Table 1).
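Each regression above yields one p value per metabolite, which the Results then grade as nominal (p < 0.05), Bonferroni significant (p < 0.05/311), or FDR significant (FDR < 0.05). The Python sketch below shows how such tiers could be computed. It is a hedged illustration: the paper does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, and the function names and example p values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    Assumption: BH is used only as a common example of an FDR procedure;
    the source does not name the method it applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Threshold for the 311 metabolites tested in the diagnostic analysis.
threshold = bonferroni_threshold(0.05, 311)

# Hypothetical p values, one per metabolite, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
example_fdr = benjamini_hochberg(example_p)
```

A metabolite would then be called Bonferroni significant if its raw p value is below `threshold`, and FDR significant if its adjusted value is below 0.05.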

3. RESULTS

3.1. Subjects’ Characteristics

Clinical and demographic characteristics of subjects in S4 and F4 in the diagnostic analysis are described in Table 2. These are considered as confounders (according to a covariate selection algorithm, as described in Methods) in the regression model that calculates the association of metabolites with the symptoms. t test p values and estimates resulting from comparing the values of these characteristics in cases and controls are also given in Supplemental Table S1.
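Because Table 2 reports group means, standard deviations, and group sizes, a two-sample t statistic for any continuous characteristic can be recomputed from those summaries alone. The sketch below assumes Welch's unequal-variance form; the source does not say which t test variant produced Supplemental Table S1, so the result need not match it, and the function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    a = sd1 ** 2 / n1  # squared standard error of group 1
    b = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Age in the diagnostic analysis (Table 2a): JI + WO cases versus controls.
age_t, age_df = welch_t(63.3, 6.6, 917, 60.6, 7.5, 1329)
```

A positive `age_t` here reflects the higher mean age of cases than controls reported in Table 2a.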


Table 2. Subjects’ Characteristics for (a) Diagnostic and (b) Prognostic Analysis: Mean ± Standard Deviation (or Percentage for Categorical Variable Specified as % Inside Parentheses) of Covariates in Each of the Groups Used for Comparison (a)

(a) Diagnostic analysis

                              JI ∩ WO         JI              WO\JI           JI + WO         controls
N                             219             286             631             917             1329
age                           63.5 ± 6.3      63.5 ± 6.5      63.2 ± 6.6      63.3 ± 6.6      60.6 ± 7.5
gender (% males)              40.6            42.0            47.1            45.5            54.9
BMI                           29.4 ± 4.7      29.0 ± 4.7      29.0 ± 4.7      29.0 ± 4.7      27.8 ± 4.3
diabetes (% nondiabetic)      92.2            91.6            90.3            90.7            93.7
alcohol (grams/day)           16.2 ± 23.9     15.8 ± 22.8     14.7 ± 18.1     15.0 ± 19.7     17.2 ± 21.6
smoking (% nonsmoker)         52.1            49.7            46.3            47.3            44.9
physical activity (% active)  53.4            51.4            52.0            51.8            51.0
systolic BP                   131.3 ± 18.5    131.4 ± 18.6    132.0 ± 20.7    131.8 ± 20.0    130.6 ± 21.2
diastolic BP                  78.4 ± 10.3     78.0 ± 10.0     78.4 ± 10.5     78.3 ± 10.3     79.2 ± 10.7
hip circumference (cm)        108.3 ± 10.8    107.5 ± 10.3    107.7 ± 9.1     107.6 ± 9.5     105.7 ± 8.5
waist circumference (cm)      97.8 ± 12.7     96.9 ± 12.9     97.2 ± 12.2     97.1 ± 12.5     94.9 ± 13.0
cholesterol (mg/dL)           233.2 ± 41.8    233.6 ± 41.6    234.7 ± 40.3    234.4 ± 40.7    231.1 ± 42.5
HDL                           57.2 ± 15.4     56.7 ± 15.1     56.9 ± 15.4     56.8 ± 15.3     57.0 ± 15.6
LDL                           147.9 ± 39.3    147.6 ± 38.5    148.1 ± 37.3    148.0 ± 37.6    146.1 ± 38.2
triglycerides (TRI)           140.0 ± 74.2    144.9 ± 87.8    145.5 ± 95.0    145.3 ± 92.7    134.1 ± 91.7
creatinine                    0.9 ± 0.2       0.9 ± 0.2       0.9 ± 0.4       0.9 ± 0.4       0.9 ± 0.2
MCH                           31.0 ± 1.8      31.0 ± 1.7      31.0 ± 1.7      31.0 ± 1.7      31.1 ± 1.7
MCHC                          338.3 ± 8.2     337.9 ± 7.7     337.7 ± 7.3     337.7 ± 7.4     338.4 ± 7.9
MCV                           91.5 ± 4.5      91.7 ± 4.3      91.8 ± 4.3      91.8 ± 4.3      91.8 ± 4.4
MPV                           8.7 ± 0.9       8.7 ± 0.9       8.8 ± 1.0       8.8 ± 0.9       8.8 ± 1.0
PLT                           246.4 ± 65.0    243.4 ± 62.8    234.5 ± 57.8    237.3 ± 59.5    241.4 ± 60.7
RBC                           4.6 ± 0.4       4.6 ± 0.4       4.6 ± 0.4       4.6 ± 0.4       4.6 ± 0.4
WBC                           6.1 ± 1.6       6.1 ± 1.7       6.1 ± 1.7       6.1 ± 1.7       6.1 ± 1.7

(b) Prognostic analysis

                              JI ∩ WO         JI              WO\JI           JI + WO         controls
N                             10              16              57              73              204
age                           60.7 ± 2.5      60.1 ± 2.5      59.2 ± 2.7      59.4 ± 2.6      59.1 ± 2.8
gender (% males)              60.0 ± 0.0      50.0 ± 0.0      56.1 ± 0.0      54.8 ± 0.0      56.9 ± 0.0
BMI                           28.9 ± 4.4      28.7 ± 3.7      27.8 ± 4.3      28.0 ± 4.2      27.2 ± 3.6
diabetes (% nondiabetics)     100.0 ± 0.0     93.8 ± 0.0      98.2 ± 0.0      97.3 ± 0.0      96.1 ± 0.0
alcohol (grams/day)           20.1 ± 23.2     18.2 ± 21.8     18.4 ± 22.6     18.4 ± 22.3     18.0 ± 21.1
smoking (% nonsmoker)         70.0 ± 0.0      68.8 ± 0.0      42.1 ± 0.0      47.9 ± 0.0      48.5 ± 0.0
physical activity (% active)  50.0 ± 0.0      37.5 ± 0.0      54.4 ± 0.0      50.7 ± 0.0      51.0 ± 0.0
systolic BP                   130.7 ± 16.4    134.1 ± 17.3    132.1 ± 21.4    132.5 ± 20.5    131.5 ± 20.5
diastolic BP                  82.9 ± 8.9      82.9 ± 7.3      80.2 ± 10.9     80.8 ± 10.3     81.1 ± 11.3
hip circumference             105.4 ± 5.3     104.6 ± 6.1     106.0 ± 7.1     105.7 ± 6.9     104.2 ± 7.0
waist circumference           96.3 ± 13.5     95.5 ± 12.4     94.6 ± 12.9     94.8 ± 12.8     93.6 ± 11.3
cholesterol (mg/dL)           246.2 ± 56.4    244.7 ± 47.2    245.7 ± 45.2    245.5 ± 45.3    246.4 ± 43.2
HDL                           69.7 ± 21.3     64.2 ± 18.9     56.8 ± 16.4     58.4 ± 17.1     57.9 ± 16.2
LDL                           148.8 ± 53.8    152.3 ± 46.3    157.4 ± 45.1    156.4 ± 45.1    154.5 ± 42.7
triglycerides (TRI)           117.5 ± 67.4    145.3 ± 126.8   139.3 ± 93.0    140.7 ± 100.6   138.8 ± 83.3
creatinine                    0.8 ± 0.1       0.8 ± 0.2       0.9 ± 0.1       0.9 ± 0.1       0.9 ± 0.2
MCH                           30.6 ± 1.6      31.1 ± 1.6      30.9 ± 1.3      30.9 ± 1.4      31.0 ± 1.6
MCHC                          334.4 ± 3.6     336.0 ± 6.0     336.2 ± 5.7     336.2 ± 5.7     334.8 ± 8.2
MCV                           91.4 ± 5.0      92.5 ± 5.1      91.8 ± 3.6      92.0 ± 3.9      92.5 ± 4.1
MPV                           8.9 ± 0.5       8.7 ± 0.7       8.6 ± 1.0       8.7 ± 1.0       8.8 ± 0.9
PLT                           225.1 ± 58.7    232.9 ± 50.0    225.7 ± 51.5    227.3 ± 50.9    222.3 ± 52.5
RBC                           4.7 ± 0.5       4.6 ± 0.4       4.7 ± 0.3       4.7 ± 0.4       4.7 ± 0.5
WBC                           6.1 ± 1.4       6.2 ± 1.5       5.9 ± 1.3       6.0 ± 1.4       6.1 ± 1.7

(a) Where needed, units of measurement are specified in parentheses.

pregnanolone-diol disulfate. The other FDR significant unknown, X-18601, was later identified by Metabolon as 4-androsten-3-beta-17-beta-diol-monosulfate (1), that is, it belongs to the steroid pathway as well. The Bonferroni and FDR significant metabolites showed their highest significant decrease in subjects with symptoms versus controls for all groups except the WO\JI group, in which they showed fewer (p value