1H Nuclear Magnetic Resonance Metabolomics Analysis Identifies Novel Urinary Biomarkers for Lung Function

Joseph L. McClay,*,† Daniel E. Adkins,† Nancy G. Isern,‡ Thomas M. O’Connell,§ Jan B. Wooten,† Barbara K. Zedler,| Madhukar S. Dasika,⊥ Bradley Todd Webb,|,¶ Bobbie-Jo Webb-Robertson,‡ Joel G. Pounds,‡ Edward L. Murrelle,| Mark F. Leppert,4 and Edwin J. C. G. van den Oord†

Center for Biomarker Research and Personalized Medicine, School of Pharmacy, Virginia Commonwealth University, Richmond, Virginia 23298, Pacific Northwest National Laboratory, Richland, Washington 99352, Division of Pharmacotherapy and Experimental Therapeutics, School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27709, Venebio Group, Virginia Biotechnology Research Park, Richmond, Virginia 23219, ClearPoint Resources, Virginia Biotech Research Park, Richmond, Virginia 23219, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia 23219, and Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah 84112

Received January 4, 2010

Chronic obstructive pulmonary disease (COPD), characterized by chronic airflow limitation, is a serious public health concern. In this study, we used proton nuclear magnetic resonance (1H NMR) spectroscopy to identify and quantify metabolites associated with lung function in COPD. Plasma and urine were collected from 197 adults with COPD and from 195 without COPD. Samples were assayed using a 600 MHz NMR spectrometer, and the resulting spectra were analyzed against quantitative spirometric measures of lung function. After correcting for false discoveries and adjusting for covariates (sex, age, smoking), several spectral regions in urine were found to be significantly associated with baseline lung function. These regions correspond to the metabolites trigonelline, hippurate and formate. Concentrations of each metabolite, standardized to urinary creatinine, were associated with baseline lung function (minimum p-value = 0.0002 for trigonelline). No significant associations were found with plasma metabolites. Urinary hippurate and formate are often related to gut microflora. This could suggest that the microbiome varies between individuals with different lung function. Alternatively, the associated metabolites may reflect lifestyle differences affecting overall health. Our results will require replication and validation, but demonstrate the utility of NMR metabolomics as a screening tool for identifying novel biomarkers of pulmonary outcomes.

Keywords: chronic obstructive pulmonary disease • lung • nuclear magnetic resonance • metabolomics • urine • biomarker

* To whom correspondence should be addressed. Dr. Joseph L McClay, Ph.D., McGuire Hall, 1112 East Clay Street, Medical College of Virginia Campus, Virginia Commonwealth University, Richmond, VA, 23298, e-mail: [email protected].
† CBRPM, Virginia Commonwealth University. ‡ Pacific Northwest National Laboratory. § University of North Carolina, Chapel Hill. | Venebio Group, LLC. ⊥ ClearPoint Resources. ¶ VIPBG, Virginia Commonwealth University. 4 University of Utah.

10.1021/pr1000048 © 2010 American Chemical Society. Journal of Proteome Research 2010, 9, 3083–3090. Published on Web 04/21/2010.

Introduction

Chronic obstructive pulmonary disease (COPD) is the fourth leading cause of chronic morbidity and mortality in the United States and is projected to become the third leading cause of death worldwide by 2020.1,2 Cigarette smoking is recognized as the most important environmental causative factor for COPD.1-3 It is estimated that up to 50% of smokers may eventually develop COPD, as defined by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) spirometric criteria.1,4,5 COPD is characterized by incompletely reversible airflow limitation that results from small airway disease (obstructive bronchiolitis) and parenchymal destruction (emphysema). These pathologic changes are the result of an abnormal inflammatory response to long-term exposure to noxious gases or particles, with structural changes due to repeated injury and repair.2 The mechanisms of the enhanced inflammation that characterizes COPD involve both innate and adaptive immunity in response initially to inhalation of particles and gases.6 Several studies have demonstrated differences in markers of inflammation and immune response, such as a correlation between the number of CD8 cytotoxic T lymphocytes and the degree of airflow limitation in COPD.7 The response to oxidative stress is considered an important factor in the pathogenesis of COPD,8 while protease-antiprotease imbalance is thought to be associated with emphysema.9 However, while inflammation and other factors are clearly involved in the molecular pathogenesis of COPD, the precise etiological mechanisms remain to be fully characterized.

Recent advances in laboratory technologies have enabled “omics” investigations that allow researchers to interrogate diseases on an unprecedented scale.10 A rapidly developing discipline in this area is metabolomics, which is the study of all measurable metabolites in a given biological sample.11 Metabolomics studies typically employ either mass spectrometry or nuclear magnetic resonance (NMR) to assay the biochemical components of a tissue or biofluid. Proton (1H) NMR metabolomics has already shown considerable potential as a diagnostic tool for Alzheimer’s disease,12 diabetes and related disorders13,14 and inborn errors of lipid metabolism.15 Furthermore, 1H NMR metabolomics can facilitate the discovery of novel candidate biomarkers of disease risk, as demonstrated by Holmes et al.16 in a large population study of blood pressure. Considering pulmonary studies specifically, de Laurentiis et al.17 found that NMR-based metabolic profiles of exhaled breath condensate enabled discrimination between individuals with and without COPD. While this study is the only one to date to investigate COPD using a metabolomics approach, other disorders also characterized by inflammation have been successfully studied using NMR, including inflammatory bowel disease18 and rheumatoid arthritis.19

Lung function and its decline are influenced by a complex mixture of environmental and biological factors, especially cigarette smoking, gender and age.20-22 In the presence of such etiological complexity, conventional analytical strategies, such as using COPD-affected/unaffected status or simple spirometric measurements alone, are severely limited.

Table 1. Demographic, Smoking History and Lung Function Characteristics of the Subjects^a

                                  subjects with COPD   subjects without COPD (n = 195)
characteristic                    (n = 197)            smokers (n = 90)   never-smokers (n = 105)
Male, n (%)                       129 (66)             42 (47)            50 (48)
Age, mean (SD)                    64.7 (6.20)          57.2 (6.96)        56.8 (8.13)
BMI (kg/m2), mean (SD)            28.6 (3.53)          28.5 (7.05)        29.5 (7.72)
Cigarettes per Day, mean (SD)^b   20.4 (11.84)         19.5 (11.74)       n/a
Years Smoked, mean (SD)           38.0 (9.97)          32.5 (12.24)       n/a
Pack-Years, mean (SD)             57.5 (30.13)         45.1 (27.85)       n/a
FEV1 (L), mean (SD)               2.15 (0.658)         3.05 (0.735)       3.19 (0.856)
FEV1 %predicted, mean (SD)        69.4 (17.09)         101.8 (14.13)      102.3 (16.46)
FEV1/FVC, mean (SD)               55.6 (11.55)         75.9 (5.67)        77.5 (7.94)

^a COPD, chronic obstructive pulmonary disease; BMI, body mass index; FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; n/a, not applicable. Pack-years = (average cigarettes smoked per day/20) × (years of smoking).
^b Daily cigarette consumption of continuing smokers. At the time of biosampling for metabolomic analysis, 114/180 (63%) of the smokers with COPD and 42/87 (48%) of the smokers without COPD had quit smoking (percentages based on nonmissing responses).
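The pack-years definition given in the Table 1 footnote can be written out as a one-line calculation. The sketch below is illustrative only; the function name and the two example smokers are hypothetical and are not taken from the study data.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = (average cigarettes per day / 20) x (years of smoking).

    One pack is taken to contain 20 cigarettes, as in the Table 1 footnote.
    """
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return (cigarettes_per_day / 20) * years_smoked


# Hypothetical examples, not study participants:
one_pack_30_years = pack_years(20, 30)   # one pack a day for 30 years
half_pack_40_years = pack_years(10, 40)  # half a pack a day for 40 years
print(one_pack_30_years, half_pack_40_years)
```

Note that the group means in Table 1 cannot be combined this way, because the cigarettes-per-day column describes continuing smokers only.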
In this study, we therefore employed mixed modeling of longitudinal spirometric data collected as part of the NIH-sponsored Lung Health Study (LHS) and its follow-on Genetics of Addiction Project (GAP). This allowed us to derive phenotypic measures that disentangled the major factors affecting lung function. We then performed metabolomics analysis using high resolution NMR of plasma and urine from a subset of 392 LHS-GAP study participants with or without COPD, and related spectra to the lung function phenotypic measures. Our intention was to identify metabolites correlated with lung function, to increase etiological understanding of COPD and identify biomarkers of lung function and lung function decline.

Journal of Proteome Research • Vol. 9, No. 6, 2010

Materials and Methods

Subjects. Participants in the Lung Health Study (LHS) and Genetics of Addiction Project (GAP) at the University of Utah study center were selected to be part of the study. LHS was a prospective, randomized, multicenter clinical study sponsored by the National Heart, Lung, and Blood Institute (NHLBI) that enrolled male and female otherwise healthy cigarette smokers, aged 35-60 years, with mild or moderate COPD during 1986-1989.23 Lung function was measured by spirometry at baseline, annually for 5 years, and once during 1998-2001. GAP was a follow-up study during 2003-2004 in which spirometry was performed, smoking status was assessed and plasma and urine samples for metabolomic analysis were obtained. Diet, exercise and alcohol intake were not assessed. We selected for this study 197 subjects with COPD who had participated in LHS and GAP and, for comparison, 195 subjects without COPD (90 cigarette smokers and 105 never-smokers) enrolled in GAP. Characteristics of the study groups are shown in Table 1. At the GAP assessment time point, 63% of the smokers with COPD and 48% of the smokers without COPD had quit smoking.

Measures of Lung Function and Decline. The outcomes in this analysis were four measures of lung function or decline in lung function measured spirometrically as forced expiratory volume in one second (FEV1).24 These four measures were derived by fitting mixed models to longitudinal spirometric, smoking and demographic data obtained over the subjects’ 17-year average participation period in the LHS and GAP. Conceptually, three of these measures represent different underlying biological processes driving lung function decline. We focused on age-related decline (Age decline), cigarette smoking-related decline, as measured in pack-years (Pack-years decline), and the intensifying effects of smoking, in terms of number of cigarettes per day (CPD), on decline with increasing age (CPD × Age decline).
Together these three measures accounted for the vast majority of individual differences in lung function decline in these subjects. In addition, we included a measure of Baseline lung function at subjects’ entry into the study as an outcome as it has also been shown to vary considerably in magnitude among individuals.25 The complete technical details for the derivation of these measures are provided in the Supporting Information.

Blood and Urine Sampling and Preparation. Blood samples (4 mL) were obtained from each subject, at least two hours after eating, by venipuncture using a sodium citrated Vacutainer tube. Within ten minutes of collection, blood was centrifuged at 1500 × g and 2-6 °C for 15 min. The topmost plasma was removed and centrifuged at 1500 × g for 15 min more. Plasma samples were shipped on dry ice, stored at -80 °C, thawed just before analysis, and prepared in an adaptation of the method described by Tang et al.26 The NMR buffer used contained isotonic phosphate buffered saline (NERL Diagnostics, East Providence, RI), 20% D2O, and 0.02% sodium azide. A 300 µL aliquot of each sample was diluted with 300 µL of prepared NMR buffer for a final concentration of 10% D2O and 0.01% sodium azide. 2,2′,3,3′-deuterotrimethylsilylpropionic acid (TSP) was not used as an internal reference, due to interaction-induced baseline broadening that occurs in plasma samples.26 All samples were placed in a 5 mm NMR tube.

A 4 mL aliquot was obtained from a non-first-morning urine sample from each subject. Urine samples were shipped on dry ice, stored at -80 °C, and thawed just before analysis. A pH 7.2 buffer was prepared containing 200 mM sodium phosphate (Sigma-Aldrich, St. Louis, MO), 0.25 mM TSP (Sigma-Aldrich, St. Louis, MO), 0.025% sodium azide (Sigma-Aldrich, St. Louis, MO), and 25% deuterium oxide (D2O; 99.9%, low paramagnetic, Cambridge Isotope Laboratories, Andover, MA). A 360 µL aliquot of urine was diluted with 240 µL of buffer to minimize pH variations. The final buffer concentration of the prepared samples was 80 mM sodium phosphate, 10% D2O, 0.01% sodium azide, and 0.1 mM TSP. All samples were placed in 5 mm NMR tubes.

NMR Methods. NMR spectra were acquired on a Varian Inova 600 spectrometer (Varian Inc., Palo Alto, CA) at the Pacific Northwest National Laboratory. The spectrometer system was outfitted with a Varian triple resonance cold probe and linked to a Sun Ultra10 (Santa Clara, CA) workstation. For each urine sample, a one-dimensional presaturation sequence with gradients (Varian “metab” pulse sequence) was collected.
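The final buffer concentrations quoted for the prepared plasma and urine samples follow from simple dilution arithmetic: each buffer component is scaled by the fraction of the final volume that the buffer contributes. The short check below is a sketch of that calculation; the function and dictionary names are illustrative, and it assumes the biofluid itself contributes none of the listed components.

```python
def final_concentration(stock, buffer_ul, sample_ul):
    """Concentration of a buffer component after mixing buffer with sample.

    Assumes volumes are additive and the sample contains none of the component.
    """
    return stock * buffer_ul / (buffer_ul + sample_ul)


# Urine: 240 uL of buffer added to 360 uL of urine (values from the text).
urine_buffer = {
    "sodium phosphate (mM)": 200.0,
    "TSP (mM)": 0.25,
    "sodium azide (%)": 0.025,
    "D2O (%)": 25.0,
}
urine_final = {
    name: final_concentration(stock, buffer_ul=240, sample_ul=360)
    for name, stock in urine_buffer.items()
}

# Plasma: 300 uL of buffer added to 300 uL of plasma.
plasma_buffer = {"D2O (%)": 20.0, "sodium azide (%)": 0.02}
plasma_final = {
    name: final_concentration(stock, buffer_ul=300, sample_ul=300)
    for name, stock in plasma_buffer.items()
}

for name, value in {**urine_final, **plasma_final}.items():
    print(f"{name}: {value:.3g}")
```

Under these assumptions the urine buffer is diluted by a factor of 240/600 = 0.4 and the plasma buffer by 0.5, which reproduces the stated 80 mM phosphate, 0.1 mM TSP, 0.01% azide and 10% D2O.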
Final spectra were accumulations of 32 individual free-induction decays (FIDs). Each FID was obtained using a nonselective, 90-degree excitation pulse and digitized into 32K complex data points. A total interexcitation pulse delay of 3.0 s was used, initiated by a gradient homogeneity spoil pulse to destroy residual transverse magnetization. The spectral width of 6980.8 Hz resulted in an acquisition time of 2.347 s for a total recycle time of 5.3 s.

For plasma samples, we used the Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence with solvent presaturation in order to selectively suppress the spectral interferences from macromolecules such as proteins and lipoprotein particles. We focused on this method because we were interested in the small molecule metabolites, which would otherwise be obscured by the broadening caused by macromolecules using standard pulse sequences. A total spin-echo time of 50 ms was used in the CPMG sequence. The data were accumulations of 32 FIDs. Each FID was induced using a nonselective, 90-degree excitation pulse and digitized into 32K complex data points. The total interexcitation pulse delay was 3.0 s. The spectral width of 7197.5 Hz resulted in an acquisition time of 2.276 s for a total recycle time of 5.3 s.

Data Processing. NMR spectra were processed using Advanced Chemistry Development (ACD) 1D NMR Processor 11.0 (Advanced Chemistry Development Inc., Toronto, Canada). The ACD macro included a Fourier transform followed by automated phasing and baseline correction. Quality control was applied such that spectra with severely distorted baselines or poor water suppression were rejected. The regions below 0.5 ppm and above 9.0 ppm were excluded as they contained limited spectral information. For plasma spectra, the region from 4.2 to 6.5 ppm was excluded to eliminate the residual water signal. For urine spectra, the region from 4.2 to 6.8 ppm was excluded to eliminate the signals from both the residual water and urea. Integration of spectra was performed using the intelligent binning algorithm in ACD 11.0, whereby bin widths were set to 0.04 ppm with a tolerance of 50% to allow for better bin positioning relative to observed peaks.

Statistical Analyses. Ordinary least-squares (OLS) regression was used to test associations between NMR spectrum region integrals (bins) and the measures of lung function or lung function decline as measured by FEV1. In the first set of analyses, the following procedure was used for each of the four lung function measures (see Measures of Lung Function and Decline) and the FEV1 residual, i.e., FEV1 adjusted for the covariates of the best-fitting model. A separate simple regression was estimated for each bin, with the bin predicting the lung function measure. In the second set of analyses, in order to establish that significant associations were not spurious due to other factors known to influence metabolomic profiles, bin values were adjusted for potential confounders (i.e., age, gender, smoking status and average cigarettes per day) using OLS regression. More specifically, for each bin, a regression was fitted predicting the bin with age, gender, smoking status and average cigarettes per day. After fitting this model, the estimates were used to calculate residuals. Conceptually these residuals are equal to the bin value minus the effects of age, gender, smoking status and average cigarettes per day; this ensures that observed associations are due to lung function parameters and not the confounding influence of age, gender or smoking status. The procedure described above was repeated on these adjusted measures.
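The covariate adjustment described above amounts to a two-step regression: residualize each bin on the confounders, then regress the lung function measure on the residual. The following is a minimal sketch on simulated data, not the authors' Stata code; the variable names, the simulated effect sizes and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def ols_fit(predictors, response):
    """Fit OLS with an intercept; return the design matrix and coefficients."""
    design = np.column_stack([np.ones(len(response)), predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return design, coef


def residualize(values, covariates):
    """Return `values` minus the OLS-fitted effects of the covariates."""
    design, coef = ols_fit(covariates, values)
    return values - design @ coef


# Simulated (hypothetical) subjects, for illustration only.
rng = np.random.default_rng(0)
n = 300
age = rng.normal(60, 7, n)
male = rng.integers(0, 2, n)
smoker = rng.integers(0, 2, n)
cpd = smoker * rng.normal(20, 10, n).clip(min=0)  # cigarettes per day
lung_function = rng.normal(0, 1, n)
bin_value = (0.02 * age + 0.3 * male + 0.01 * cpd
             + 0.2 * lung_function + rng.normal(0, 0.5, n))

# Step 1: adjust the bin for age, gender, smoking status and cigarettes/day.
covariates = np.column_stack([age, male, smoker, cpd])
adjusted_bin = residualize(bin_value, covariates)

# Step 2: simple regression with the adjusted bin predicting lung function.
_, (intercept, slope) = ols_fit(adjusted_bin, lung_function)
print(f"slope of lung function on adjusted bin: {slope:.3f}")
```

By construction the residuals have zero mean and are uncorrelated with every covariate in the adjustment model, which is what allows the second-step association to be read as independent of those confounders.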
After identifying robustly significant associations between bins and lung function measures, the compounds represented by the significant bins in urine were identified and quantified as described below. In addition to calculating concentrations for specific compounds, we also calculated standardized concentrations as ratios of the identified compounds to creatinine, in order to control for total metabolite concentration within each spectrum. In a third specification, we adjusted the creatinine-standardized concentrations for age, gender and smoking. For each urine compound of interest, associations between all concentration specifications and the appropriate lung function measure were tested using OLS regression. As a final test of robustness, the sample was stratified into never-smokers and smokers (i.e., continuing and former cigarette smokers), and associations of compound concentrations to the appropriate lung function measure were examined in each group. Statistical analyses were carried out using Stata 9.2 (http://www.stata.com).

To control the risk of false discoveries, we calculated for each p-value a so-called q-value.27,28 A q-value is an estimate of the proportion of false discoveries among all significant markers (i.e., q-values are false discovery rates or FDRs) when the corresponding p-value is used as the threshold for declaring significance. This FDR-based approach is preferred over traditional methods of multiple testing correction29 because it (a) provides a good balance between the competing goals of finding true effects versus controlling false discoveries, (b) allows the use of more similar standards in terms of the proportion of false discoveries produced across studies because it is much less affected by the number of (sets of) tests, which is an arbitrary factor, (c) is relatively robust against having correlated tests27,29-36 and (d) rather than an all-or-nothing conclusion about whether a study produces significant results, gives a more subtle picture of the possible role of the tested markers.

Identification of Compounds in Urine. To identify the specific compounds in bins associated with the lung function measures, we inspected spectra using the Chenomx NMR Suite 5.0 (Chenomx Inc., Edmonton, Alberta, Canada). Compounds with signals in the spectral regions of the significant bins were examined and the assignments were made by quantitative fitting of the experimental spectra with reference spectra from the database using the Chenomx NMR Profiler software. Concentrations of highlighted compounds in urine spectra were assigned using the 0.1 mM TSP peak as a chemical shape and concentration reference. Concentrations of creatinine were also determined using this method. All peaks for profiled compounds, including both creatinine peaks (∼3 and 4 ppm), were matched simultaneously: the Chenomx software automatically alters the size of all the predicted peaks for a given compound proportionally and simultaneously. This requires all peaks of a compound to be consistent with a single concentration rather than being fitted independently.

Table 2. Number of Significant Associations between Spectral Bins and FEV1 Outcome Measures for Urine and Plasma at Various q-Value Thresholds

                          urine (148 bins)               plasma (177 bins)
outcome measure           q < 0.5   q < 0.3   q < 0.1    q < 0.5   q < 0.3   q < 0.1
Baseline lung function    30        13        12         0         0         0
Age Decline               0         0         0          0         0         0
Pack-years Decline        1         1         0          0         0         0
CPD × Age Decline         6         0         0          0         0         0
FEV1 residual             0         0         0          12        0         0
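The idea of attaching a q-value to each p-value, and then counting bins below several q-value thresholds, can be illustrated with a short calculation. The sketch below uses the Benjamini-Hochberg step-up adjustment as a stand-in; it is not necessarily the estimator of refs 27 and 28, which may additionally estimate the proportion of true null hypotheses, so the values it produces are, if anything, conservative. The p-values are hypothetical and are not results from this study.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (a conservative form of q-values).

    For the p-value of rank r among m tests, q = min over ranks k >= r
    of p_(k) * m / k, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical p-values for five spectral bins.
example_p = [0.0002, 0.004, 0.03, 0.2, 0.6]
example_q = bh_qvalues(example_p)

# Count bins passing each threshold, in the style of Table 2.
counts = {t: sum(q < t for q in example_q) for t in (0.5, 0.3, 0.1)}
print(example_q)
print(counts)
```

Reporting counts at several thresholds, rather than a single cut-off, is what gives the graded picture of evidence that the q-value approach is intended to provide.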

Results

Examination of Spectral Integrals (bins). Each spectrum was checked for quality; 359 urine spectra (91.6%) and 362 T2 CPMG plasma spectra (92.3%) passed quality control. Those that failed were largely due to poor water suppression causing problems with phasing. Intelligent binning yielded 148 bins per spectrum for urine and 177 for plasma. This difference in bin numbers is mainly due to the exclusion of the urea region in the urine samples. Of the 148 bins generated for the urine spectra, several showed highly significant association with Baseline lung function in the first set of analyses. As shown in Table 2, 12 bins were significant at q-value