ARTICLE pubs.acs.org/ac

Use of Urine Volatile Organic Compounds To Discriminate Tuberculosis Patients from Healthy Subjects

Khalid Muzaffar Banday,† Kishore Kumar Pasikanti,‡ Eric Chun Yong Chan,‡ Rupak Singla,§ Kanury Venkata Subba Rao,† Virander Singh Chauhan,*,|| and Ranjan Kumar Nanda*,†

† Immunology Group, International Center for Genetic Engineering and Biotechnology, New Delhi, India 110067
‡ Department of Pharmacy, Faculty of Science, National University of Singapore, 18 Science Drive 4, Singapore 117543
§ Department of Tuberculosis and Respiratory Diseases, Lala Ram Sarup Institute of Tuberculosis and Respiratory Diseases, New Delhi, India 110030
|| Malaria Group, International Center for Genetic Engineering and Biotechnology, New Delhi, India 110067



Supporting Information

ABSTRACT: Development of noninvasive methods for tuberculosis (TB) diagnosis, with the potential to be administered in field situations, remains an unmet challenge. A wide array of molecules is present in urine and reflects the pathophysiological condition of a subject. With infection, an alteration in the molecular constituents is anticipated, characterization of which may form a basis for TB diagnosis. In the present study, volatile organic compounds (VOCs) in human urine derived from TB patients and healthy controls were identified and quantified using headspace gas chromatography/mass spectrometry (GC/MS). We found a significant (p < 0.05) increase in the abundance of o-xylene (6.37-fold) and isopropyl acetate (2.07-fold) and a decreased level of 3-pentanol (0.59-fold), dimethylstyrene (0.37-fold), and cymol (0.42-fold) in TB patients compared to controls. These markers could discriminate TB from healthy controls and from related diseases such as lung cancer and chronic obstructive pulmonary disease. This study suggests the possibility of using urinary VOCs for the diagnosis of human TB.

© 2011 American Chemical Society

Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (M tb). TB is the leading infectious disease, with more than 10 million new cases and 3 million deaths reported each year. In addition, approximately one-third of the world population is infected with M tb (latent TB). Nearly 95% of all cases and 98% of deaths due to TB occur in developing countries.1 The situation has become more alarming in recent years with the wide spread of HIV infection, which decreases the immunity of the subjects and facilitates conversion of latent TB to active TB.2 Improper disease management through misuse of drugs or incomplete drug treatment may result in the development of multidrug-resistant (MDR) and extensively drug-resistant (XDR) tuberculosis strains.3 In developing countries, TB accounts for 25% of avoidable deaths.4 Early diagnosis of TB can decrease its fatality rate and reduce further transmission of the disease. Current diagnosis of TB still relies on a 120-year-old, simple, and inexpensive acid-fast bacillus (AFB) sputum test, even though performing the test requires high technical skill.5 Serological tests are invasive, have low sensitivity in smear-negative patients and BCG-vaccinated populations, and hold less promise in disease-endemic countries.6 Furthermore, identification of MDR and XDR TB requires an expensive and sophisticated culture test that needs more than 2 weeks of analysis, a cost-intensive system, and trained manpower.7,8 In these countries,

the preferred choice of diagnosis would be a simple methodology which requires minimal resources, short analysis time, easy result interpretation, and minimal training requirements. Therefore, the development of a noninvasive method that can be used in endemic countries for the diagnosis of TB is pertinent. In this study, we investigated the applicability of urine as a matrix for the differentiation of TB patients from healthy controls using the global metabolic profiling approach. Volatile organic compounds (VOCs) present in human urine were profiled to investigate alteration in abundance of VOCs of TB patients in comparison to healthy controls. Examining the perturbations in metabolites provided an insight into the modification of the metabolic state of the host related to TB infection, and this might aid in the development of a diagnostic tool for disease identification. The workflow adopted in this metabolic profiling study is summarized in Figure 1.

Received: February 2, 2011. Accepted: May 20, 2011. Published: May 27, 2011.

■ EXPERIMENTAL SECTION
Subject Recruitment. Informed consent was obtained from all study subjects after oral and written information related to the

dx.doi.org/10.1021/ac200265g | Anal. Chem. 2011, 83, 5526–5534

Analytical Chemistry

ARTICLE

Figure 1. Experimental workflow adopted in the urine volatile organic compounds study of tuberculosis patients and healthy controls using headspace gas chromatography mass spectrometry: /, subjects are a subset of the training set of discovery stage-II; 3M, 3 months of treatment; 7M, 7 months of treatment; COPD, chronic obstructive pulmonary disease.

project was provided. Subjects presenting with cough for more than 3 weeks, with or without other constitutional symptoms (expectoration, hemoptysis, breathlessness, fever, and weight loss), at the outpatient department (OPD) of Lala Ram Sarup Institute of Tuberculosis and Respiratory Diseases, New Delhi, India (LRS) were taken as TB suspects. The sputum samples were collected as spot specimens over a period of 3 days and stained by Ziehl–Neelsen stain. Only subjects who had never taken any anti-TB therapy and who had at least two sputum specimens positive for acid-fast stain were included as fresh TB cases in this study. Patients with other diseases or coinfection, HIV-positive subjects, children up to the age of 14 years, and pregnant women were excluded from this study. A total of 117 fresh TB patients were recruited in our study. Patient cohorts under different treatment periods were also recruited: early (1–3 months; n = 15) and late treatment (4–7 months; n = 5) subjects. Urine samples from healthy controls (H) with no history of TB or other chronic disease from the International Center for Genetic Engineering and Biotechnology (ICGEB), New Delhi were collected in a similar fashion {H (PPD −ve); n = 37}. Family members of the TB patients sharing the same premises and spending more than 10 h every day with the index subject for the last 2 years with

purified protein derivative test (PPD)-positive status were included as healthy PPD-positive controls {H (PPD +ve); n = 19}. The necessary medical check-up and tests of healthy PPD-positive (PPD +ve) and PPD-negative (PPD −ve) subjects were carried out. The clinical details of the recruited subjects from each group are summarized in Table 1. Approvals from the ethical committees of LRS hospital and ICGEB were obtained and followed for collection and handling of the clinical samples. As we recruited subjects from more than one institution on a continuous basis, it was impossible to get exactly matching age and gender populations for all study groups. Spot midstream urine samples from all the recruited subjects were collected from the respective institutions.

Comparison of TB with Other Lung Diseases. Patients with similar pulmonary diseases like lung cancer (n = 7) or chronic obstructive pulmonary disease (n = 5) were also recruited from the LRS hospital, New Delhi. These patients were thoroughly assessed for their TB status by chest X-ray and PPD test. The demographic details are presented in Table 1.

Sample Storage and Optimization of Sample Processing. The collected samples were stored and transported at 4 °C to the sample bank at ICGEB. The samples were analyzed preferentially on the day of sample collection, or else stored with the addition of protease inhibitors (per 50 mL): 33 μL of 100 mM


Table 1. Demographic Table: Distribution of Subjects Across Discovery and Validation Stages(a)

stage | subject group | total no. of subjects | age, years (range) | gender (M/F) | body mass index | smokers (active/nonsmokers/quit/tobacco) | PPD (+ve/−ve/no info.) | cough (yes/no) | expectoration (yes/no) | chest pain (yes/no) | abnormal X-ray (yes/no/no info.) | cavity (yes/no/no info.) | smear (+ve/−ve/no info.)
discovery stage-I | TB | 30 | 34 (20–60) | 25/5 | 19 | 17/9/3/2(c) | 30/–/– | 30/– | 24/6 | 21/9 | 30/–/– | 23/7/– | 30/–/–
discovery stage-I | H (PPD +ve) | 19 | 34 (22–65) | 16/3 | 21 | 9/8/1/1 | 19/–/– | 1/18 | –/19 | 1/18 | –/19/– | –/19/– | –/19/–
discovery stage-I | H (PPD −ve) | 11 | 26 (24–28) | 11/– | 25 | –/11/–/– | –/11/– | –/11 | –/11 | –/11 | –/–/11 | –/–/11 | –/–/11
discovery stage-II, training | TB | 58 | 32 (17–68) | 48/10 | 19 | 28/21/5/4 | 55/–/3 | 58/– | 55/14 | 51/7 | 48/5/5 | 49/7/2 | 58/–/–
discovery stage-II, training | H (PPD −ve) | 17 | 27 (24–45) | 17/– | 23 | 3/14/–/– | –/17/– | –/17 | –/17 | –/17 | –/–/17 | –/–/17 | –/–/17
discovery stage-II, testing | TB | 29 | 27 (17–39) | 23/6 | 19 | 8/15/5/1 | 29/–/– | 29/– | 19/10 | 21/8 | 26/3/– | 21/8/– | 29/–/–
discovery stage-II, testing | H (PPD −ve) | 9 | 27 (25–32) | 9/– | 24 | 2/7/–/– | –/9/– | –/9 | –/9 | –/9 | –/–/9 | –/–/9 | –/–/9
validation stage | TB(b) | 30 | 25 (20–60) | 22/8 | 19 | 19/10/1/2(c) | 30/–/– | 30/– | 22/8 | 20/10 | 28/2/– | 20/10/– | 30/–/–
validation stage | H(b) (PPD −ve) | 15 | 29 (24–35) | 15/– | 25 | 3/12/–/– | 15/–/– | –/15 | –/15 | –/15 | –/–/15 | –/–/15 | –/–/15
validation stage | 3 months | 15 | 33 (16–52) | 11/4 | 19 | 10/5/–/– | 13/2/– | 13/2 | 12/3 | 11/4 | 11/4/– | 8/7/– | 15/–/–
validation stage | 7 months | 5 | 34 (22–50) | 5/– | 19 | 2/3/–/– | 5/–/– | 4/1 | 4/1 | 3/2 | 5/–/– | 3/2/– | 5/–/–
validation stage | lung cancer | 7 | 50 (28–66) | 7/– | 23 | –/7/–/– | 1/6/– | 5/2 | 4/3 | 6/1 | 7/–/– | 7/–/– | –/7/–
validation stage | COPD | 5 | 41 (31–55) | 5/– | 23 | –/5/–/– | –/5/– | 3/2 | 1/4 | 4/1 | –/–/5 | –/–/5 | –/5/–

(a) H, healthy controls; PPD, purified protein derivative test (Mantoux test); COPD, chronic obstructive pulmonary disease; no info., information not available. (b) Subjects are a subset of the cohort used in the stage-II training set. (c) Subjects who are regular smokers and also consume tobacco.

sodium azide, 500 μL of phenylmethylsulfonyl fluoride (PMSF; 2%), and 1 μL of leupeptin (100 mM). The processed samples were stored at −80 °C for future analysis. Sample storage parameters, such as the effect of adding protease inhibitors, the storage time period, and alteration of pH, were optimized in detail before undertaking data collection. Aliquots of a single subject's urine sample stored for different periods (2 h to 30 days) after sample collection were subjected to gas chromatography/mass spectrometry (GC/MS) data acquisition and analysis. Urine pH was altered from 2 to 8 (normal urine pH ∼ 5.0) with the addition of HCl (13 N) and NaOH (1 M). The volume of HCl to be added to the urine sample was optimized by adding different volumes of HCl (5, 10, 15, 20, 25, and 50 μL/mL). GC/MS spectra were collected from each sample after randomizing the sample sets using MATLAB (R2008a, MathWorks, U.S.A.) to reduce bias and overfitting of the data. The samples were coded prior to data acquisition, and the analyst was blinded to the sample information.

Method Reproducibility and Accuracy. The reproducibility of the static headspace sampling GC/MS method was established by injecting a healthy (PPD −ve, age 27, male) urine sample on three different days (first, third, and fifth). Accuracy of the method was determined by collecting GC/MS data of urine samples belonging to the healthy group (PPD −ve; n = 5) and of an equal volume of a mixture of three standard solvents (tetradecane, pentadecane, and hexadecane at 0.218 g/L) five consecutive times on the same day. The adopted GC/MS method for solvent data acquisition is described in the Supporting Information.

GC/MS Data Acquisition. Two discovery stages were employed in this study for the identification of potential urine VOC markers in TB patients. The headspace and GC/MS instrument parameters were standardized independently for the discovery stages (Agilent Technologies, U.S.A.).
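The run-order randomization and sample coding described under sample processing above can be illustrated with a minimal Python sketch. The study itself used MATLAB (R2008a) for randomization; the function name, the code format (S001, S002, ...), the sample IDs, and the seed are all hypothetical choices made here for illustration.

```python
import random

def blinded_run_order(sample_ids, seed=2011):
    """Shuffle samples and assign blinded codes in acquisition order.

    Returns (run_order, key): run_order is the list of blinded codes in the
    order they would be injected; key maps each code to the original ID and
    would be withheld from the analyst until data acquisition is complete.
    """
    rng = random.Random(seed)        # fixed seed (assumption) keeps the order reproducible
    shuffled = list(sample_ids)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    key = {f"S{i:03d}": sid for i, sid in enumerate(shuffled, start=1)}
    run_order = list(key)            # dicts preserve insertion order
    return run_order, key

# Hypothetical sample IDs, not taken from the study.
samples = ["TB_01", "TB_02", "TB_03", "H_01", "H_02", "H_03"]
run_order, key = blinded_run_order(samples)
for code in run_order:
    print(code)                      # the analyst sees only the blinded codes
```

Keeping the key separate from the run list is what makes the acquisition blinded; the fixed seed only serves to make the assignment auditable afterwards.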
The detailed instrument parameters are available in the Supporting Information. After collection of initial data in discovery stage-I, machine parameters were further optimized and applied in stage-II. A set of controls was analyzed in the validation stage, where patients undergoing treatment and patients suffering from other similar pulmonary diseases (lung cancer, LC, and chronic obstructive pulmonary disease, COPD) were included along with fresh TB cases and healthy PPD −ve subjects.

Preprocessing of the Data Files. Two different preprocessing methods were employed for baseline correction, noise reduction, smoothing, replacement of missing values, and area calculation in the two discovery stages.9–11 In discovery stage-I, raw chromatograms (.d) from ChemStation (Agilent Technologies, U.S.A.) were converted to NetCDF (.cdf) files, and peak detection and molecule identification were carried out with Shimadzu software (GCMS Solution, Shimadzu, Japan). In stage-II, GC/MS data files obtained from ChemStation were processed for baseline correction, noise reduction, and peak picking using AMDIS.12 Deconvolution of individual data files was carried out using Spectconnect from the .elu files obtained from AMDIS.13 Spectconnect parameters of an elution threshold of 1 min, a support threshold of ≥75% of samples, and a similarity threshold of 80% were selected for group comparison. Because the number of sample files that can be uploaded is restricted to 10 per group, we ran Spectconnect batchwise for the entire data set and then combined all the group-specific molecular information. After receiving the deconvoluted molecular matrix from Spectconnect, we identified each peak from the


Figure 2. (A) Total ion chromatogram showing a comparative metabolic profile of healthy controls [H (PPD +ve/−ve)] and TB patients (P) used in the discovery stage-I. (B) OPLS-DA score plots (A = 2, N = 58, R2X = 0.462, R2Y = 0.674, and Q2 = 0.620) obtained from the comparative urine VOCs analysis of TB and healthy controls. (C) Validation model scores using 999 random permutation tests not outperforming the original PLS-DA model.
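The permutation validation shown in panel C can be illustrated with a small label-permutation sketch. This is not the SIMCA-P procedure used in the study: instead of refitting a PLS-DA model and comparing R2 and Q2, it uses the distance between class centroids as a stand-in statistic. The function names and the toy data are hypothetical.

```python
import random

def centroid_separation(X, y):
    """Euclidean distance between the centroids of class 0 and class 1."""
    def centroid(label):
        rows = [x for x, lab in zip(X, y) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    return sum((a - b) ** 2 for a, b in zip(c0, c1)) ** 0.5

def permutation_p_value(X, y, n_perm=999, seed=0):
    """Permute class labels (X stays intact) and compare with the observed statistic.

    Returns (observed, p), where p = (exceed + 1) / (n_perm + 1) and exceed
    counts permutations whose statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = centroid_separation(X, y)
    labels = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)          # shuffling keeps the class sizes unchanged
        if centroid_separation(X, labels) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Toy data (hypothetical): two normalized peak areas per sample, 0 = healthy, 1 = TB.
X = [[1.0, 0.2], [1.1, 0.1], [0.9, 0.3], [1.2, 0.2],
     [0.1, 1.0], [0.2, 1.1], [0.0, 0.9], [0.1, 1.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
observed, p = permutation_p_value(X, y)
print(f"observed separation = {observed:.3f}, permutation p = {p:.3f}")
```

A small p indicates that randomly relabeled data rarely separate as well as the true labels, which is the same logic the caption expresses as permuted models not outperforming the original one.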

individual data set using an off-line library search. A reference standard library (NIST MS 2.0) comprising 209 311 spectra was used to aid the identification of the GC-separated molecules. The data table was prepared using all the identified molecules as X-variables and their class membership as the Y-variable for further analysis.

Data Analysis. In both discovery and validation stages, missing values in the data table were replaced with half of the minimum value found in the data set, and total area normalization was performed.10 The total area normalization for each sample was performed by dividing the integrated area of each analyte by the sum of the peak areas of all analytes present in the sample, and this data table was exported to SIMCA-P 12.0.1 (Umetrics, U.S.A.) for multivariate statistical analysis. An unpaired t test was applied to the selected molecules to determine their significance for disease discrimination.

Multivariate Statistics. Chemometric data analysis was carried out using the extracted molecular information. Principal component analysis (PCA) was performed to verify the grouping trends and outliers in the data. Outliers were eventually excluded,11 and the data were then visualized by scores and loading plots. The data were subjected to partial least-squares discriminant analysis (PLS-DA) and orthogonal partial least-squares discriminant analysis (OPLS-DA), where a model was built and used to identify the putative marker metabolites with higher discriminatory power.14 Model validation was performed by permutation tests with 999 iterations. These permutation tests compared the goodness of fit of several models based on randomly permuted subsets of the Y-observations, while keeping the X-matrix intact.15,16 A blinded set of samples (n = 38; age range, 17–39 years; male/female, 32/6) was used in the second PLS-DA model to calculate the disease predictability. For each variable the trade-off between the

sensitivity and specificity was summarized using the area under the receiver operating characteristic curve (AUC), calculated using the trapezoidal rule.17

Effect of Natural Variation (Age and Gender) on Marker Molecules. The average age of TB patient samples used in the test set was 27 years (age range: 17–39, n = 29). On this basis, the patient samples were divided into two groups (17–27 years, n = 7; 28–39 years, n = 12). The patient test data set was also grouped by gender (male/female: 23/6). The abundance of individual marker molecules was compared between these groups, and p-values were calculated at the 95% confidence level. Interday variations of marker molecules within the patient groups were calculated using three patient samples run on three consecutive days. Peak areas of the marker molecules were used to calculate the relative standard deviation (% RSD).
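The arithmetic behind the missing-value replacement, total area normalization, and % RSD steps described in this section can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "half of the minimum value found in the data set" means half of the smallest observed peak area across the whole table, and that % RSD uses the sample standard deviation. Function names and peak areas are hypothetical.

```python
from statistics import mean, stdev

def impute_half_minimum(table):
    """Replace missing peak areas (None) with half the smallest observed area.

    Assumption: the minimum is taken over the whole data table.
    """
    observed = [v for row in table for v in row if v is not None]
    fill = min(observed) / 2.0
    return [[fill if v is None else v for v in row] for row in table]

def total_area_normalize(table):
    """Divide each analyte's area by the summed peak area of its sample (row)."""
    return [[v / sum(row) for v in row] for row in table]

def percent_rsd(values):
    """Relative standard deviation in percent (sample standard deviation, assumed)."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical integrated peak areas: rows are samples, columns are analytes.
areas = [[100.0, None, 300.0],
         [50.0, 150.0, None]]
normalized = total_area_normalize(impute_half_minimum(areas))
for row in normalized:
    print([round(v, 4) for v in row])

# Hypothetical peak areas of one marker in one sample run on three days.
print(round(percent_rsd([90.0, 100.0, 110.0]), 2))
```

Imputation is applied before normalization here, following the order in which the two steps are named in the text; after normalization every row sums to 1, so samples with different overall signal become comparable.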

■ RESULTS
Using GC/MS coupled to a headspace sampler, urine VOCs were analyzed. Because the diverse chemical nature of the analytes present in urine might influence the urinary VOC profile, storage and sample preparation conditions were optimized prior to GC/MS data acquisition. It was reported that alteration of urine pH could influence the number of identifiable molecular constituents.18 We found that acidification of the urine samples increased the number of identifiable metabolites by ∼37% (Supporting Information Figure S-1) and that addition of 25 μL of HCl/mL of urine sample (pH = 3.0) yielded optimum molecular information (Supporting Information Figure S-2). The total ion chromatogram (TIC) of urine samples did not show any significant differences with or without the addition of protease inhibitors in samples stored at −80 °C (Supporting Information Figure S-3). Urinary VOC profiles were not altered


Figure 3. (A) OPLS-DA score plots (A = 2, N = 75, R2X = 0.567, R2Y = 0.923, and Q2 = 0.870) obtained from the comparative urine VOCs analysis of TB and healthy controls used in the discovery stage-II. (B) Validation plot obtained from 999 random permutation tests showing the robustness of the original PLS-DA model. (C) Prediction of classification of blinded test subjects (T; n = 38) using the PLS-DA model. (D) Receiver operating characteristic curve (ROC) calculated using the validated Y-predicted values obtained from the blinded test set. Diagnostic accuracy is calculated by the area under curve (AUC). The AUC value for our blinded test set was 0.988.
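The AUC calculation in panel D, which the methods describe as using the trapezoidal rule, can be sketched as below. The scores stand in for the Y-predicted values of a classifier; the function names and the toy numbers are hypothetical and do not reproduce the study's data.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate).

    The decision threshold is swept from the highest score to the lowest;
    a sample is called positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= thr and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= thr and lab == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under the curve: sum of trapezoids between consecutive ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy predicted scores (hypothetical) with true classes: 1 = TB, 0 = healthy.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_trapezoid(roc_points(scores, labels))
print(round(auc, 4))
```

In this toy set, 8 of the 9 possible TB/healthy pairs are ranked correctly, so the area is 8/9; a perfectly separating score would give 1.0.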

up to 1 month when stored at −80 °C (Supporting Information Figure S-4). The optimized storage conditions and preprocessing of urine samples were adopted for subsequent studies. On visual inspection, TICs of a healthy urine sample run on three different days (first, third, and fifth day) showed a high degree of similarity, demonstrating the reproducibility of the adopted static headspace sampling GC/MS method (Supporting Information Figure S-5). Five healthy subject urine samples run on the same day confirmed interindividual variability (Supporting Information Figure S-6A). A mixture of standard solvents run five consecutive times showed similarity in TICs (RSD < 9%), validating the accuracy of the adopted method (Supporting Information Figure S-6B). To analyze urine VOCs we followed two different discovery stages. In stage-I, urine samples from three groups, healthy PPD +ve controls (n = 19), healthy PPD −ve controls (n = 11), and new cases of TB patients (n = 30), were analyzed using GC/MS with a modified split–splitless (SSL) injector. Visible differences in the TICs were clearly observed between TB patients and both groups of control subjects (Figure 2A). On average, 120 peaks were identified in each of the acidified, protease-inhibitor-added samples. A total of 18 peaks were found to be present in most of the samples used in stage-I. Of the 18 identified molecules, two that originated from column bleed were removed from the data set before undertaking chemometric analysis. Comparison of the identified metabolites between the healthy controls and patient groups revealed complete class distinction (R2 = 0.772, Q2 = 0.347, and principal component (PC) = 4) in PCA. The PLS-DA model contained four latent variables (LV), showing performance statistics of R2X = 0.728, R2Y = 0.706, and Q2 = 0.624. Application of PLS-DA and OPLS-DA resulted in clear

distinction between TB patients and both groups of healthy controls (PPD +ve and PPD −ve). The OPLS-DA model showed performance characteristics of A = 2, N = 58, R2X = 0.462, R2Y = 0.674, and Q2 = 0.620 (Figure 2B). The model parameters for the explained variation “R2” and the predictive capability “Q2” were significantly high, indicating an excellent model. Five marker molecules, selected on the basis of a VIP (variable importance in the projection) score greater than 1.0, showed high discriminatory power for TB diagnosis. Isopropyl acetate and o-xylene showed a significant increase in abundance (2.07- and 6.37-fold, respectively, p < 0.05) in the urine of TB subjects. Molecules like cymol, 2,6-dimethylstyrene, and 3-pentanol showed a significant decrease in abundance (0.42-, 0.37-, and 0.59-fold, respectively, p < 0.05) in the urine of TB patients. Moreover, the validation plot (Figure 2C) indicated that the model was suitable and not due to chance correlation. Even though the PPD +ve and PPD −ve healthy controls represent a single class of noninfected subjects, a small class separation was observed with some overlapping points. Overall, the two healthy control groups cluster distinctly from TB patients. To assess whether the potential biomarkers generated from the adopted strategy could differentiate TB patients from non-TB controls, we compared the molecular profiles obtained from 113 new subjects, of which 17 non-TB subjects and 58 TB patients were used as a second data set for marker discovery and 38 subjects as the testing set. In stage-II, a total of 12 molecules were identified in most subjects. PCA revealed that two patient subjects and one healthy subject were severe outliers, and these were excluded from further chemometric analysis (R2 = 0.513, Q2 = 0.176, and PC = 2). Subsequent supervised multivariate statistical analysis using PLS-DA demonstrated significant


Figure 4. (A) Urine metabolite profile (VOCs) of TB patients alters with treatment. OPLS-DA statistical analysis (N = 65, R2X = 0.637, R2Y = 0.679, Q2 = 0.508, LV = 3) comparing healthy control subjects (H (PPD −ve); n = 15) with TB patients (P; n = 30) and patients undergoing early (3M; 1–3 months; n = 15) and late (7M; 4–7 months; n = 5) treatment periods. (B) OPLS-DA model comparing lung cancer (LC; n = 7) with performance characteristics of R2X = 0.716, R2Y = 0.966, Q2 = 0.92, and LV = 3. (C) OPLS-DA model comparing chronic obstructive pulmonary disease (COPD; n = 5) with performance characteristics of R2X = 0.609, R2Y = 0.967, Q2 = 0.94, and LV = 2.

discrimination between all the groups and revealed group-specific metabolic profiles. The PLS-DA model obtained from the discovery stage-II contained two latent variables, showing performance statistics of R2X = 0.483, R2Y = 0.915, Q2 = 0.904, and LV = 2. The OPLS-DA showed performance statistics of R2X = 0.567, R2Y = 0.923, Q2 = 0.870, and LV = 3 (Figure 3A). Metabolites with a VIP value of more than 1.0 were taken as significant and identified as marker metabolites. The marker molecules and calculated fold changes in TB patients were found to be similar to the findings of stage-I. Isopropyl acetate and o-xylene showed a significant increase (>2-fold, p < 0.05), and cymol, 2,6-dimethylstyrene, and 3-pentanol showed a significant decrease in abundance in TB urine samples (