Mapping the Lung Proteome in Cystic Fibrosis

Sina A. Gharib,*,†,‡ Tomas Vaisar,‡ Moira L. Aitken,‡ David R. Park,‡ Jay W. Heinecke,‡ and Xiaoyun Fu*,‡,§

Center for Lung Biology and Department of Medicine, University of Washington, Seattle, Washington 98195, and Puget Sound Blood Center, Seattle, Washington 98104

Received February 5, 2009

The pathophysiology of cystic fibrosis (CF) lung disease remains incompletely understood. Novel mechanisms in the pathogenesis of CF lung disease may be discovered by studying the patterns of protein expression in bronchoalveolar lavage fluid (BALF). We used shotgun proteomics to analyze BALF samples from 8 CF and 4 control subjects. Differential protein expression between CF and control subjects was determined using spectral counting and statistical analysis. Using Gene Ontology analysis, we identified enriched biological modules and then applied network analysis to construct a protein interaction map in CF lung disease. Shotgun proteomics analysis of BALF identified hundreds of proteins whose differential enrichment or depletion robustly distinguished the CF phenotype from normal controls. Functional categorization and network analysis identified key processes, including the immune response and proteolytic activity, that are known contributors to CF lung disease. Importantly, this approach also implicated abnormalities in previously unsuspected pathways, such as dysregulation of the complement system, that may have critical roles in the pathogenesis of CF lung disease. By integrating shotgun proteomics with statistical and computational analyses, we have developed a promising approach to understanding the pathophysiology of CF lung disease. Our approach should be applicable to a wide range of proteomics-based clinical research.

Keywords: airway inflammation • protein network • mass spectrometry • spectral index • bronchoalveolar lavage fluid • proteomics

* Corresponding authors: Sina A. Gharib, Center for Lung Biology, 815 Mercer St., Seattle, WA 98109. E-mail: [email protected]. Tel: 206-221-0630. Xiaoyun Fu, Puget Sound Blood Center, 921 Terry Avenue, Seattle, WA 98104-1256. E-mail: [email protected]. Tel: 206-398-5916. Fax: 206-587-6056.
† Center for Lung Biology. ‡ Department of Medicine. § Puget Sound Blood Center.

Journal of Proteome Research 2009, 8, 3020–3028. Published on Web 04/08/2009. DOI: 10.1021/pr900093j. © 2009 American Chemical Society.

Introduction

Unraveling the pathogenesis of cystic fibrosis (CF), the most common lethal genetic recessive disease in individuals of European ancestry, poses a formidable challenge. CF afflicts approximately 30 000 individuals in the United States, reducing their median life expectancy to 37 years.1,2 Although it is a multiorgan disorder, the primary cause of death is respiratory failure resulting from unrelenting pulmonary infections, airway remodeling, and fibrosis. The mechanisms by which these processes interact within the host remain incompletely understood.

In the past two decades, significant advances have been made in our understanding of CF at the molecular level. Mutations in a single gene, the cystic fibrosis transmembrane conductance regulator (CFTR), cause the disease, and more than 1400 different mutations have been identified in this gene. Although CF is linked to a single gene, a wide spectrum of phenotypic severity has been observed, and deciphering the consequences of mutations in this gene has proven challenging.3 CFTR functions as a chloride channel in epithelial cells, but it may also regulate, directly or indirectly, inflammatory responses, cell signaling pathways, and the transport of other ions.3 Mutations in CFTR cause abnormal chloride transport in the respiratory epithelium and change the rheology and clearance of airway secretions, allowing pathogenic bacteria to colonize the airways.3 Chronic infection, a prominent neutrophilic response, and the inflammatory consequences progressively remodel and destroy airways, leading to bronchiectasis and, ultimately, respiratory decompensation. Linking and integrating these diverse biological processes can lead to global insights into the pathophysiology of CF.

Proteomics-based approaches, which have the potential to generate large, integrated sets of data, offer great promise for systematically investigating the pathogenesis of CF.4 In contrast to transcriptomics, proteomics has the advantage of assessing the presence of gene products that are likely to be functionally relevant in this disease. For example, Sloane et al. and von Bredow et al. used 2-dimensional (2-D) gel electrophoresis to analyze sputum and bronchoalveolar lavage fluid (BALF) of CF and control subjects and detected differential presence of several proteins, such as surfactant protein A (SFTPA) and myeloperoxidase (MPO), between the two groups.5,6 Recently, Roxo-Rosa et al. employed a 2-D gel proteomics approach to search for novel biomarkers of CF lung disease by identifying proteins from nasal cells from CF patients and control individuals.7 Despite excellent resolving power, 2-D gel proteomics has several limitations. Most importantly, it is biased toward the detection of high-abundance proteins and therefore does not provide a comprehensive proteomic profile. Furthermore, difficulties in reproducibility and limitations in protein identification may limit its utility for large-scale proteomics mapping.

Shotgun proteomics, which separates tryptic digests of complex protein mixtures with multidimensional liquid chromatography and then analyzes the resulting peptides by tandem mass spectrometry and database searching, overcomes many of these deficiencies.8 Although shotgun proteomics is also biased in favor of detecting high-abundance proteins, it has a much wider dynamic range for capturing low-abundance proteins than gel-based methods, making this approach well suited for large-scale protein measurements in biological samples. For the meaningful application of shotgun proteomics to clinical questions, however, concomitant statistical and computational methods must be developed. We recently performed shotgun proteomic analysis of CF and normal BALF samples to develop a robust statistical metric for determination of differential protein expression.9 In the present work, we integrate computational and bioinformatics methods with label-free protein quantification to assess whether differential protein expression in the BALF can provide mechanistic insights into the pathogenesis of CF lung disease.

Table 1. Characteristics of Control and CF Subjects and Their BALF Samples

subject  age (a)  sex  CF genotype (b)     FEV1 (% predicted)  microbiology (c)  BALF protein (mg/mL)  BALF PMN (d) (%)
CF-1     31       M    ∆F508/711+1G→T      111%                PA, SA, BM, Asp   0.312                 92
CF-2     28       F    ∆F508/∆F508         66%                 PA, SA            0.589                 95
CF-3     24       M    ∆F508/∆F508         54%                 PA, SM, Asp       0.195                 86
CF-4     18       M    ∆F508/∆F508         81%                 PA                0.738                 95
CF-5     19       M    ∆F508/∆F508         94%                 SA, Asp           0.316                 92
CF-6     20       M    ∆F508/∆F508         88%                 SA, HI            0.204                 89
CF-7     38       F    ∆F508/∆F508         84%                 SA                0.122                 91
CF-8     35       M    ∆F508/1717-1G→A     69%                 SA, SM, Asp       0.928                 92
Cont-1   -        M    -                   -                   -                 0.090                 1
Cont-2   -        M    -                   -                   -                 0.053                 0
Cont-3   -        F    -                   -                   -                 0.056                 0
Cont-4   -        F    -                   -                   -                 0.039                 4

(a) Control age: 18-40.
(b) ∆F508: deletion of phenylalanine at position 508 in the CFTR protein; 711+1G→T: a G to T nucleotide change (splice site mutation, intron 5); 1717-1G→A: a G to A nucleotide change (splice site mutation, intron 10).
(c) PA, Pseudomonas aeruginosa; SA, Staphylococcus aureus; BM, Burkholderia multivorans; SM, Stenotrophomonas maltophilia; HI, Haemophilus influenzae; Asp, Aspergillus species.
(d) PMN: polymorphonuclear leukocyte.
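The BALF protein concentrations and neutrophil percentages listed in Table 1 can be summarized per group with a few lines of arithmetic. The sketch below is illustrative only: the variable names are our own, and the values are the raw table entries, with no correction for differences in lavage volume between the groups.

```python
from statistics import mean

# Raw values copied from Table 1 (not corrected for lavage volume).
cf_protein = [0.312, 0.589, 0.195, 0.738, 0.316, 0.204, 0.122, 0.928]  # mg/mL
control_protein = [0.090, 0.053, 0.056, 0.039]                          # mg/mL
cf_pmn = [92, 95, 86, 95, 92, 89, 91, 92]                               # % of cells
control_pmn = [1, 0, 0, 4]                                              # % of cells

# Group means; names are hypothetical and chosen for readability.
summary = {
    "cf_protein_mean": mean(cf_protein),
    "control_protein_mean": mean(control_protein),
    "cf_pmn_mean": mean(cf_pmn),
    "control_pmn_mean": mean(control_pmn),
}

# Ratio of mean raw protein concentration, CF over control.
protein_ratio = summary["cf_protein_mean"] / summary["control_protein_mean"]

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
print(f"protein_ratio: {protein_ratio:.2f}")
```

On these raw numbers, CF lavage fluid is several-fold richer in protein and is dominated by neutrophils, whereas control fluid contains almost none.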

Materials and Methods

BALF Protocol. All BALF collection protocols were approved by the University of Washington’s institutional review board, and informed consent was obtained from all subjects. BALF was obtained with a fiberoptic bronchoscope, as previously described,9,10 from 8 adult CF subjects without clinical evidence of acute exacerbation and from 4 adult control volunteers who were free of lung disease. Baseline characteristics of the subjects and their BALF are shown in Table 1. The CF subjects were lavaged by instilling two 30-mL aliquots of sterile 0.9% saline; the fluid was recovered by wall suction and centrifuged at 200 × g for 30 min. The supernatant was divided into multiple aliquots and frozen at -70 °C. BALF from the 4 normal volunteers was obtained and handled using a similar protocol, except that five 30-mL aliquots per individual were used. All subsequent analyses were corrected for the volume difference between CF and control BALF. Protein concentrations were measured by the Bradford protein assay (Bio-Rad).

Proteomics Analysis. Shotgun proteomics analysis was performed on the 12 CF and normal BALF samples as we have described before.9 Briefly, BALF samples were reduced with 4 mM DTT and alkylated with 10 mM iodoacetamide. The proteins in BALF were digested overnight at 37 °C with trypsin (Promega) in 50 mM ammonium bicarbonate and 10% acetonitrile buffer. Tryptic peptides were concentrated and desalted with C18 cartridges (Empore, 3M). Tryptic digests of BALF samples were analyzed using two-dimensional liquid chromatography-electrospray ionization-tandem mass spectrometry (2D-µLC-ESI-MS/MS). The peptides were separated on a strong cation exchange column (Biobasic SCX, 320 µm × 10 cm, Thermo Fisher) and eluted using an 11-step gradient (0, 10, 20, 30, 40, 50, 60, 70, 100, 200, and 800 mM ammonium chloride). Next, the peptides were separated on a reverse-phase capillary column (Biobasic-18, 180 µm × 10 cm, Thermo Fisher) at a flow rate of 2 µL/min using solvent A (0.1% formic acid) and solvent B (0.1% formic acid in acetonitrile) with a 90-min linear gradient from 7 to 35% solvent B. MS/MS was performed using a Finnigan LCQ Deca ProteomeX ion trap mass spectrometer (Thermo Electron Corp.). A survey scan from m/z 300 to 1500 was initially performed, followed by data-dependent MS/MS analysis of the three most abundant ions. Dynamic exclusion was set to allow the same precursor ion to be selected twice within a 30 s window, after which it was excluded for 1.2 min. Each sample was analyzed twice. MS/MS spectra were searched against the Human International Protein Index database (v3.01) using the SEQUEST search engine.11 Search parameters allowed modified thiol residues (fixed carbamide methylation and variable methionine sulfoxide) and one incomplete cleavage site. The results were further processed using PeptideProphet and ProteinProphet with an adjusted probability of ≥0.9 for peptides and ≥0.96 for proteins.12 All protein identifications required detection of ≥3 unique peptides in at least 1 sample.

ELISA and Immunoblot Analysis (see Supporting Information). Myeloperoxidase (MPO) was assayed using a sandwich ELISA kit (Oxis International) according to the manufacturer’s instructions but with the following change: samples were diluted in PBS (pH 7.4) supplemented with 0.5% BSA and 100 mM MgCl2. Intercellular adhesion molecule 1 (ICAM1), matrix metalloproteinase 8 (MMP-8), and matrix metalloproteinase 9 (MMP-9) were analyzed using a sandwich ELISA (R&D Systems) according to the manufacturer’s instructions. Western blot analyses of Complement 2 (C2) and surfactant protein A1 (SFTPA1) were performed with the appropriate antibodies (goat antisera to human C, Quidel; antihuman SFTPA, Chemicon) according to the manufacturer’s instructions.

Cluster Analysis. Hierarchical clustering of the proteomics profiles (459 proteins) of all 12 subjects (8 CF, 4 control) was performed using the average linkage method and the Euclidean distance metric.13

Determination of Differential Protein Expression. We combined several strategies to identify significant differences in protein expression between BALF of CF and control individuals. First, we limited our analysis to proteins identified with a high confidence score and having at least 3 unique peptides, thereby markedly reducing the false-positive rate of identification.12,14 Second, to quantify relative protein abundance between CF and control subjects, we applied spectral counting together with an empirical test that we recently developed and validated, termed the spectral index (SI).9 SI normalizes relative protein abundance between two groups of samples to values between -1 and +1. For a given protein, an SI close to +1 implies enrichment in one group (here, CF subjects), whereas a value close to -1 signifies enrichment in the other group (here, control individuals). A value close to 0 indicates that a protein is about equally abundant in CF subjects and controls. We used the SI and a permutation-based analysis to statistically determine the relative abundance of proteins in BALF of CF versus control subjects at a 99% confidence level.

Gene Ontology (GO) Analysis.
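Before the functional analysis, the spectral index and permutation test described under Determination of Differential Protein Expression can be illustrated with a minimal sketch. The text does not give the SI formula, so the form used here is an assumption: each group's share of the mean spectral count is weighted by the fraction of its samples in which the protein was detected, which yields values between -1 and +1. The function names, the example counts, and the exhaustive relabeling scheme are likewise hypothetical and may differ from the authors' procedure.

```python
from itertools import combinations


def spectral_index(counts_a, counts_b):
    """Assumed SI form: abundance share weighted by detection frequency."""
    mean_a = sum(counts_a) / len(counts_a)
    mean_b = sum(counts_b) / len(counts_b)
    total = mean_a + mean_b
    if total == 0:
        return 0.0  # protein not observed in either group
    frac_a = sum(c > 0 for c in counts_a) / len(counts_a)
    frac_b = sum(c > 0 for c in counts_b) / len(counts_b)
    return (mean_a / total) * frac_a - (mean_b / total) * frac_b


def permutation_p_value(counts_a, counts_b):
    """Two-sided p-value from every relabeling of the pooled samples."""
    pooled = list(counts_a) + list(counts_b)
    n_a = len(counts_a)
    observed = abs(spectral_index(counts_a, counts_b))
    hits = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        perm_a = [pooled[i] for i in idx]
        perm_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(spectral_index(perm_a, perm_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical spectral counts for one protein: 8 CF and 4 control samples.
cf_counts = [12, 9, 15, 7, 11, 10, 8, 14]
control_counts = [0, 0, 0, 0]

si = spectral_index(cf_counts, control_counts)
p_value = permutation_p_value(cf_counts, control_counts)
significant = p_value < 0.01  # 99% confidence level, as in the text
print(si, p_value, significant)
```

With 8 and 4 samples there are only 495 distinct relabelings, so the null distribution can be enumerated exactly rather than sampled at random.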
Functional annotation of differentially expressed proteins in BALF of CF patients relative to normal subjects was obtained from the GO database.15 Overrepresented functional categories among the differentially expressed proteins relative to all identified proteins in human BALF (CF and control) were determined using the Expression Analysis Systematic Explorer (EASE).16 Enriched functional categories were required to be significant at a false discovery rate