Protein Expression in Sputum of Smokers and Chronic Obstructive Pulmonary Disease Patients: A Pilot Study by CapLC-ESI-Q-TOF

Begoña Casado,*,†,‡ Paolo Iadarola,*,† Lewis K. Pannell,§ Maurizio Luisetti,| Angelo Corsico,| Elena Ansaldo,| Ilaria Ferrarotti,| Piera Boschetto,⊥ and James N. Baraniuk‡

Dipartimento di Biochimica “A. Castellani”, Università di Pavia, Italy; Division of Rheumatology, Immunology and Allergy, Georgetown University Proteomics Laboratory, Washington, DC 20057; Cancer Research Institute, University of South Alabama, Mobile, AL 36688; Laboratorio di Biochimica e Genetica, Clinica di Malattie dell’Apparato Respiratorio, Fondazione IRCCS Policlinico San Matteo, Università di Pavia, Italy; and Dipartimento di Medicina Clinica e Sperimentale, Sezione di Igiene e Medicina del Lavoro, Università di Ferrara, Italy

Received July 16, 2007

The current report describes the use of CapLC-ESI-Q/TOF-MS for investigating the proteome profiles of hypertonic saline-induced sputum samples from 56 smokers. The severity of their lung disease ranged from normal (healthy smokers) to chronic bronchitis, chronic obstructive pulmonary disease (COPD), and COPD with emphysema. This pilot study examined the hypothesis that there were distinct differences in protein expression profiles that were related to the phenotype and the severity of cigarette smoking-related illness. A total of 203 unique proteins were identified. These may represent the most highly expressed proteins in induced sputum. Our results provide evidence that different proteins are expressed as the disease progresses from health to more advanced stages, and they support our contention that a proteomic approach would be beneficial in discovering selective molecules linked to specific COPD stages.

Keywords: COPD • chronic bronchitis • smokers • sputum • LC-Q-TOF

Introduction

The term chronic obstructive pulmonary disease (COPD) defines a multifactorial syndrome that includes chronic bronchitis and emphysema.1–5 COPD is characterized by chronic lung inflammation with progressive deterioration of pulmonary function and airflow limitation.1,4,6 Current treatments provide only limited, palliative relief. These factors combine to make COPD a major cause of disability and death among older patients.5 According to the Global Initiative for Chronic Obstructive Lung Disease (GOLD),7 the syndrome is clinically defined by the presence (COPD subjects) of airflow obstruction (forced expiratory volume in 1 s, FEV1, [...] 20% or when the sputum sample was adequate.

Sample Preparation and CapLC-ESI-Q-TOF. Sputum was treated with DTT (final concentration 10 mM), vortexed for 5 min, and centrifuged at 3000 rpm for 3 min to remove cellular and aggregated material. The supernatant was collected, and total protein concentration was measured by the Lowry method.33 All supernatants were lyophilized and stored at -80 °C until analysis. Lipids were extracted with chloroform/methanol 1:1.34 The layer of proteins at the interface between the lipids and aqueous waste was extracted and then resuspended in RapiGest SF 0.2%, reduced with DTT (final concentration 55 mM), and alkylated with iodoacetamide (final concentration 15 mM). Proteins were digested with trypsin (enzyme/protein ratio 1:20) for 1 h at 37 °C. Each tryptic peptide mixture (5 µL) was concentrated and desalted on a Biobasic C18 (5 µm, 30 × 0.32 mm) precolumn (Thermo Electron Corporation, PA, USA). Separation was performed with a CapLC (Waters, Milford, MA, USA) on an RP-Zorbax C18 WSB (100 mm × 0.15 mm ID) column (Micro-Tech Scientific, CA, USA) using a 40 min gradient from 95% H2O + 0.2% formic acid (FA, solvent A) to 40% acetonitrile + 0.2% FA (solvent B), with a flow rate of 1 µL/min.

Eluted peptides were analyzed by ESI-Q-TOF 1 (Waters, MA, USA), and MASSLYNX version 4.0 was used to control the CapLC and to acquire and process the data. All samples were run in duplicate. The elution times of pairs of endogenous peptides from the most abundant proteins in sputum (two of albumin, two of lysozyme, and two of Ig A1) were determined to check the reproducibility of the experiments. Mean, standard deviation, and coefficient of variation (CV%) were calculated for each of the six peptides. These data are shown in Table 1.
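The reproducibility check described above reduces to three numbers per marker peptide: the mean elution time, its standard deviation, and the CV%. A minimal Python sketch of that calculation is given here. The elution times are invented placeholders, not values from Table 1, and the function name `elution_stats` is hypothetical; the use of the sample (n - 1) standard deviation is also an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev

def elution_stats(times_min):
    """Return (mean, sample SD, CV%) for a list of elution times in minutes."""
    m = mean(times_min)
    sd = stdev(times_min)      # sample standard deviation (n - 1 denominator)
    cv = 100.0 * sd / m        # coefficient of variation, in percent
    return m, sd, cv

# Hypothetical elution times (min) of one marker peptide across four runs
example_times = [20.0, 21.0, 19.0, 20.0]
m, sd, cv = elution_stats(example_times)
print(f"mean = {m:.2f} min, SD = {sd:.2f} min, CV = {cv:.1f}%")
```

A low CV% for each of the six peptides would indicate stable chromatography across runs; what counts as "low" is a judgment the text leaves to Table 1.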

research articles

Protein Expression in Sputum of Smokers and COPD Patients

Table 2. Group Demographics (Mean ± SD)

subjects                            NS             HS             CB             COPD           COPD & E
n                                   7              13             11             15             10
% males                             71             85             73             73             82
age (year)                          68 ± 2         42 ± 6         42 ± 5         70 ± 9         69 ± 8
pack–years                          0              25.0 ± 8.6     36.3 ± 15.0    33.4 ± 25.8    47.7 ± 35.7
sputum total protein (µg/mL)ᵃ       717 ± 172      623 ± 95       660 ± 97       747 ± 151      801 ± 198
FEV1/FVC % predicted                79.53 ± 5.97   81.38 ± 5.03   75.64 ± 5.77   49.47 ± 14.79ᵇ 37.15 ± 6.30ᵇ

ᵃ p = 0.033 by ANOVA.
ᵇ p = 0.0002 by two-tailed, unpaired Student’s t test.

Bioinformatics. “MASCOT MS/MS ion search software” (http://www.matrixscience.com) matched peptides to proteins using the National Center for Biotechnology Information nonredundant (NCBInr) protein database and match criteria previously described.35 For proteins that were identified by single peptides with MASCOT probability (Mowse) scores >50, the peptide sequences were confirmed by manual sequencing. These sequences and others that were matched to hypothetical or unknown proteins by MASCOT were screened using the Protein Identification Resource (PIR) (http://pir.georgetown.edu) “Peptide Match” program and the “Non Redundant Reference Protein” (NREF) and “Integrated Protein Classification” (iProClass) databases of human proteins. Peptide sequence alignments were constructed using the PIR “CLUSTALW” program. Proteins identified by several NCBInr accession codes (gi|) were organized according to Gene ID numbers. The frequency of detection of each protein was determined within each clinical group. Proteins were characterized for gene ontology using the GO database (http://www.geneontology.org/). Molecular weights and isoelectric points were calculated using the “ExPASy Compute pI/MW” program (http://us.expasy.org/tools/pI-tool.html) to create virtual 2-D maps of each group’s sputum proteins.

Statistical Analysis. The numbers of distinct peptide sequences coding for individual proteins were determined for each MS run. Reproducibility of peptide detection and protein identification between replicate runs was assessed by comparing the number of peptides per protein detected in run 1 vs run 2 using two-tailed, paired Student’s t tests of all runs in each group (p ≤ 0.05 considered significant). The percent coverage of each protein was also compared between runs 1 and 2. Comparisons between groups were carried out to determine similarities and significant differences in the frequencies of detection for each protein.
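The bookkeeping described above (collapsing several gi| accessions onto one Gene ID, then computing the frequency of detection within a clinical group) can be sketched in a few lines of Python. This is an illustration only: the accession strings, gene labels, and subject identifiers are hypothetical placeholders, and the function name `detection_frequency` is not taken from the study.

```python
from collections import defaultdict

# Hypothetical accession -> Gene ID map (placeholder values, not from the paper)
GI_TO_GENE = {"gi|0001": "GENE_A", "gi|0002": "GENE_A", "gi|0003": "GENE_B"}

def detection_frequency(subject_hits, gi_to_gene):
    """Percent of subjects in one group in whom each gene product was detected.

    subject_hits: dict mapping subject id -> set of gi accessions found in
    that subject's sputum. Accessions sharing a Gene ID count as one protein.
    """
    subjects_per_gene = defaultdict(set)
    for subject, accessions in subject_hits.items():
        for gi in accessions:
            gene = gi_to_gene.get(gi)
            if gene is not None:
                subjects_per_gene[gene].add(subject)
    n_subjects = len(subject_hits)
    return {gene: 100.0 * len(found) / n_subjects
            for gene, found in subjects_per_gene.items()}

# Hypothetical group of four subjects
group = {
    "s1": {"gi|0001"},
    "s2": {"gi|0002", "gi|0003"},
    "s3": set(),
    "s4": {"gi|0001", "gi|0002"},
}
freq = detection_frequency(group, GI_TO_GENE)
print(freq)
```

Counting subjects rather than accessions keeps a protein with many sequence variants, such as an immunoglobulin chain, from being over-weighted in the group frequency.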

Results

Protein Identification. In this paper, we report the first detailed characterization of the protein content of sputum samples obtained from 56 subjects grouped, according to the severity of their pulmonary disease, into five clinical groups. Details concerning sex, age, cumulative smoking exposure (pack–years), pulmonary function tests, and protein concentration in induced sputum of these subjects are summarized in Table 2. A total of 203 proteins were identified. These likely represent the most highly expressed proteins in the induced sputum specimens. The protein characteristics for each group are shown in Tables S1 to S5 in the Supporting Information. Data for each protein include the NCBInr accession number, molecular weight (kDa), amino acid number, isoelectric point, gene identification number and gene name, chromosome map, and frequency of detection within each group. For most of the identifications reported, at least three peptide sequences matched the protein using the MASCOT algorithm. For identifications based on one or two peptide sequences with a high MASCOT score (>50), the MS/MS spectra were reinterpreted manually to verify peak assignment. As shown in Tables S1 to S5, a series of protein fragments labeled as “unknown” or “hypothetical” proteins were found in the NCBInr database. To gain more information on these proteins, the primary sequences of the fragments were searched using BLAST and the human “PIR” databases. This verification allowed the identification of unknown and hypothetical proteins whose names and characteristics are given in Table S6, panels A and B (Supporting Information).

Reproducibility and Statistical Analysis. The reproducibility of the chromatographic elution for each run was confirmed by comparing the retention times of two peptides each from albumin, immunoglobulin A1, and lysozyme (see Table 1). Their m/z, charge state, mean elution time, SD of elution time, and CV were calculated. To examine the reproducibility of experiments, each sample was run in duplicate, and protein profiles were compared between runs. The values (%) for the (i) sequence coverage of proteins identified based on unique peptides, (ii) number of peptides identified, and (iii) number of unique peptides found for each protein are reported for each of the five groups (Table S7, Supporting Information). Sequence coverage was calculated on the basis of peptides present solely in a specific protein and not shared by other proteins. The significance levels for each of these variables were evaluated using the two-tailed, paired Student’s t test to compare runs within each group. For sequence coverage, the only proteins, out of the total of 203, that showed significant differences were Rei and Bence-Jones proteins in the NS group and albumin in the HS group (p ≤ 0.05).
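The run 1 vs run 2 comparison above is a two-tailed, paired Student's t test. A self-contained Python sketch is shown here using only the standard library; the p value is obtained by numerically integrating the t density with Simpson's rule, which is an implementation choice made for this illustration and not a description of the software the authors used. The input values are hypothetical, and the function names are placeholders.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def paired_t_test(run1, run2, steps=2000):
    """Two-tailed paired t test; returns (t, df, p).

    Assumes the paired differences are not all identical (nonzero SD)
    and that steps is even, as Simpson's rule requires.
    """
    diffs = [a - b for a, b in zip(run1, run2)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    df = n - 1
    # Simpson's rule for the integral of the density from 0 to |t|
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    p = max(0.0, 1.0 - 2.0 * area)   # probability mass in both tails
    return t, df, p

# Hypothetical peptides-per-protein counts for one protein in three samples
run1 = [4, 6, 8]
run2 = [3, 4, 5]
t, df, p = paired_t_test(run1, run2)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")
```

With real data, each protein in each group would contribute one such test, and a p value at or below 0.05 would flag that protein as differing between replicate runs.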
The rate for significant differences in sequence coverage between runs was 0.0003 (3 proteins/[5 groups × 203 total proteins]). On the basis of the 112 runs performed, differences in sequence coverage were predicted to be 1.5 per 100 proteins analyzed (p = 0.015). More proteins had significant (p ≤ 0.05) differences in the number of peptides detected per protein between the two runs. These included zinc-α2-glycoprotein in the NS group; the secretory leukocyte protease inhibitor (SLPI) in HS; albumin in CB; Ig A1Bur, Ig A2 chain C, Ig K light chain, MUC5AC, and Mucin 5 in COPD; and Ig G1, Ig K light chain, and polymeric immunoglobulin receptor (PIGR) in COPD & E. The rate for significant differences in peptides per protein was 0.0011, or a difference between runs of only 8 peptides for every 100 proteins identified. Unique peptide sequences that were detected in one run but not in another were significantly different (p ≤ 0.05) for Clara cell 10 kDa protein (uteroglobin) and albumin in HS and for Ig K light chain and neutrophil peptide HP1 in COPD. The rate for significant differences was 0.0004, or a difference between runs of about 2 unique peptides per 100 proteins identified.

Journal of Proteome Research • Vol. 6, No. 12, 2007 4617

Casado et al.

Figure 1. Predicted two-dimensional maps based on the pI and molecular weights calculated for each protein. Each panel contains the map obtained from a single group of subjects investigated.

Theoretical Maps. The CapLC-based separation system confirmed the expression of many of the sputum proteins previously described from 2-DE methods in BALF. It also identified many proteins (underlined in Table S7) for the first time. In particular, it identified proteins with extreme isoelectric points and high mass that are challenging to visualize using the traditional 2-DE format. The theoretical 2-D maps shown in Figure 1 demonstrate that CapLC identified 7 proteins with pI above pH 9 in NS; 13 in the group of HS; 11 in subjects with CB; 15 in the group of COPD without E; and 9 in subjects with COPD & E (panels A–E, respectively). Twenty-one proteins with molecular masses above 150 kDa were detected: n = 1 in NS; n = 8 in HS; n = 6 in CB; n = 1 in COPD; and n = 5 in COPD & E. These included proteins secreted by mucus glands, such as mucins MUC5, 261 kDa (gi|17384254), and MUC5B, 590 kDa (gi|23821885), or of cellular origin, such as DNA-activated protein kinase, 465 kDa (gi|1362789), and titin, 3816 kDa (gi|17066105). Eighteen proteins had masses below 10 kDa (2 in NS; 6 in HS; 5 in CB; 2 in COPD; and 3 in COPD & E). Of these, the Ig binding factor, 3.3 kDa (gi|237563); neutrophil HNP-3, 3.5 kDa (gi|229859); and HNP-1, 3.4 kDa (gi|228797), were the smallest proteins.

Protein Number for Each Group. In an attempt to define whether specific changes of proteomic profiles could be associated with changes in disease severity, the global protein patterns were compared between groups. NS had the lowest number of induced sputum proteins based on NCBInr entries. However, the total protein level was not lower than that of other groups (Table 2), suggesting that these proteins were present in relatively high concentrations. In contrast, the HS and CB groups had slightly lower total protein concentrations but the highest numbers of proteins. This suggested that more proteins were detectable and that they may have had relatively lower concentrations than in NS. It had been anticipated that more proteins would be identified in the COPD and COPD & E groups, but instead, their numbers were intermediate between NS and the HS and CB sets. Figure 2 shows Venn diagrams that compare the relative numbers of proteins per group: (a) CB > HS > NS (panel A); (b) COPD > COPD & E > NS (panel B); (c) COPD > COPD & E > HS (panel C); and (d) CB > COPD (panel D).
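The mass-range tallies quoted in the Theoretical Maps paragraph can be checked with a short Python sketch. The per-group counts and the masses of the named proteins are copied from the text; the thresholds (150 kDa and 10 kDa) are those stated there, while the function name `mass_class` and the label "intermediate" are introduced here only for illustration.

```python
GROUPS = ["NS", "HS", "CB", "COPD", "COPD & E"]

# Per-group counts as quoted in the text
above_150_kda = dict(zip(GROUPS, [1, 8, 6, 1, 5]))
below_10_kda = dict(zip(GROUPS, [2, 6, 5, 2, 3]))

total_high_mass = sum(above_150_kda.values())   # text reports 21
total_low_mass = sum(below_10_kda.values())     # text reports 18

def mass_class(kda):
    """Classify a protein by calculated mass using the thresholds in the text."""
    if kda > 150:
        return "high (>150 kDa)"
    if kda < 10:
        return "low (<10 kDa)"
    return "intermediate"

# Masses (kDa) of the proteins named in the text
named_proteins = {
    "MUC5": 261, "MUC5B": 590, "DNA-activated protein kinase": 465,
    "titin": 3816, "Ig binding factor": 3.3, "HNP-3": 3.5, "HNP-1": 3.4,
}
for name, kda in named_proteins.items():
    print(f"{name}: {kda} kDa -> {mass_class(kda)}")
print(total_high_mass, total_low_mass)
```

The per-group counts sum to the reported totals of 21 and 18, which implies that each protein in these two mass ranges was counted in exactly one group.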

Figure 2. Venn diagrams showing the number of unique proteins per group and the number of the protein overlaps between the different groups.

Differential Protein Expression Among Groups. All NS sputum samples contained peptides from deleted in malignant brain tumors 1 (DMBT1), lysozyme, proline rich 4 (lacrimal), and Clara cell 10 kDa protein. These proteins were secreted from epithelial, duct, and submucosal gland cells. Submucosal gland serous cells were the source for lactoferrin (LTF), polymeric immunoglobulin receptor (PIGR), secretory leukocyte protease inhibitor (SLPI), and long palate lung nasal clone 1 (LPLUNC1). Mucin 5B was expressed by mucus cells of submucosal glands. Albumin was a marker of vascular permeability. Mucosal plasma cells synthesize IgA as dimers that are linked by the joining chain (IGJ). All NS samples contained IgA1, IgA2, and IgJ. The proteomic analysis presented here was able to identify peptides corresponding to the variable and constant regions of heavy and light chains.

Healthy smokers have been reported to have more immunoglobulin proteins than nonsmokers.22 A large number of light and heavy chain variable region sequences were detected, suggesting humoral immune stimulation and expansion of local immunoglobulin production. Glandular proteins such as serous cell lysozyme, lactoferrin, SLPI, LPLUNC1, mucus cell mucin 5B, and DMBT1 had the highest frequencies of detection in the healthy smokers group. However, other sputum constituents of healthy smokers differed from nonsmokers. Defensin A1 (DEFA1) was an indicator of neutrophilic inflammation. The cytoskeletal protein β-actin suggested cellular destruction.

Neutrophil polypeptides were detected more frequently in COPD with emphysema than in the other groups. Proteins included defensin A1, lipocalin 2 or NGAL (LCN2), myeloperoxidase, neutrophil elastase 2, calgranulin A (S100A8), and calgranulin B (S100A9). Eosinophil proteins were also detected in at least two subjects. The high frequency of IgA1, IgA2, and IgJ suggested mucosal plasma cell production of secretory IgA. Plasma extravasation was implicated in COPD & E pathogenesis since albumin was detected in all COPD & E samples. IgG1 and IgG4 were also detected frequently, suggesting that vascular permeability was also important for antibody delivery in COPD with emphysema. Epithelial mucin 5AC and Clara cell protein (SCGB1A1) were relatively reduced in COPD & E compared to NS.

Ontology. Proteins identified in each group were characterized by their biological processes, molecular function, and cellular localization using Gene Ontology (Figure 3, top to bottom, respectively). Patterns were reasonably similar, with the following exceptions. The sputum of COPD and COPD & E subjects had fewer proteins assigned to signaling pathways and structural constituents compared to the NS, HS, and CB groups. Extracellular proteins and cytoskeletal proteins were fewer in COPD than in each of the other groups. As expected, the number of proteins involved in innate immune host defense responses was lower in NS than in the other groups. By contrast, the COPD & E group had the highest number of proteins with kinase activity, while this family of signaling proteins was rarely detected in any of the other four groups.

Discussion

General Considerations on Experimental Procedures. Given the lack of effective biomarkers of chronic obstructive pulmonary disease and the limited proteomic information available in the literature, we expected that our study would be a reasonable starting point for identifying protein candidates involved in lung pathogenesis. Saline-induced sputum was the biological matrix used for our investigation since it can be easily collected by noninvasive methods, and it has been shown to be a valuable tool for sampling the contents of the lower airway. The sputum specimens were obtained from individuals grouped by airway disease severity with the purpose of identifying patterns of protein expression that may have been relevant to the clinical status of each subject group. The LC-ESI/Q-TOF approach identified 118 proteins (underlined in Table S7) that had never been described before in proteomics evaluations using sputum or BALF specimens with 2-DE as the separation technique. Of particular interest were the many immunoglobulins that have previously been generically identified as “IgG” and “IgA” without regard to their precise constant, variable heavy and light chain compositions. Chromatographic and mass spectrometric data were found to have high reproducibility when assessed for run-to-run variations in peptides identified and percent of protein coverage.

Individual proteins identified by unique NCBInr accession numbers were the primary outcome of this analysis. Many of these proteins were products of the same gene but represented different variants, including splice variants and different immunoglobulin variable regions that have unique NCBInr gi| numbers. The latter occur as a result of recombination of heavy chain VDJ and light chain VLJ gene complex regions and their attachment to distinct constant heavy and light chain regions. This analysis is distinct from that of 2-DE, where heavy and light chains can be separated as broad bands of different molecular weights but with wide ranges of pI. Our preliminary results provide a more nuanced analysis of variations in immunoglobulin gene expression that may be related to different severities of cigarette smoke induced nonreversible lung disease.

Figure 3. Ontology of proteins identified in the five groups of subjects investigated, based on molecular function, biological process, and cellular compartment, top to bottom, respectively.

Protein Identification. Mucins are the major glycoconjugate components of sputum. The fact that these high molecular weight proteins have never been identified in previous proteomics studies serves as an indicator of the productivity and performance of our experimental approach. MUC5B, from submucosal gland mucus cells, and MUC5A/C, from epithelial goblet cells, belong to the class of secreted mucins. It was not unexpected to observe the increase of mucin detection and the increased ratio of MUC5B:MUC5A/C that we demonstrated in COPD. It is likely that this hypersecretion contributes to airway obstruction.

Protease–antiprotease cascades were identified in the sputum samples. Serine-endopeptidases included neutrophil elastase (NE), cathepsin G (CTSG), lactoferrin (LTF), haptoglobin (HP), and cationic antimicrobial protein 37 kDa (CAP 37). Complementary serine-endopeptidase inhibitors included α1-antitrypsin (SERPIN 1, inhibitor of HNE) and secretory leukocyte protease inhibitor (SLPI, inhibitor of cathepsin G). Complement factor C3 and apolipoprotein J (clusterin) have endopeptidase inhibitor activity. Cysteine-protease inhibitors included LCN 1, cystatin (CST) 1, CST 4, CST 5, LTF, and lysozyme. NE is stored in azurophil granules and is released by exocytosis into the extracellular fluid.
It is a key enzyme in the development of hereditary emphysema, owing to its ability to degrade matrix proteins such as elastin. NE may cause epithelium detachment and mucus hypersecretion.36 Epithelial damage induces the secretion of chemokines that attract more neutrophils to the site of inflammation.37 Neutrophil defensins are also released.

Cationic antimicrobial protein, 37 kDa (CAP 37), has neutrophil serine protease, monocyte chemoattractant, and endothelial regulatory activities. CAP 37 increases adhesion of monocytes to human endothelial and rat aorta cells and promotes migration of endothelial cells.38 CAP 37 also has a wide spectrum of antimicrobial action, in particular, against Gram-negative bacteria.39

Clusterin (apoJ) is a multifunctional glycoprotein (70–80 kDa) that interacts with many biomolecules. High levels of this protein during inflammation may protect lung tissues from leukocyte injury.40 Clusterin blocks the terminal part of the complement cascade (C5b-9) and may reduce cellular necrosis and apoptosis. It may also protect cells against oxidative stress.

SLPI is the major antiprotease in the upper respiratory tract. It is synthesized and secreted by different types of cells such as epithelial cells and tracheal glandular cells.41 The cationic nature of the SLPI N-terminal domain determines its antimicrobial activity against both Gram-positive and Gram-negative bacteria. Free oxygen radicals produced by myeloperoxidase (MPO) may cause its oxidative inactivation.

The identification of eosinophil cationic protein (ECP), eosinophil peroxidase (EPO), and myeloperoxidase unambiguously demonstrates the participation of eosinophils in airway inflammation. These cells are implicated in the development of a prolonged cough in these patients.42 It is well-known that acute exacerbation in COPD causes an irreversible decline of pulmonary function. A possible hypothesis is that this exacerbation is associated with an increase of pulmonary inflammation brought about by several causes such as viral or bacterial infections or environmental factors. Viral infections of airway epithelial cells can induce cytokine and chemokine production.43,44 Concentrations of ECP in sputum are higher during acute exacerbation compared to those in patients with stable lung function. This suggests that both neutrophils and eosinophils contribute to acute exacerbations of COPD.

Another important protein found in our samples is deleted in malignant brain tumors 1 (DMBT1). One isoform is salivary agglutinin or “gp 340”. This protein belongs to group B, subgroup 3 of the scavenger receptor cysteine-rich (SRCR) protein family. Group B proteins are involved in the regulation of the cellular immune response. Subgroup 3 proteins are expressed by epithelial cells in the gastrointestinal tract and exocrine gland duct cells. They are generally secreted in mucus fluids45 and contribute to host defense, for example, by binding microorganisms. A role in epithelial differentiation has also been described.46 DMBT1 interacts with the defensin collectin surfactant proteins D and A (SP-D and SP-A) and can stimulate alveolar macrophage migration.47,48

Antioxidant proteins included MPO, EPN, EPXO, FDXR, GAPDH, TIN, and ALB. These proteins are generally present in the fluid of pulmonary epithelium, where they represent a first line of defense against inhalation of toxic oxidants such as ozone, nitrogen oxide, and cigarette smoke.49,50 Finally, special emphasis should be placed on the finding of proteins such as lactoferrin and haptoglobin, which bind metal ions that are essential growth factors for microorganisms.

Analysis of Differentially Expressed Proteins.
As shown in Table 3, Zn-α2-glycoprotein was detected in 71% of NS but in less than 10% of each of the other groups of smokers and COPD subjects (p < 10⁻⁵ by ANOVA). This protein is synthesized in a specific subpopulation of epithelial Clara cells of small airways51 and is down regulated in subjects with pulmonary injury. Zn-α2-glycoprotein could be a ligand secreted with the purpose of binding fungi, Gram-positive and Gram-negative bacteria, and lipopolysaccharides.

Clara cell 10 kDa protein, also known as uteroglobin or secretoglobin (SCGB1A1), was identified in all NS, in 63% of HS, in 40% of subjects with COPD, but in only 27% of patients with CB or COPD & E (p = 0.009 with ANOVA). The frequency of detection was significantly higher for NS than for HS (p = 0.04 by ANOVA) and for subjects of other smoker groups (p < 0.01). These results suggest that cigarette smoke reduces Clara cell protein expression. Clara cells have been proposed to have cigarette smoke induced anti-inflammatory functions. Loss of these protective effects may play a permissive role in the pathogenesis of COPD.

Lacrimal proline-rich 4 (PRR4), also known as nasopharyngeal carcinoma-associated proline-rich 4 protein, is another protein detected in all NS (p < 10⁻⁴ with ANOVA). It was identified in only 18% of HS (p = 0.0007 by t test compared to NS), and it was not identified in COPD & E (p < 10⁻⁶). The frequencies of detection of this protein were high in CB and COPD groups. PRR4 is expressed in lacrimal glands. Its detection as an intact protein was of particular interest since it is generally secreted as a polypeptide truncated at its C-terminus. The resulting peptides are thought to bind bacteria on the mucosal surface and to interfere with biofilm formation. In addition, bacterial proteases may cleave other small fragments that could work as cytokines for the activation of the innate and acquired immune systems.

Table 3. Frequencies of Proteins in Each Group of Subjectsᵃ

protein              NS                 HS                    CB                   COPD                COPD & E           ANOVA
subject number       7                  11                    11                   15                  11
Zn-α2-glycoprotein   71%                9% (p = 0.039*)       0% (p = 0.0015*)     7% (p = 0.054*)     0% (p = 0.0015*)   10⁻⁵
Clara cell           100%               63%                   27% (p = 0.009*)     40%                 27% (p = 0.009*)   0.009
PRR4                 100%               18% (p = 0.0007)      55% (p = 0.025**)    40%                 0% (p < 10⁻⁶*)     10⁻⁴
LPLUNC2              57%                9%                    18%                  0% (p = 0.004*)     0% (p = 0.023)     0.001
β-actin              91%                71%                   67%                  27% (p = 0.007*)    27% (p = 0.007*)   0.004
MUC5A/C              43%                100% (p = 0.014*)     93%                  91% (p = 0.04*)     63%                0.005
MUC5B                85%                100%                  63% (p = 0.058***)   100%                81%                0.005
histone 4            0% (p = 0.045**)   0% (p = 0.0046**)     0% (p = 0.0046**)    0% (p = 0.0046**)   64%                10⁻⁸
lipocalin 1          0%                 27%                   0%                   0%                  0%                 0.01
SPLUNC1              0%                 0%                    0%                   27%                 0%                 0.01
cathepsin G          0%                 0%                    0%                   0%                  27%                0.01
Ig J                 100%               100% (p = 0.023***)   82%                  47%                 82%                0.005
Ig µ heavy           43%                9%                    0% (p = 0.023*)      7%                  9%                 0.0031
Ig γ1                100%               55% (p = 0.023***)    64%                  100%                100%               0.0012

ᵃ Significant differences between groups are identified with ANOVA. Results are compared with “two-tailed, unpaired Student’s t test”–Bonferroni (p; * vs NS; ** vs COPD & E; *** vs COPD). (%) = Frequency of each protein.

Many representatives of the lipocalin family of the lipid transfer/lipopolysaccharide binding protein (LBP) gene superfamily were detected. Lipocalin 1 (LCN 1), the prototypic family protein, was identified in 27% of HS (p = 0.01 with ANOVA). This family includes lipocalin-2 (LCN 2; neutrophil gelatinase-associated lipocalin, NGAL), the PLUNC family (LPLUNC 1; LPLUNC 2; SPLUNC 1), and apolipoproteins A–I, E, and J (clusterin). These proteins play a very important role in the binding of bacterial polysaccharides and of fungal β-glycans. The innate immune system PLUNC proteins are especially important for protection against Gram-negative bacteria.52 Bacterial/permeability-increasing protein-like 1 (BPIP1), also known as long palate lung nasal clone protein 2 (LPLUNC 2), was identified in 57% of NS, in 18% of CB, and in 9% of HS. It was not found in COPD (p = 0.0038 with t test vs nonsmokers) or COPD & E (p = 0.001 by ANOVA). This result was of interest and somewhat surprising since we were expecting that antibacterial proteins such as this would be up regulated in CB pathogenesis. Also surprising was the detection of short palate lung nasal clone protein 1 (SPLUNC 1, “PLUNC”) in only one COPD subject. This may have indicated down regulation caused by cigarette smoke. Another possibility was that these proteins were produced in normal amounts but became entangled in macromolecular–microbial tangles that were centrifuged from the samples and so were unavailable for trypsin digestion.

The structural protein β-actin was identified in 71% of NS, 91% of HS, 67% of subjects with COPD, and 27% of CB and COPD & E groups (p = 0.004 with ANOVA). MUC 5B is synthesized mainly in submucosal gland mucus cells of cartilaginous airways, and it was identified in 85% of NS, in all HS and subjects with COPD, in 81% of subjects with COPD & E, but in only 63% of subjects with CB (p = 0.0048 with ANOVA).
From these data, it seems that cigarette smoke does not significantly influence the expression of MUC 5B in these five groups. MUC 5A/C is localized to epithelial goblet cells, and it was identified in 100% of HS, 93% of COPD, 91% of CB, 63% of COPD & E, but only 43% of NS (p = 0.04 vs COPD and 0.014 vs HS with unpaired t tests). Its low level of detection in nonsmokers was in contrast with the high levels observed in all other groups that have mucus hypersecretion phenotypes. This could indicate mucus cell hyperplasia and gland hypertrophy in smokers.

Histone 4 (Hist 4, H4) was identified only in subjects with COPD & E (64%; p = 0.005 if compared with the other four groups by t testing). DNA is assembled around the complex of histones 4, 2A, 2B, 3, and 1 to form the nucleosome, the basic structure of condensed DNA. When Hist 4, H4, is acetylated on lysines 9 and 16, DNA is unfolded for its transcription.53 By contrast, acetylation of lysines 5 and 12 promotes nucleosome formation and DNA condensation. This suggests that the mechanism of nuclear acetylation may be dysregulated in severe COPD & E, thus promoting the high expression of Hist 4 in this group. Histones 2B (Hist 2H2BE) and 1 (HIST 1, H4) were identified with high frequencies in subjects with COPD and COPD & E (64% and 27%, respectively), but the differences were not significant (ANOVA). The presence of these histones suggests that chromatin is released during emphysema. Apoptosis or necrosis of alveolar cells may be a potential source of this material.54

Immunoglobulins were identified in all groups of subjects. The secreted IgA could be considered a product of mucosal gland serous cells since it is transported into mucus through the polymeric immunoglobulin receptor. The IgJ detected in subjects with COPD may originate from plasma-derived IgM since bronchial wall damage may decrease the number of IgA producing B cells or remodel airway walls to decrease serous cell number and so secretory IgA transportation. IgA (96%), IgA2 (1%), and the polymeric immunoglobulin receptor (87%) were identified in all groups, along with other serous cell proteins such as lysozyme (98%), LPLUNC1 (95%), lactoferrin (87%), and SLPI (71%). This suggests that the function of serous cells was maintained in bronchial walls of COPD subjects. The neutrophil markers (neutrophil granule peptide HNP-1 and neutrophil gelatinase-associated lipocalin) were identified in all groups. By contrast, cathepsin G was identified only in 27% of subjects with COPD & E (p = 0.01 with ANOVA).
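The between-group comparisons of detection frequency reported in Table 3 rest on one-way ANOVA. A hedged Python sketch of the F statistic for a single protein is given below, treating detection in each subject as a 0/1 indicator. The detected counts for Zn-α2-glycoprotein (5 of 7, 1 of 11, 0 of 11, 1 of 15, 0 of 11) are back-calculated here from the percentages and subject numbers printed in Table 3; they are an assumption, not data reported by the authors, and the function names are placeholders. Only the F statistic and its degrees of freedom are computed; a p value would require the F distribution, for example through a statistics library such as SciPy.

```python
def one_way_anova_f(groups):
    """One-way ANOVA on a list of groups (each a list of observations).

    Returns (F, df_between, df_within).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def indicators(detected, n):
    """0/1 detection indicators for a group of n subjects."""
    return [1] * detected + [0] * (n - detected)

# Counts inferred from Table 3 (assumption): NS, HS, CB, COPD, COPD & E
zag = [indicators(5, 7), indicators(1, 11), indicators(0, 11),
       indicators(1, 15), indicators(0, 11)]
f_stat, df_b, df_w = one_way_anova_f(zag)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Treating a binary outcome as a continuous variable in ANOVA is a simplification; it is used here only because the text names ANOVA as the test applied to the frequencies.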

Conclusions

This novel analysis is the first to assess relatively large numbers of individual induced sputum samples from normal and smoker groups to identify patterns of altered protein expression that may be related to disease pathophysiology. Standardized sputum induction and processing were essential to reach the levels of reproducibility and internal consistency we observed. Trypsin digestion and liquid chromatography allowed the identification of over 200 proteins, including scores that were outside the ranges of isoelectric points and molecular weights used in 2-DE-MS methods. Large mucins, small molecular weight innate immune proteins, and highly cationic eosinophil proteins were examples of the novel proteins that were characterized. Exhaustive peptide matching analysis using PIR and other resources generated information about immunoglobulin heavy and light chain variable and constant region expression and the differences in their expression between smoker groups. Important clinical findings include the degree of change found in healthy smokers, the increased exocrine proteins in chronic bronchitis, and the elevated plasma and neutrophil proteins in emphysema. The latter point has clinical relevance for disease classification systems such as GOLD that disregard emphysema in assessing lung disease severity.

Acknowledgment. Financial support of this work was provided by Fondazione Cariplo, Milan, Italy (grant # 2003 1643/10.8485) and US PHS Awards RO1 AI42803 and RO1 ES015382.

Supporting Information Available: Tables showing the list of proteins identified in the different groups of subjects investigated (Tables S1 to S5); a Table (Table S6) containing the results of a search in BLAST using the human “PIR” database to identify protein fragments indicated as “unknown” or “hypothetical” by the NCBInr database; and a Table (Table S7) indicating the values (%) of sequence coverage, of peptide number, and of unique peptides for each protein identified in the five groups of subjects. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Barnes, P. J. N. Engl. J. Med. 2000, 343, 269–280.
(2) Lopez, A. D.; Murray, C. C. Nat. Med. 1998, 4, 1241–1243.
(3) ATS/ERS Task Force. Eur. Respir. J. 2004, 23, 932–946.
(4) Croxton, T. L.; Weinmann, G. G.; Senior, R. M.; Hoidal, J. R. Am. J. Respir. Crit. Care Med. 2002, 165, 838–844.
(5) Barnes, P. J. Nat. Rev. Drug Discovery 2002, 1, 437–446.
(6) Stang, P.; Lydick, E.; Silberman, C.; Kempel, A.; Keating, E. T. Chest 2000, 117, 354S–359S.
(7) Pauwels, R. A.; Buist, A. S.; Calverley, P. M.; Jenkins, C. R.; Hurd, S. S. Am. J. Respir. Crit. Care Med. 2001, 163, 1256–1276.
(8) Eriksson, S. Chest 1996, 110, 237S–242S.
(9) Sandford, A. J.; Weir, T. D.; Pare, P. D. Eur. Respir. J. 1997, 10, 1380–1391.
(10) Barnes, P. J. Thorax 1999, 54, 245–252.
(11) Barnes, P. J.; Chowdury, B.; Kharitonov, S. A.; Magnussen, H.; Page, C. P.; Postma, D.; Saetta, M. Am. J. Respir. Crit. Care Med. 2006, 174, 6–14.
(12) Sepper, R.; Prikk, K. J. Proteome Res. 2004, 3, 277–281.
(13) Stein, T. P.; Wade, C. E. Curr. Opin. Clin. Nutr. Metab. Care 2003, 6, 95–102.
(14) Bozinovski, S.; Cross, M.; Vlahos, R.; Jones, J. E.; Hsuu, K.; Tessier, P. A.; Reynolds, E. C.; Hume, D. A.; Hamilton, J. A.; Geczy, C. L.; Anderson, G. P. J. Proteome Res. 2005, 4, 136–145.
(15) Kharitonov, S. A.; Barnes, P. J. Proc. Am. Thorac. Soc. 2004, 1, 191–199.
(16) Merkel, D.; Rist, W.; Seither, P.; Weith, A.; Lenter, M. C. Proteomics 2005, 5, 2972–2980.
(17) Lindahl, M.; Stahlbom, B.; Svartz, J.; Tagesson, C. Electrophoresis 1998, 19, 3222–3229.
(18) Lindahl, M.; Svartz, J.; Tagesson, C. Electrophoresis 1999, 20, 881–890.
(19) Lindahl, M.; Stahlbom, B.; Tagesson, C. Electrophoresis 1999, 20, 3670–3676.
(20) Plymoth, A.; Yang, Z.; Lofdahl, C. G.; Ekberg-Jansson, A.; Dahlback, M.; Fehniger, T. E.; Marko-Varga, G.; Hancock, W. S. Clin. Chem. 2006, 52, 671–679.

(21) Plymoth, A.; Lofdahl, C. G.; Ekberg-Jansson, A.; Dahlback, M.; Broberg, P.; Foster, M.; Fehniger, T. E.; Marko-Varga, G. Clin. Chem. 2007, 53, 636–644.
(22) Plymoth, A.; Lofdahl, C. G.; Ekberg-Jansson, A.; Dahlback, M.; Lindberg, H.; Fehniger, T. E.; Marko-Varga, G. Proteomics 2003, 3, 962–972.
(23) Magi, B.; Bini, L.; Perari, M. G.; Fossi, A.; Sanchez, J. C.; Hochstrasser, D.; Paesano, S.; Raggiaschi, R.; Santucci, A.; Pallini, V.; Rottoli, P. Electrophoresis 2002, 23, 3434–3444.
(24) Neumann, M.; von Bredow, C.; Ratjen, F.; Griese, M. Proteomics 2002, 2, 683–689.
(25) Kriegova, E.; Melle, C.; Kolek, V.; Hutyrova, B.; Mrazek, F.; Bleul, A.; du Bois, R. M.; von Eggeling, F.; Petrek, M. Am. J. Respir. Crit. Care Med. 2006, 173, 1145–1154.
(26) Magi, M.; Bargagli, E.; Bini, L.; Rottoli, P. Proteomics 2006, 6, 6354–6369.
(27) Sloane, A. J.; Lindner, R. A.; Prasad, S. S.; Sebastian, L. T.; Pedersen, S. K.; Robinson, M.; Bye, P. T.; Nielson, D. W.; Harry, J. L. Am. J. Respir. Crit. Care Med. 2005, 172, 1416–1426.
(28) Nicholas, B.; Skipp, P.; Mould, R.; Rennard, S.; Davies, D. E.; O’Connor, C. D.; Djukanović, R. Proteomics 2006, 6, 4390–4401.
(29) Gevaert, K.; Van Damme, P.; Martens, L.; Vandekerckhove, J. Anal. Biochem. 2005, 345, 18–29.
(30) De Marco, R.; Accordini, S.; Cerveri, I.; Corsico, A.; Sunyer, J.; Neukirch, F.; Künzli, N.; Leynaert, B.; Janson, C.; Gislason, T.; Vermeire, P.; Svanes, C.; Anto, J. M.; Burney, P. Thorax 2004, 59, 120–125.
(31) Boschetto, P.; Quintavalle, S.; Zeni, E.; Leprotti, S.; Potena, A.; Ballerin, L.; Papi, A.; Palladini, G.; Luisetti, M.; Annovazzi, L.; Iadarola, P.; De Rosa, E.; Fabbri, L. M.; Mapp, C. E. Thorax 2006, 61, 1037–1042.
(32) Rytila, P. H.; Lindqvist, A. E.; Laitinen, L. A. Eur. Respir. J. 2000, 15, 1116–1119.
(33) Lowry, O. H.; Rosebrough, N. J.; Lewis Farr, A.; Randall, R. J. J. Biol. Chem. 1951, 193, 265–275.
(34) Bligh, E. G.; Dyer, W. J. Can. J. Biochem. Physiol. 1959, 37, 911–917.
(35) Casado, B.; Pannell, L. K.; Iadarola, P.; Baraniuk, J. N. Proteomics 2005, 5, 2949–2959.
(36) Kim, S.; Nadel, J. A. Treat Respir. Med. 2004, 3, 147–159.
(37) Hiemstra, P. S.; van Wetering, S.; Stolk, J. Eur. Respir. J. 1998, 12, 1200–1208.
(38) Lee, T. D.; Gonzalez, M. L.; Kumar, P.; Grammas, P.; Pereira, H. A. C. Microvasc. Res. 2003, 66, 38–48.
(39) Pereira, H. A. J. Leukoc. Biol. 1995, 57, 805–812.
(40) Heller, A. R.; Fiedler, F.; Braun, P.; Stehr, S. N.; Bödeker, H.; Koch, T. Shock 2003, 20, 166–170.
(41) Abbinante-Nissen, J. M.; Simpson, L. G.; Leikauf, G. D. Am. J. Physiol. 1993, 265, L286–292.
(42) Rytila, P.; Metso, T.; Petays, T.; Sohlman, A.; Työlahti, H.; Kohonen-Jalonen, P.; Kiviniemi, P.; Haahtela, T. Respir. Med. 2002, 96, 52–58.
(43) Noah, T. L.; Wortman, I. A.; Becker, S. Immunopharmacology 1998, 39, 193–199.
(44) Papadopoulos, N. G.; Papi, A.; Meyer, J.; Stanciu, L. A.; Salvi, S.; Holgate, S. T.; Johnston, S. L. Clin. Exp. Allergy 2001, 31, 1060–1066.
(45) Hartshorn, K. L.; White, M. R.; Mogues, T.; Ligtenberg, T.; Crouch, E.; Holmskov, U. Am. J. Physiol. Lung Cell Mol. Physiol. 2003, 285, 1066–76.
(46) Loimaranta, V.; Jakubovics, N. S.; Hytonen, J.; Finne, J.; Jenkinson, H. F.; Strömberg, N. Infect. Immun. 2005, 73, 2245–2252.
(47) Holmskov, U.; Lawson, P.; Teisner, B.; Toroe, L.; Willis, A. C.; Morgan, C.; Koch, C.; Reid, K. B. J. Biol. Chem. 1997, 272, 13743–13749.
(48) Tino, M. J.; Wright, J. R. Am. J. Respir. Cell Mol. Biol. 1999, 20, 759–768.
(49) Johnston, C. J.; Stripp, B. R.; Reynolds, S. D.; Avissar, N. E.; Reed, C. K.; Finkelstein, J. N. Exp. Lung Res. 1999, 25, 81–97.
(50) van der Vliet, A.; O’Neill, C. A.; Cross, C. R.; Koostra, J. M.; Volz, W. G.; Halliwell, B.; Louie, S. Am. J. Physiol. 1999, 276, 1289–1296.
(51) Reynolds, S. D.; Reynolds, P. R.; Pryhuber, G. S.; Finder, J. D.; Stripp, B. R. Am. J. Respir. Crit. Care Med. 2002, 166, 1498–1509.
(52) Bingle, C. D.; Gorr, S. U. Int. J. Biochem. Cell Biol. 2004, 36, 2144–2152.
(53) Strahl, B. D.; Allis, C. D. Nature 2000, 403, 41–45.
(54) Tuder, R. M.; Yoshida, T.; Arap, W.; Pasqualini, R.; Petrache, I. Proc. Am. Thorac. Soc. 2006, 3, 503–511.

PR070440Q

Journal of Proteome Research • Vol. 6, No. 12, 2007 4623