Quantitative Overview of N2 Fixation in Nostoc punctiforme ATCC

Nov 14, 2008 - Biological & Environmental Systems Group, Department of Chemical and Process Engineering, The University of Sheffield, Sheffield, S1 3J...
Quantitative Overview of N2 Fixation in Nostoc punctiforme ATCC 29133 through Cellular Enrichments and iTRAQ Shotgun Proteomics

Saw Yen Ow,† Josselin Noirel,† Tanai Cardona,‡ Arnaud Taton,§ Peter Lindblad,‡ Karin Stensjö,*,‡ and Phillip C. Wright*,†

Biological & Environmental Systems Group, Department of Chemical and Process Engineering, The University of Sheffield, Sheffield, S1 3JD, United Kingdom, Department of Photochemistry and Molecular Science, The Ångström Laboratories, Uppsala University, SE-751 20 Uppsala, Sweden, and Center for the Study of Biological Complexity, Virginia Commonwealth University, 1000 West Cary Street, Richmond, Virginia 23284

Received April 14, 2008

Nostoc punctiforme ATCC 29133 is a photoautotrophic cyanobacterium with the capacity to fix atmospheric N2. Its ability to mediate this process is similar to that described for Nostoc sp. PCC 7120, where vegetative cells differentiate into heterocysts. Quantitative proteomic investigations at both the filament level and the heterocyst level are presented using isobaric tagging technology (iTRAQ), with 721 proteins at the 95% confidence interval quantified across both studies. Observations from both experiments yielded findings confirmatory of both transcriptional studies, and published Nostoc sp. PCC 7120 iTRAQ data. N. punctiforme exhibits similar metabolic trends, though changes in a number of metabolic pathways are less pronounced than in Nostoc sp. PCC 7120. Results also suggest a number of proteins that may benefit from future investigations. These include ATP dependent Zn-proteases, N-reserve degraders and also redox balance proteins. Complementary proteomic data sets from both organisms present key precursor knowledge that is important for future cyanobacterial biohydrogen research. Keywords: iTRAQ • Shotgun proteomics • Nostoc punctiforme ATCC 29133 • Nostoc sp. PCC 73102 • Tandem-MS • Dinitrogen Fixation • Heterocysts

Introduction

The origin of filamentous cyanobacteria is known to lie around 2.4 billion years ago.1 Their considerable morphological variety is owed primarily to their tolerance of a wide range of environmental conditions.2,3 A number of their environmental resilience properties are associated with the differentiation of specialized cells, such as heterocysts, hormogonia, and akinetes, alongside mature filaments.2,4,5 Among the many genomics- and transcriptomics-based research projects targeted toward understanding the physiological significance and metabolic relevance of these ancient filamentous strains, the area of whole cell proteomic research has recently gained momentum. While most of the proteomic studies devoted to filamentous cyanobacteria have been carried out since 2004, they have primarily focused on traditional SDS-PAGE proteomics.6-8 With the maturation of large-scale tandem MS-based methods, the realization of a biological study based on high-throughput shotgun proteomics has become increasingly appealing.9-19 The quantitative proteomic investigation of N2 fixation using iTRAQ shotgun proteomics by Stensjö et al., and a qualitative analysis by Anderson et al., mark the arrival of shotgun proteomics into this field.9,10,18

Like Nostoc sp. PCC 7120, Nostoc punctiforme ATCC 29133 (also commonly known as Nostoc punctiforme PCC 73102; hereafter denoted only as N. punctiforme) is a filamentous photoautotrophic cyanobacterium that has the ability to fix atmospheric nitrogen.2 N2 fixation also requires the terminal differentiation of a fraction of the vegetative cells (approximately 1 in every 20) into specialized heterocysts.20,21 N. punctiforme has a genome size of 9.06 Mb, marking it as one of the largest bacterial genomes sequenced to date.2 N. punctiforme has also been traditionally associated with a number of symbiotic relationships with higher plants, where candidate hosts include Macrozamia sp., Anthoceros punctatus and Gunnera manicata.22-25 Given the differences between N. punctiforme and Nostoc sp. PCC 7120, complementing the existing Nostoc sp. PCC 7120 data with quantitative proteomics of N. punctiforme is of considerable interest.

Here, proteomes extracted individually from multiple biological replicate cultures of N. punctiforme are analyzed. Isobaric peptide labels were used to analyze proteomic changes during a change in growth condition from ammonium supplementation to N2 fixation (combined-nitrogen starvation).

* To whom correspondence should be addressed: Prof. Phillip C. Wright, Chemical Engineering at the Life Science Interface (ChELSI), The Biological & Environmental Systems Group, Department of Chemical and Process Engineering, The University of Sheffield, Sheffield, S1 3JD, U.K. Tel: +44(0)114 2227577. Fax: +44(0)114 2227501. E-mail: [email protected]. Or Dr. Karin Stensjö, Department of Photochemistry and Molecular Science, The Ångström Laboratories, Uppsala University, SE-751 20 Uppsala, Sweden. Tel: +46-18-471 6586. Fax: +46-18-471 6844. E-mail: [email protected].
† The University of Sheffield. ‡ The Ångström Laboratories. § Virginia Commonwealth University.

10.1021/pr800285v CCC: $40.75 © 2009 American Chemical Society
Journal of Proteome Research 2009, 8, 187–198. Published on Web 11/14/2008

research articles

Ow et al.

Figure 1. Experimental design and workflow for quantitative analysis of N2 fixing N. punctiforme. (A) Phenotypes and experimental design for ExpiTRAQ-Fil (iTRAQ 4-plex) and ExpiTRAQ-Het (iTRAQ 4-plex); duplicate samples were pooled from 4 biological replicates to achieve an estimate of population changes. ExpiTRAQ-Fil is an experiment designed to address and analyze proteomic changes between N2 fixing and NH4+ grown N. punctiforme filaments. ExpiTRAQ-Het is an experiment designed to address and analyze proteomic changes between heterocyst cells and parent filaments from N2 fixing N. punctiforme. (B) A generic illustration of the quantitative proteomics iTRAQ workflow for ExpiTRAQ-Fil and ExpiTRAQ-Het.

A complementary perspective is also provided via the purification of heterocyst fractions from filaments grown under N2 fixing conditions. We have undertaken this because we have shown previously that these purification methods give far greater sensitivity to heterocyst-specific changes than whole filament studies.10 We quantitatively examine extracted proteomes from purified heterocysts in comparison with their parent filaments to infer metabolic changes in heterocysts. Readers are encouraged to refer to our work in Nostoc sp. PCC 7120 for diagrams and discussion on the relationship between filaments, heterocysts and vegetative cells.10 A combination of iTRAQ labeling of both the filaments and the purified heterocysts and LC-ESI tandem mass spectrometry is employed here to achieve rounded proteomic inferences.

The findings presented here aim at a consensus on two levels. First, we intend to better understand N2 fixation and other related processes in N. punctiforme and Nostoc sp. PCC 7120. A higher level of understanding via NtcA binding prediction and metabolic network analysis will also be discussed. Second, we mean to establish a scalable quantitative proteomics platform for N. punctiforme to study metabolic changes under various conditions. Complementary knowledge from both N. punctiforme and Nostoc sp. PCC 7120 may provide us with a more in-depth understanding of both the N2 fixation process and the generation of bio-H2.26

Journal of Proteome Research • Vol. 8, No. 1, 2009

Materials and Methods

Experimental Design. The experiments were designed across 4 biological replicates (replicate cultures of identical batches grown under identical conditions) of N2 fixing filaments, NH4+ grown filaments and purified heterocyst fractions. To achieve greater population-based significance, biological replicates were pooled in pairs to generate two labeled replicates corresponding to each condition. The relative studies were carried out across 2 separate iTRAQ experiments (Figure 1). Experiment I (ExpiTRAQ-Fil) was devised to investigate whole filament-level proteome changes between N2 fixing and NH4+ grown conditions. Experiment II (ExpiTRAQ-Het) was designed to investigate heterocyst-level changes with respect to their parent filaments during the process of N2 fixation. Both experiments employed conventional 4-plex iTRAQ techniques.10,27 Corresponding technical details for each workflow are described in the following subsections.

Cell Culture Preparation. Four replicate culture sets of N. punctiforme cells were cultivated side by side in axenic conditions in 1 L flasks containing 600 mL of BG11₀ minimal media (N2-fixing conditions) or BG11₀ media supplemented with 2.5 mM NH4Cl and 5 mM HEPES (non-N2-fixing conditions) at 25 °C under constant illumination at 45 µmol photons m⁻² s⁻¹, and subjected to continuous stirring (magnetic stirrer) and aeration (air sparging). To maintain a non-N2-fixing culture, 2.5 mM NH4Cl and 5 mM HEPES, pH 7.5, were added every 48 h during the first 5 days, and thereafter every 24 h for 2 days. Cells from all cultures were harvested simultaneously at the midexponential growth phase, 24 h after the addition of fresh growth media, by centrifugation at 5000g for 10 min at room temperature.

Heterocyst Purification. The method used for heterocyst isolation was adapted from our previous work10 on Nostoc sp. PCC 7120 and on N. punctiforme using protocols originally established by Razquin et al.28,29 The cultures were harvested by centrifugation at 5000g for 10 min, resuspended in extraction buffer (pH 7.2; 0.4 M sucrose, 50 mM HEPES/NaOH, 10 mM NaCl, 10 mM EDTA) at a chlorophyll a concentration of 150 µg/mL and incubated for 30 min at 4 °C. Lysozyme (Sigma-Aldrich) was then added at a concentration of 1 mg/mL, and the samples were incubated at 35 °C for 1 h. Samples were then subjected to ultrasonication in a Sonics Vibracell VC-130 (Meyrin, Switzerland) for a total of 1 min, in 6 intervals of 10 s, on ice. Each homogenate was centrifuged at 1000g for 5 min, and the spun pellets containing heterocysts were resuspended in extraction buffer and centrifuged again at 250g for 3 min; this cycle was repeated 3 times. Heterocyst rich samples were then stored at -80 °C for later processing.

Proteome Extraction and Content Estimation. Protein extraction was performed as adapted from Stensjö et al. and Ow et al.9,10 Sample pellets were washed once with protein extraction buffer (500 mM TEAB, triethylammonium bicarbonate, pH 8.5, 0.01% (w/v) SDS and 0.1% (v/v) Triton X-100). Cells were then resuspended in the extraction buffer, and 1 vol of acid washed glass beads (425-600 µm; Sigma-Aldrich) was added. Procedures for the isolation of heterocysts are as described in the previous section.
Proteins from harvested filaments were extracted by mechanical disruption in a bead beater (BIO101, Savant FastPrep FP120) four times for 30 s, with at least 1 min on ice between runs. The crude extracts were centrifuged at 12 000g for 30 min at room temperature, and the supernatant was transferred to a new 2 mL microcentrifuge tube and stored at -20 °C. Total proteome concentration for all phenotypes was determined using the RC-DC Protein Quantification Assay (Bio-Rad, Hertfordshire, U.K.). A total proteome content of 100 µg for each phenotype was precipitated in 1:6 volume ratio of acetone overnight at -20 °C and resuspended in the appropriate iTRAQ labeling buffers.

Western Immunoblotting. Protein samples (7.5 µg or 3.75 µg per lane) were separated by discontinuous SDS-PAGE with a 12% resolving gel and a 3% stacking gel. After electrophoresis, proteins were either stained with Coomassie blue or transferred to Hybond ECL nitrocellulose membranes (GE Healthcare Amersham Biosciences, Buckinghamshire, U.K.). Membranes were incubated for 60 min in 5% skim milk in TBST (20 mM Tris-HCl, pH 7.4, 140 mM NaCl, 0.10% Tween), washed in TBST, and then incubated with anti-RbcL antibodies (Agrisera, Sweden) at a 1:10 000 dilution in 2% skim milk in TBST (0.010% Tween) before being washed in TBST and incubated with a horseradish-peroxidase-conjugated anti-rabbit antibody (Bio-Rad) at a 1:5000 dilution. Immunodetection was visualized using the chemiluminescence Western blotting ECL detection reagents (GE Healthcare, Sweden) on a Chemi Doc XRS system (Bio-Rad).

Four-Plex Isobaric Tag Peptide Labeling (iTRAQ). A total of 100 µg of protein from each N. punctiforme iTRAQ phenotype was reduced and alkylated as described in the manufacturer's (Applied Biosystems) protocol. Samples were then digested with trypsin (Promega sequencing grade trypsin, Southampton, U.K.) (1:10) in 1 mM of HCl overnight at 37 °C in a temperature-controlled water bath. Sample peptides were subsequently labeled using protocols described by the manufacturer for the 4-plex iTRAQ labeling kits (Applied Biosystems, CA). The concentration of dissolution buffer was increased 2-fold to aid solubility, while labeling time was increased to assist complete incorporation. The organic content of the labeling reaction was also increased (2-fold volume of ethanol), as described in the literature.9,10 Combined samples after isobaric labeling were vacuum evaporated and stored at -20 °C prior to SCX HPLC separation.

Strong Cation Exchange (SCX) Liquid Chromatography (HPLC). Peptide prefractionation was achieved using a PolySULFOETHYL A Pre-Packed Column (PolyLC, Columbia, MD) with a 5 µm particle size and a column dimension of 100 mm × 2.1 mm i.d., 200 Å pore size, on a BioLC HPLC (Dionex, Surrey, U.K.). SCX was achieved using a three-step salt buffer system: a low ionic buffer A (10 mM KH2PO4 and 25% acetonitrile), a high ionic buffer B (500 mM of KCl, 10 mM KH2PO4 and 25% acetonitrile), and an intermediate buffer C (200 mM of KCl, 10 mM KH2PO4 with 25% acetonitrile). All SCX solutions were adjusted to pH 2.85. The separation program began with a linear ramp from 0 to 21% C for 7 min, an incremental ramp from 21% to 31% C for 3 min, followed by a 31-50% linear ramp of C for 15 min. Buffer C saturation was achieved using a linear ramp from 50% to 100% over 10 min, followed by a linear ramp of buffer B from 40% to 100% over 5 min; buffer B was held for another 10 min for isocratic washing. The sample injection volume was 200 µL, and the liquid flow rate was 0.2 mL/min. The SCX chromatogram was monitored using a UVD170U ultraviolet detector and Chromeleon software v. 6.50 (Dionex, LC Packings, The Netherlands). Fractions were collected using a Foxy Jr. (Dionex) fraction collector in 1 min intervals for 60 min in low-binding 1.5 mL microcentrifuge (Eppendorf) tubes to minimize nonspecific binding loss. Corresponding weaker fractions (fractions 1-15 and 45-60) were pooled and vacuum-dried together with other fractions for storage at -20 °C prior to LC-MS/MS analysis.

Nano-LC and ESI-Quadrupole-Time-of-Flight Tandem MS. Fractions collected from offline separation techniques were eluted through the Famos-Switchos-Ultimate nano-LC system (Dionex, LC Packings, The Netherlands) interfaced with a QSTAR XL (Applied Biosystems; MDS-Sciex) tandem ESI-QUAD-TOF MS. Vacuum concentrated fractions were resuspended in buffer I (3% acetonitrile, 0.1% formic acid), injected and captured into a 0.3 × 5 mm trap column (3 µm C18 Dionex-LC Packings). Trapped samples were then eluted onto a 0.075 × 150 mm analytical column (3 µm C18 Dionex-LC Packings) using an automated binary gradient with a flow of 300 nL/min from 95% buffer I to 30% buffer II (97% acetonitrile, 0.1% formic acid) over 85 min, followed by a 5 min ramp to 95% buffer II (with isocratic washing for 10 min). Predefined 1 s 300-1800 m/z MS survey scans were acquired with up to two dynamically excluded precursors selected for a 3 s MS/MS (m/z 65-2000) scan. The collision energy range was increased by 20% as compared to the unlabeled peptides in order to overcome the stabilizing effect of the basic N-terminal derivatives, and to achieve equivalent fragmentation as recommended by Applied Biosystems. SCX fractions were injected twice to increase coverage confidence in identification and quantification as detailed elsewhere.15

iTRAQ Data Analysis. Identification and quantitative analysis of peptide spectra from the QSTAR XL (Applied Biosystems, MDS Sciex) was performed using the Paragon Search algorithm30 in ProteinPilot Software v 2.0 (Applied Biosystems; MDS Sciex) and the N. punctiforme protein database (6768 ORFs, NCBI accessed June 2008). Peptide spectra from the QSTAR XL MS were interrogated with MMTS defined as a fixed cysteine modification. Miscleavage tolerances were kept undefined, thereby allowing all cleavage points for consideration. Protein identifications were generated with a filter cutoff of 99% confidence, together with the corresponding quantification and P-value. Bias normalizations were also performed as described by Ow et al., by correcting the bias median ratio of each comparison toward unity.10 An estimate of the false discovery rate (FDR) was also calculated by using spectra "fished" from a decoy database, as detailed by Elias et al.14 The decoy database was generated using reversed proteome sequences from N. punctiforme and searched under the same qualifying interrogation parameters as those of the true database searches.10 False spectra obtained from this search were used to set the parameters needed to estimate the FDR for all our experiments. Only qualifying protein identifications that demonstrated differential regulation of greater than ±1.6-fold were considered as a statistically significant change.10 Weighted standard deviations and error factors (EF) were manually computed using methods outlined in a recent article by Gan et al., on peptides showing >80% confidence.13

Metabolic Network Analysis.
Several tools have recently been developed to help researchers map quantitative proteomic data onto metabolic networks.31,32 These tools have to cope with the idiosyncrasies of proteomic techniques: first, the coverage of the metabolic network is fairly low (from 10% to 20% of the metabolic network); second, the small-world topology of the metabolic networks necessitates specially designed approaches in order for them to focus onto relevant pathways.33,34 More details are available elsewhere.31,32 The method developed by Noirel et al. was used in order to ease the identification of up- and down-regulated pathways.32 This could also help us pinpoint key elements of the metabolic map that were most likely to be up-regulated, even when they had not themselves been identified and quantified. This stage of the analysis required us to reconstruct N. punctiforme's metabolic network; this was achieved using the generic map available from KEGG (http://www.genome.jp/anonftp/) and the information provided by JGI (http://genome.jgi-psf.org/draft_microbes/nospu/nospu.home.html). The list of genes identified alongside the corresponding E.C. (enzyme code) numbers were obtained from JGI and inserted into the KEGG generic map, where appropriate. The process was carried through using the standard Perl module XML::DOM. The result is a partial reconstruction of the KEGG pathway database for N. punctiforme (1129 reactions). The reconstructed database, original contig files and Perl source codes are accessible from the authors' URL (http://wrightlab.group.shef.ac.uk/).

The heuristic method utilized requires the setup of several important parameters: λmid = 1.3, λup = 1.6, and wmax = 13. The rationale underlying this choice is described in Noirel et al.,32 with the following adjustments, owing to the presumably lower quality of our N. punctiforme metabolic reconstruction, as compared to those contained within KEGG databases: wmax was increased in order for the algorithm to traverse more nonquantified enzymes, and λmid and λup were accordingly set higher to avoid many branching, irrelevant pathways.32 Also, the weights of the following compounds were set to four: pyruvate, glutamate, and NH3 (because of their likely implication in the metabolic processes involved in N2 fixation). With these parameters, all the key factors involved in metabolic functions discussed in this study were identified.

NtcA Binding in Silico Sequence Analysis. An in silico search of the DNA NtcA binding signature was performed on the regions upstream of the genes encoding the proteins quantified in this study. The upstream regions were defined so as to comprise the intergenic sequence, with a minimum length of 50 nucleotides, preceding each gene under consideration. Analyses were carried out using a Position Specific Scoring Matrix (PSSM) constructed with pseudocounts set to 1 as described below, using CyanoBIKE, a Biological Integrated Knowledge Environment (http://biobike.csbc.vcu.edu) for the analysis of cyanobacterial genomes.35 The training set for the PSSM was composed of deduced 14-nt NtcA-binding sequences upstream of a set of 40 Nostoc genes that are orthologous to genes from other cyanobacteria whose upstream sequences are known to bind NtcA. The analysis produced a score for each possible NtcA-binding site: the ratio between the probability of obtaining the putative site from the nucleotide frequencies of the PSSM and the probability of obtaining the same site from the nucleotide frequencies of all intergenic sequences of Nostoc. To assess significance, the analysis was repeated multiple times with randomized intergenic sequences. The 14-nt sequences of the training set were deduced from the 40 Nostoc genes by applying to each of the 40 upstream sequences a preliminary PSSM derived from proven NtcA-binding sites taken from nuclease protection experiments and putative NtcA-binding sites taken from DNA fragments exhibiting gel shifts.36-39 The Nostoc sites on the parallel strand with the highest PSSM score were accepted into the training set, along with a higher scoring site on the antiparallel strand (if present).
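The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the CyanoBIKE implementation: the function names, the three toy 14-nt training sites, and the uniform background frequencies are all hypothetical, chosen only to show how a PSSM with pseudocounts of 1 yields a likelihood-ratio score for each candidate window on one strand.

```python
from collections import Counter

ALPHABET = "ACGT"

def build_pssm(sites, pseudocount=1.0):
    """Column-wise base frequencies from aligned sites, with pseudocounts."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for column in range(width):
        counts = Counter(site[column] for site in sites)
        pssm.append({base: (counts.get(base, 0) + pseudocount) / total
                     for base in ALPHABET})
    return pssm

def likelihood_ratio(site, pssm, background):
    """P(site | PSSM) divided by P(site | background frequencies)."""
    ratio = 1.0
    for column, base in zip(pssm, site):
        ratio *= column[base] / background[base]
    return ratio

def best_site(upstream, pssm, background):
    """Return (score, start) of the best-scoring window on the given strand."""
    width = len(pssm)
    scored = [(likelihood_ratio(upstream[i:i + width], pssm, background), i)
              for i in range(len(upstream) - width + 1)]
    return max(scored)

# Hypothetical training sites (illustrative only, not the authors' 40 sites).
toy_sites = ["GTAACAAAGACTAC", "GTATCAGTTGTTAC", "GTAGCATTTTATAC"]
# Assumed uniform background; the study used Nostoc intergenic frequencies.
toy_background = {base: 0.25 for base in ALPHABET}

toy_pssm = build_pssm(toy_sites)
score, start = best_site("CCCC" + toy_sites[0] + "CCCC", toy_pssm, toy_background)
print(f"best window starts at {start} with score {score:.1f}")
```

In practice the same scan would be repeated on the reverse complement and on randomized sequences to judge significance, as the text describes.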
The putative cyanobacterial binding sites were identified in the same fashion, by the application of a PSSM derived from the proven 14-nt sequences. All sequences used and predictions have been included in the Supporting Information. In addition, the presence of the canonical sequence signature GTAN8TAC and the minimal signature GTN10AC was investigated, together with the presence of potential -10 Escherichia coli σ70-like boxes (TAN3T or TN4T).
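The signature search can be sketched with regular expressions, reading N8, N10, N3 and N4 as runs of any nucleotide. This is an illustrative sketch only: the function name and the test sequence are hypothetical, only the given strand is scanned, and lookahead patterns are used so that overlapping matches are reported.

```python
import re

# Lookaheads report overlapping matches; N is read as any of A, C, G, T.
PATTERNS = {
    "canonical_GTAN8TAC": re.compile(r"(?=GTA[ACGT]{8}TAC)"),
    "minimal_GTN10AC": re.compile(r"(?=GT[ACGT]{10}AC)"),
    "minus10_TAN3T": re.compile(r"(?=TA[ACGT]{3}T)"),
    "minus10_TN4T": re.compile(r"(?=T[ACGT]{4}T)"),
}

def find_signatures(upstream):
    """Return 0-based start positions of each signature on the given strand."""
    sequence = upstream.upper()
    return {name: [match.start() for match in pattern.finditer(sequence)]
            for name, pattern in PATTERNS.items()}

# Hypothetical upstream fragment: a canonical site followed by a TATAAT box.
example = "CC" + "GTAACAAAGACTAC" + "GGGGG" + "TATAAT" + "CC"
hits = find_signatures(example)
for name, positions in hits.items():
    print(name, positions)
```

Every TAN3T match is also a TN4T match, so the two -10 box patterns are reported separately to keep the stricter hits distinguishable.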

Results and Discussion

Because of the nature of this investigation, the discussion is broken down into three parts. Part 1 covers basic data analysis across all methods with the consideration of error minimization and false positive estimation. Part 2 extends the analyzed data to provide sensible biological interpretations covering the heterocyst-specific and complete-filament regulations. Finally, Part 3 comments on the comparison between the proteome changes during the N2 fixation process in N. punctiforme and Nostoc sp. PCC 7120, seeking a rounded metabolic consensus across both organisms.

Part 1. Quantitative Data Analysis.

1.1. Heterocyst Sample Purity. Effectiveness of the heterocyst purification method was initially confirmed by optical microscopy (Zeiss Axiostar plus; Carl Zeiss, 100× magnification). No vegetative cells were visible after the cleaning of the heterocysts by low speed centrifugations (ca. 1000g) after the lysozyme treatment. Western immunoblotting was used as a final check for heterocyst sample purity. The RuBisCo large subunit (RbcL; ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit) was used as indicator, since heterocysts exhibit very little or no RuBisCo.40-42 According to the chemiluminescence intensity, the heterocyst fraction was at least 95% clear of contaminating vegetative cells (see Supporting Information).

1.2. iTRAQ Data Analysis. Separate aliquots of SCX iTRAQ fractions were contemporaneously analyzed as described by Ow et al.10 A total of 77 449 and 46 566 spectra were acquired for ExpiTRAQ-Fil and ExpiTRAQ-Het, respectively, using the QSTAR XL Qq-TOF-MS (Applied Biosystems, MDS-Sciex), of which 38 618 and 12 532 spectra belonged to the 99% confidence interval. Protein identification, quantification and data filtering were performed as described by Ow et al. using the proteome database of N. punctiforme,9,10 and as described in Materials and Methods. There were 643 proteins identified for ExpiTRAQ-Fil and 516 proteins for ExpiTRAQ-Het at the 95% confidence interval. Both sets of identifications were merged to give 721 unique protein identifications (see Supporting Information). The protein lists were further filtered to give 497 proteins for the filament study and 377 proteins for the heterocyst study within the 99% confidence interval. The lists of these proteins at 99%, alongside their iTRAQ ratios, P-values and error factors, are provided in the Supporting Information. Spectral quantification and the full peptide list are also available from the authors' URL (http://wrightlab.group.shef.ac.uk/).

The rationale for the false discovery rate (FDR) estimation was originally discussed by Elias et al.14 With the use of the technique as applied in previous works,10 179 and 34 spectra were matched against the decoy database for ExpiTRAQ-Fil and ExpiTRAQ-Het, respectively, at the 99% confidence interval (see Supporting Information).
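The decoy and target spectrum counts reported in this section can be turned into an FDR estimate with a few lines of Python. This is a sketch under a stated assumption: the decoy count is doubled before dividing by the target count, a convention commonly associated with reversed-database searches. The source does not state the exact formula used; the doubling factor is assumed here because it gives values close to those the authors report.

```python
def decoy_fdr(decoy_hits, target_hits, factor=2.0):
    """Estimate FDR as factor * decoy_hits / target_hits.

    factor=2.0 is an assumption (each decoy hit is taken to imply one
    undetected false hit among the target matches).
    """
    if target_hits <= 0:
        raise ValueError("target_hits must be positive")
    return factor * decoy_hits / target_hits

# Spectrum counts at the 99% confidence interval, as given in the text.
experiments = {
    "ExpiTRAQ-Fil": (179, 38618),
    "ExpiTRAQ-Het": (34, 12532),
}

for name, (decoy, target) in experiments.items():
    print(f"{name}: estimated FDR = {100 * decoy_fdr(decoy, target):.2f}%")
```

With these inputs the estimates come out at roughly 0.9% and 0.5%, in line with the reported figures to within rounding.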
Therefore, with reference to the similarly identified spectra from the true database, the estimated FDRs are 0.92% and 0.54% for ExpiTRAQ-Fil and ExpiTRAQ-Het, respectively (179:38618, 34:12532), using the justification described by Elias et al.14

1.3. General Proteome Overview. An overview of the total proteome identified shows that 16% of the proteome displays differential abundance between N. punctiforme filaments grown under diazotrophic conditions and those in NH4+ supplemented conditions (ExpiTRAQ-Fil). In filaments grown under diazotrophic conditions, 29 proteins were more abundant while 42 proteins were less abundant. Proteome analysis of samples using purified heterocysts and their parent filaments (ExpiTRAQ-Het) revealed more changes: we observed 238 differentially expressed proteins, with 117 up-regulated and 121 down-regulated in the purified heterocysts as compared to the parent filaments. Replicate pool samples were also cross-compared to gain an appreciation of the data reproducibility. Reproducibility was estimated using methods described by Chong et al.,15 and the coefficient of variation (CV) values were 13.3% and 16.6% for ExpiTRAQ-Fil and ExpiTRAQ-Het, respectively. These values are considered low, and correlate well with those previously presented in other iTRAQ studies (biological replicates produced a CV of 15-25%).9-11,13,15,16

Part 2. Differential Abundance Analysis. This section emphasizes the findings obtained from both quantitative shotgun proteomics experiment sets (ExpiTRAQ-Het and ExpiTRAQ-Fil). The proteomic data were rigorously assessed, and the subsequent findings represent those for which confidence is greater than 99%, and with sufficient spectral quantifications to justify metabolic analysis. The protein lists containing all comparisons between the two experiments are given in the Supporting Information.
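The differential abundance calls in this part rest on two steps described in Materials and Methods: correcting the median ratio of each comparison toward unity, and applying the ±1.6-fold cutoff. The Python sketch below illustrates both steps. The function names and the raw ratios are hypothetical, and the treatment of "±1.6-fold" as a ratio above 1.6 or below 1/1.6 is an assumption about how the cutoff is applied.

```python
from statistics import median

FOLD_CUTOFF = 1.6  # cutoff stated in the text

def normalize_to_unit_median(ratios):
    """Divide every ratio by the median so the corrected median equals 1."""
    centre = median(ratios)
    return [ratio / centre for ratio in ratios]

def classify(ratio, cutoff=FOLD_CUTOFF):
    """Label a normalized ratio; symmetric reading of the cutoff is assumed."""
    if ratio > cutoff:
        return "more abundant"
    if ratio < 1.0 / cutoff:
        return "less abundant"
    return "unchanged"

# Hypothetical raw iTRAQ ratios with a systematic bias (median 1.2).
raw_ratios = [1.2, 2.4, 0.6, 1.3, 1.1]
normalized = normalize_to_unit_median(raw_ratios)
for raw, norm in zip(raw_ratios, normalized):
    print(f"raw {raw:.2f} -> normalized {norm:.2f}: {classify(norm)}")
```

Under this reading, a 1.4-fold change such as the one reported for GlnA falls in the "unchanged" class, while the 5.4- to 9.2-fold changes reported for NifHDK are classed as more abundant.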
With the use of the heterocyst data set (ExpiTRAQ-Het), the automated extraction of the metabolic pathways that behave in a consistent fashion could retrieve the key factors discussed in this study. Using the filament data set (ExpiTRAQ-Fil), however, the networks identified are unexpectedly small, even with the customized settings described earlier. This observation stresses once again the importance of implementing a cell purification strategy in order to secure more striking and compelling evidence. The reproducibility of the quantifications became especially obvious through these analyses, since the same networks emerged from the various replicates. With the quantifications provided by ExpiTRAQ-Het, some of the pathways found by this methodology to be of greater abundance in the heterocysts than in vegetative cell rich N2 fixing filaments are:
I. The oxidative part of the pentose phosphate pathway.
II. The TCA pathway that generates the 2-oxoglutarate which is the carbon skeleton needed for ammonia assimilation.
III. The glutamine synthetase and the nitrogenase cluster.
IV. The cyclic electron flow around PSI.
There are other pathways that are notably up-regulated, in particular, the pyruvate metabolism and amino-acyl-tRNA synthetases. However, to preserve the overall aim and the conciseness of this article, discussions on these pathways have been deferred to future investigations. Given their consistency with other quantifications, it nevertheless appears likely that these pathways have intimate links to nitrogen assimilation. An illustration of the mapping computation is provided in Figure 2, where the diagram illustrates the metabolic link constructed using identifications obtained in this investigation (ExpiTRAQ-Fil, Figure 2A, and ExpiTRAQ-Het, Figure 2B).
Journal of Proteome Research • Vol. 8, No. 1, 2009

Figure 2. Visualizations of the metabolic network reconstruction for all proteins identified in the filament-based study (ExpiTRAQ-Fil) and the heterocyst study (ExpiTRAQ-Het); the definitions and parameters for network reconstruction are as described by Noirel et al., using the N. punctiforme ORNL annotations and KEGG reference databases.32 The color of each metabolic node represents quantification in greater (red) or lower (blue) abundance. Inflation of the node radius represents the magnitude of differential abundance. Each node represents an enzyme, and each link between nodes represents a given enzymatic reaction and its compound. (A) Network for ExpiTRAQ-Fil; highlighted regions represent clusters of important metabolic pathways in the N2 fixing (diazotrophic) filaments. (B) Network for ExpiTRAQ-Het; highlighted regions represent clusters of important metabolic pathways in the heterocysts.

The transcriptional regulator NtcA controls nitrogen metabolism in cyanobacteria; it is crucial for heterocyst differentiation and is also involved in regulating metabolism in mature heterocysts.36 To investigate the possible involvement of NtcA in regulating the proteins differentially expressed in heterocysts compared to whole filaments, and in diazotrophic filaments compared to ammonium grown filaments, we carried out an in silico binding site analysis. Similar analyses have previously been carried out for other genomes, and have proven useful in discovering unknown proteins of interest.43,44 The technique, and its applicability to proteomic data of this nature, has been discussed in our previous publication.10 We have observed that the information these analyses give on high-throughput proteomic data correlates very well with a number of existing hypotheses about nitrogen regulatory pathways. Briefly, the binding site in NtcA regulated promoters has the canonical sequence signature GTAN8TAC, with GTN10AC recognized as essential for binding. Other variations of the canonical signature have been shown to be recognized by NtcA, for example, GACN8AAC found in the upstream region of petH, and ACTN8TAC found in the upstream region of nifH.45 The signature is often centered 41.5 nucleotides upstream of the transcriptional start site in a promoter that carries a -10 box, although the location can be fairly variable.46 Across the 551 proteins quantified at the 99% confidence interval (721 at the 95% confidence interval) in the present study, 258 identifications showed potential NtcA binding sites. A total of 53.6% of the genes encoding proteins significantly more abundant in the heterocysts and 43.3% of those more abundant in the whole N2 fixing filaments have a potential NtcA binding site within their upstream sequences. Hits with a PSSM score
lower than 100 were not taken into account. Analyses to assess significance showed that an average of 207 hits was obtained on randomized upstream sequences using a threshold of 100. The numbers of hits obtained at different thresholds for natural and randomized sequences are included in the Supporting Information. The putative NtcA control of proteins differentially expressed in heterocysts and their parent diazotrophic filaments will be discussed in the heterocyst level proteome overview section. All other computations and in silico prediction data are also included in the Supporting Information.

2.1. Filament Level Overview of Nitrogen Assimilation, Pentose Phosphate Pathway and TCA Pathway. Adaptation to diazotrophic versus NH4+ supplemented conditions would naturally lead to changes related to nitrogen metabolism. As previously reported by Stensjö et al. and Ow et al., initiating the N2 fixation process in Nostoc sp. PCC 7120 also led to significantly increased abundance of oxidative pentose phosphate cycle (oxPP) proteins and of a number of TCA pathway candidates.9,10 Here, six proteins related to the nitrogen assimilation pathway were reliably quantified. The N2 fixing nitrogenase candidates NifHDK (encoded by Npun_R0415, Npun_R0388, and Npun_R0390) were found to be significantly more abundant in proteomes from N2 fixing cells (see Supporting Information). They were quantified with 9.2 ± 0.3-, 7.4 ± 0.3- and 5.4 ± 0.2-fold greater abundance, respectively. Glutamine synthetase, GlnA (encoded by Npun_R5387), which is responsible for the incorporation of NH3 into glutamate/2-oxoglutarate, was also quantified. Its relative abundance, however, was only 1.4 ± 0.1-fold greater in N2 fixing filaments (below the ±1.6-fold cutoff threshold). In contrast, recent transcriptome microarray studies of N. punctiforme measured a significant increase in the transcript level of glnA.5 Figure 3 shows the enzymes quantified across the pentose phosphate and N2 assimilation pathways.

Figure 3. Partial metabolic maps of the TCA pathway, nitrogen assimilation and the pentose phosphate pathway. Nodes denote pathway enzymes, and their color denotes differential abundance (gray, unchanged; red, greater abundance; blue, lower abundance). Nodes are labeled with their E.C. annotations. (A) Quantifications for N2 fixing versus ammonium grown filaments; ExpiTRAQ-Fil. (B) Quantifications for heterocysts versus N2 fixing filaments; ExpiTRAQ-Het. Nodes and connections for all reference pathways are modified from KEGG.

Two cyanophycinase subunits, CphB1 (encoded by Npun_F1821) and CphB2 (encoded by Npun_R5824), were also reliably identified and quantified in this study. The function of CphB proteins is to degrade the multi-L-arginyl-poly-L-aspartic acid (cyanophycin) nutrient reserve.47 CphB1 was consistently found to be 2.4 ± 0.3-fold more abundant during ammonium growth. This was similarly observed in Nostoc sp. PCC 7120 iTRAQ experiments, where the CphB1 protein was also less abundant in diazotrophic conditions.10 CphB2, in contrast, did not show differential abundance between diazotrophic and ammonium fed conditions. The complementary observations for CphB1 and CphB2 are in accordance with earlier transcriptional evidence in Nostoc sp. PCC 7120, which suggested that CphB2 was more
expressed than CphB1 in the absence of supplemented ammonium.48 These observations help corroborate the findings previously presented for Nostoc sp. PCC 7120, given that quantifications of CphB2 were missing at that time.10

Fifteen proteins involved in the pentose phosphate pathway were reliably quantified (see Supporting Information and Figure 3). The increases in abundance for Gnd (2.6 ± 0.3), Pgl (1.8 ± 0.3), Fbp (1.6 ± 0.23) and OpcA (1.7 ± 0.3) were the greatest during diazotrophic growth. G6PD (Zwf), in contrast, was determined to be below the 1.6-fold cutoff. Nevertheless, one may suggest that the two steps catalyzing the formation of NADPH were more highly expressed in response to the increased requirement for reducing power to fix N2, even though the overall differential abundances across the pentose phosphate pathway were not as pronounced as in Nostoc sp. PCC 7120.9,10 Readers are encouraged to refer to the observations using purified heterocysts presented in later sections as corroborative evidence for the inferences made here about the pentose phosphate pathway.

Similarly to Nostoc sp. PCC 7120 and other cyanobacteria, N. punctiforme does not possess a complete TCA cycle.49 Six proteins were quantified in this investigation (see Supporting Information). The TCA pathway in cyanobacteria has traditionally been understood to supply the carbon skeleton 2-oxoglutarate (2-OG) required during ammonium assimilation in the glutamate/glutamine interchange, as well as to sense the carbon to nitrogen ratio in the cell.50 We identified proteins along the biochemical path from citrate to 2-OG through citrate synthase (GltA, Npun_R5627), aconitase B (AcnB, Npun_F1921) and isocitrate dehydrogenase (Icd, Npun_R5474). However, we did not observe any significant change in their abundance at the filament level.
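As an illustration of how the 1.6-fold cutoff separates the filament level quantifications quoted in this section, the following minimal Python sketch classifies those ratios. It is not the analysis pipeline used in this study. The function name `classify`, the symmetric treatment of the cutoff (ratios at or above 1.6 counted as greater, ratios at or below 1/1.6 counted as lower) and the conversion of the CphB1 value into a reciprocal ratio are assumptions made for the example, and the reported ± errors are ignored.

```python
# Illustrative sketch only: label iTRAQ ratios against a symmetric fold-change cutoff.
# The boundary handling (>= and <=) is an assumption, not taken from the paper.
CUTOFF = 1.6  # fold-change threshold quoted in the text


def classify(ratio: float, cutoff: float = CUTOFF) -> str:
    """Label an N2 fixing / ammonium grown abundance ratio (hypothetical helper)."""
    if ratio <= 0:
        raise ValueError("abundance ratios must be positive")
    if ratio >= cutoff:
        return "greater"
    if ratio <= 1.0 / cutoff:
        return "lower"
    return "unchanged"


# Ratios quoted in Section 2.1 (N2 fixing relative to ammonium grown filaments).
# Fbp (1.6) sits exactly on the cutoff and is left out to avoid a boundary call.
ratios = {
    "NifH": 9.2,
    "NifD": 7.4,
    "NifK": 5.4,
    "GlnA": 1.4,
    "Gnd": 2.6,
    "Pgl": 1.8,
    "OpcA": 1.7,
    "CphB1": 1 / 2.4,  # reported as 2.4-fold more abundant under ammonium growth
}

calls = {name: classify(value) for name, value in ratios.items()}

for name, label in calls.items():
    print(f"{name}: {label}")
```

Under these assumptions, NifHDK, Gnd, Pgl and OpcA are labeled as greater, GlnA as unchanged and CphB1 as lower in N2 fixing filaments, in line with the text.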
Nonetheless, there is evidence of greater abundance of TCA pathway proteins during diazotrophic growth in Nostoc sp. PCC 7120.9,51

2.2. Filament Level Overview of Energy Metabolism, Photosynthesis and Cell Division Proteins. Proteins annotated in the central glycolysis/gluconeogenesis metabolic pathway were also quantified. Although a number of pentose phosphate pathway candidate proteins increased in abundance, the overall abundance of the glycolysis/gluconeogenesis proteins remained unchanged (below the 1.6-fold cutoff). Fourteen F0F1-type ATP synthase and NADH dehydrogenase (NADH:ubiquinone oxidoreductase) subunits were also identified. The candidates identified in this study were NdhA, NdhG, NdhH, NdhI, NdhJ, NdhK, NdhM and NdhN for the NDH-1 complex, alongside AtpA, two AtpB, AtpC, AtpD, AtpE and AtpF for ATP synthase (see Supporting Information). Two of the identified ATP synthase proteins (AtpE, 1.8 ± 0.2, and AtpF, 1.7 ± 0.15) showed greater abundance in N2 fixing conditions. There were also 34 proteins identified across PSI, PSII, phycobilisome proteins, heme/chlorophyll metabolism and the cytochrome b6/f complex. However, the relative abundances of all related proteins in diazotrophic conditions versus ammonium supplemented growth showed no difference at the filament level. Recently published transcriptional data lead to similar conclusions: the expression of the majority of PSI, PSII and phycobilisome genes is not significantly different.5 The cell division protein FtsZ was also identified, and its relative quantification did not show any differential abundance in diazotrophic filaments. Two additional FtsH proteins (ATP-dependent Zn protease; Npun_R1355 and Npun_F4881)
were also identified, but did not show a change in abundance. The FtsH proteins identified in this study have previously been shown to be wrongly annotated as cell division proteins, and were found to be related more to the assembly/disassembly of PSII and PSI.52,53

2.3. Heterocyst Level Overview of Nitrogen Assimilation, Pentose Phosphate Pathway and TCA Pathway. Cellular purification of heterocyst fractions allowed us to focus on the changes specific to the heterocysts and their metabolism with respect to the parent N2 fixing filaments.10 This strategy was also used in microarray investigations of Nostoc sp. PCC 7120 to understand the regulation that leads to differentiation into heterocysts under combined nitrogen depleted conditions.54 As suggested by previous work on Nostoc sp. PCC 7120, a number of important changes were expected to emerge more markedly.10 In comparison, four proteins related to the response to and assimilation of nitrogen were quantified. NH3 incorporating GlnA and an additional two-component accessory, the nitrogen regulatory protein P-II (GlnB, Npun_F4466), were accurately quantified. GlnA (1.4 ± 0.1) in the heterocysts, similar to the filament level study, was found to be below the 1.6-fold cutoff. However, GlnB (1.7 ± 0.05) was found to be more abundant in the heterocysts. Although a number of observations had already reported an increase in the expression of glnA, the proteomic quantification of GlnB was new in this case.10,55 An orthogonal observation was also reported in previous transcriptional studies, where increased transcription of glnB was observed following NH4+ deprivation.56 Not unexpectedly, our in silico search identified a potential NtcA binding site in the glnA promoter, even though no significant change would be measured (