Article pubs.acs.org/jpr

Elucidation of Xylem-Specific Transcription Factors and Absolute Quantification of Enzymes Regulating Cellulose Biosynthesis in Populus trichocarpa

Philip L. Loziuk,† Jennifer Parker,† Wei Li,‡ Chien-Yuan Lin,‡ Jack P. Wang,‡ Quanzi Li,§ Ronald R. Sederoff,‡ Vincent L. Chiang,‡ and David C. Muddiman*,†

Downloaded by UNIV OF CAMBRIDGE on September 2, 2015 | http://pubs.acs.org Publication Date (Web): September 1, 2015 | doi: 10.1021/acs.jproteome.5b00233



† W.M. Keck FTMS Laboratory, Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States
‡ Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, North Carolina 27695, United States
§ State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100091, China

ABSTRACT: Cellulose, the main chemical polymer of wood, is the most abundant polysaccharide in nature.1 The ability to perturb the abundance and structure of cellulose microfibrils is of critical importance to the pulp and paper industry as well as to the textile, wood products, and liquid biofuels industries. Although much has been learned at the transcript level about the biosynthesis of cellulose, a quantitative understanding at the proteome level has yet to be established. The study described herein sought to identify the proteins directly involved in cellulose biosynthesis during wood formation in Populus trichocarpa, along with known xylem-specific transcription factors involved in regulating these key proteins. We developed an effective discovery proteomic strategy that combines subcellular fractionation of stem differentiating xylem tissue (SDX) with recently optimized FASP digestion protocols, StageTip fractionation, and optimized instrument parameters for global proteomic analysis on a quadrupole-Orbitrap mass spectrometer. This strategy provided the deepest proteomic coverage of SDX protein from P. trichocarpa, with 9,146 protein groups identified (1% FDR). Of these, 20 cellulosic/hemicellulosic enzymes and 43 xylem-specific transcription factor groups were identified. Finally, selection of surrogate peptides led to an assay for absolute quantification of 14 cellulosic proteins in SDX of P. trichocarpa.

KEYWORDS: shotgun discovery proteomics, absolute quantification, PC-IDMS, cellulose biosynthesis, transcription factor, targeted mass spectrometry, SRM



INTRODUCTION

Cellulose, constituting 40−45% of the dry weight of wood in most woody vascular plants, is the most abundant biopolymer on earth.2,3 It is also a necessary substrate for pulp and paper as well as for cellulose-based biofuel production. Cellulose and the hemicelluloses xylan (4-O-methylglucuronoxylan) and glucomannan are the three major polysaccharides in woody plant cell walls. The plant secondary cell wall also contains lignin, a biopolymer made up of phenylpropanoid monolignols, which acts as a barrier, confers mechanical strength, aids in water transport, and comprises ∼25% of the dry weight of wood in vascular plants.3 In contrast to the hydrophobic lignin, cellulose is an unbranched polysaccharide consisting of β(1→4)-linked D-glucose units that is synthesized by cellulose synthase complexes located in the plasma membrane.4−6 The glucan chains then associate through hydrogen bonding to form long cellulose microfibrils (7,000−15,000 residues).6

In addition to cellulose, hemicelluloses represent the other major polysaccharides in the secondary wall. Unlike cellulose, hemicelluloses are shorter (500−3,000 residues), branched polysaccharides that can contain many different sugar monomers, including xylose, mannose, galactose, rhamnose, and arabinose. Hemicelluloses also tend to be random and amorphous in structure and are more easily hydrolyzed. They are synthesized from sugar nucleotides in the Golgi apparatus and are subsequently transported to the plasma membrane and the secondary wall, where they form a network with cellulose microfibrils along with the structural support of lignin.7 The most abundant hemicelluloses are xylan (β-D-xylose) and glucomannan (β-D-mannose and β-D-glucose), representing 85% and 15% of the total hemicellulose in flowering plants, respectively.8

Received: March 17, 2015


DOI: 10.1021/acs.jproteome.5b00233 J. Proteome Res. XXXX, XXX, XXX−XXX


Journal of Proteome Research

Although the chemistry of cellulose and hemicelluloses is well understood and many of the proteins involved in their biosynthesis have been characterized, a comprehensive understanding of cellulose biosynthesis and its regulation is still lacking. Quantitative measurements of the proteins involved in cellulose biosynthesis have yet to be obtained. Quantitative knowledge of both transcript and protein levels is necessary to uncover the extent and direction (positive or negative) of interactions between transcription factors, gene transcript abundance, and enzymes. By assessing transcriptional, post-transcriptional, and translational controls, we can obtain a more precise and quantitative understanding of the regulatory mechanisms underlying secondary wall biosynthesis at a systems level, which requires that both transcript and protein quantities be known. This quantitation is necessary for the development of predictive models that will lead to more targeted and efficient transgenic modifications aimed at producing secondary wall types and compositions better suited to industrial processes.

Significant progress has been made in understanding and identifying the key players involved in transcriptional regulation of secondary wall biosynthesis.3,9−16 Genetic studies in Arabidopsis have identified many of the genes affecting cellulose and xylan biosynthesis,6,8,17−26 and roughly half of these encode transcription factors.12,27,28 Members of the MYB and NAC domain families of transcription factors initiate secondary wall biosynthesis through the activation of biosynthetic genes or other transcriptional regulators.29,30 NAC transcription factors are primarily first-level direct regulators, whereas members of the MYB family act as second-level master switches for secondary cell wall biosynthesis.12 Populus trichocarpa was the first model woody plant to have a sequenced and annotated genome,31 owing to its comparatively small genome (400 Mb) relative to those of other woody plants such as the loblolly pine, Pinus taeda (22 Gb).32 Additionally, its ease of transgenic manipulation and rapid growth make P. trichocarpa a useful model for studying secondary cell wall biosynthesis during wood formation.3 On the basis of the P. trichocarpa genome sequence and transcriptome data, 38 genes associated with the biosynthesis of cellulose and hemicelluloses during wood formation in differentiating xylem tissue have been identified.33,34 These include 5 cellulose, 17 xylan, and 2 glucomannan biosynthetic genes and 14 primary transcription factor genes.27 Furthermore, on the basis of transcriptomic data (Chiang, unpublished), 153 other xylem-specific transcription factors may be involved in regulating secondary wall biosynthesis. Recent improvements in the detection limits of LC−MS-based proteomics have made it effective for identifying and quantifying thousands of proteins from a single sample.35 Despite these advances, much of the P. trichocarpa proteome remains uncharacterized.

In particular, low-abundance proteins such as transcription factors have consistently evaded detection by mass spectrometry. Previous attempts to characterize the P. trichocarpa proteome have failed to identify transcription factors involved in cellulose biosynthesis.36 Integrating technical knowledge from our previous work in absolute protein quantitation,37,38 we aimed to establish an effective discovery proteomic approach for the identification of low-abundance transcription factors. Cellulose biosynthetic proteins are known to be membrane proteins, whereas transcription factors are found in the nucleus, so sample preparation, particularly the protein extraction method, must be carefully considered to detect both. Subcellular fractionation of nuclear proteins by differential centrifugation reduced sample complexity, allowing the analysis of less abundant nuclear-localized proteins. Filter-aided sample preparation (FASP), optimized by our previously published design of experiments, was then employed to remove incompatible detergents.39 StageTip fractionation was implemented as an orthogonal mode of separation to increase proteome coverage.40,41 Instrument parameters optimized by our previously published design of experiments for shotgun proteomics on the quadrupole-Orbitrap instrument further maximized proteome coverage, allowing us to identify all known cellulose biosynthetic proteins and many transcription factors that regulate their abundance and activity.42 Combining multiple bioinformatics workflows with shotgun LC−MS data provided a greater number of protein identifications. Finally, the data were used to develop a targeted assay for absolute quantification using protein cleavage-isotope dilution mass spectrometry (PC-IDMS).43 Unique surrogate peptides meeting previously specified criteria36 were selected for 14 proteins involved in cellulose biosynthesis. Using selected reaction monitoring (SRM) on a triple quadrupole mass spectrometer, peptide detection was optimized, and the peptides were quantified in 3 wild-type trees to assess the absolute quantities and the biological variation in abundance of these proteins.



EXPERIMENTAL SECTION

Materials

Unless otherwise stated, all reagents were purchased from Sigma-Aldrich (St. Louis, MO). All solvents were HPLC-grade from Honeywell Burdick & Jackson (Muskegon, MI).

Nuclear Protein Isolation from Stem Differentiating Xylem (SDX)

Stem differentiating xylem was collected from six-month-old, wild-type Populus trichocarpa (genotype Nisqually-1) grown in a greenhouse and harvested as previously described.15 Nuclear protein isolation was optimized based on previous methods for isolation of nuclear SDX protein.44−46 Nuclear isolation buffer (NIB) consisted of 10 mM HEPES (pH 7.6, KOH), 1 mM dithiothreitol (DTT), 5 mM MgCl2, and 0.8 M sucrose. Three layers of Miracloth (EMD Millipore, Billerica, MA) were formed into a cone inside a funnel. Ten grams of fresh SDX was ground to a powder in liquid nitrogen using an analytical mill, and the powder was transferred from the cold mill to an ice-cold 50 mL Falcon tube. Thirty milliliters of cold NIB was added and mixed well. The mixture was homogenized for 2 min using an ULTRA-TURRAX homogenizer, vortexed, and then mixed for 10 min at 4 °C on a rotating wheel. The resulting suspension was passed gradually through the three layers of Miracloth into another ice-cold 50 mL Falcon tube, and the Miracloth was squeezed to collect the liquid. The sample was then centrifuged for 10 min at 1800g and 4 °C. The supernatant was removed, and the pellet was resuspended in 2 mL of NIB containing protease inhibitors (1 mM PMSF, 1 μg/mL pepstatin A, and 1 μg/mL leupeptin); this buffer is designated NIBA. For cell membrane lysis, 10% Triton X-100 was added to a final concentration of 0.3% (relative to the volume used to resuspend the pellet) and mixed well. The lysate was then layered onto a 0.8 mL cushion of 1.5 M sucrose in 1.5 mL tubes (∼0.6 mL of lysate in each tube). The tubes were centrifuged at 12,000g for 10 min at 4 °C. The upper green phase and the sucrose cushion were aspirated without disturbing the pellet of nuclei. The pellet was then washed twice by resuspension in 1 mL of NIBA and centrifugation for 5 min at 12,000g. Nuclear protein extraction buffer (NPEB) contained 20 mM HEPES (pH 7.9, KOH), 5 mM DTT, 0.42 M NaCl, 1.5 mM MgCl2, 0.2 mM EDTA, 25% glycerol, 0.01% Triton, and protease inhibitors (1 mM PMSF, 1 μg/mL pepstatin A, and 1 μg/mL leupeptin). The nuclear pellet was thoroughly resuspended in NPEB and vortexed at medium-high speed for 30 min at 4 °C. The resuspended pellet was then centrifuged for 10 min at 12,000g and 4 °C. The resulting supernatant was transferred to a clean ice-cold tube and stored at −80 °C for further processing.
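The detergent addition in the lysis step is a simple C1V1 dilution. The Python sketch below is illustrative only: it assumes the 10% Triton X-100 stock, the 0.3% target, and the 2 mL NIBA resuspension volume given above, and the function name is hypothetical. It computes the stock volume either relative to the resuspension volume alone, as the text specifies, or relative to the final volume after the stock is added.

```python
def detergent_stock_volume_ml(stock_pct: float, final_pct: float, base_volume_ml: float,
                              include_added_volume: bool = True) -> float:
    """Volume (mL) of detergent stock to add to base_volume_ml to reach final_pct.

    include_added_volume=False: C_stock * V_add = C_final * V_base, i.e. the final
    concentration is computed relative to the resuspension volume only.
    include_added_volume=True: C_stock * V_add = C_final * (V_base + V_add), so the
    added stock volume is counted in the final volume.
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final concentration must be positive and below the stock concentration")
    if include_added_volume:
        return final_pct * base_volume_ml / (stock_pct - final_pct)
    return final_pct * base_volume_ml / stock_pct


# 10% Triton X-100 stock, 0.3% target, 2 mL NIBA resuspension (values from the text)
simple_ml = detergent_stock_volume_ml(10.0, 0.3, 2.0, include_added_volume=False)
exact_ml = detergent_stock_volume_ml(10.0, 0.3, 2.0)
print(f"relative to 2 mL: {simple_ml * 1000:.1f} uL; relative to final volume: {exact_ml * 1000:.1f} uL")
```

With these inputs the two conventions give 60.0 μL and about 61.9 μL of stock. At a 1:33 dilution the difference between them is small.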

Filter-Aided Sample Preparation (FASP) and StageTip Fractionation

FASP was performed on the nuclear-enriched protein using recently optimized digestion conditions for shotgun proteomic workflows.36,38 All solutions were buffered with 50 mM Tris-HCl, pH 7. To denature and reduce the extracted proteins, each protein solution was diluted 2-fold with 100 mM DTT in 8 M urea and incubated for 30 min at 56 °C. Each sample was then alkylated with 1 M iodoacetamide in 8 M urea (1:1 sample-to-reagent volume) for 1 h at 37 °C. Samples were concentrated once for 15 min at 14,000g and 20 °C using a 0.5 mL, 10 kDa molecular weight cutoff centrifugal filter (EMD Millipore, Billerica, MA). Samples were then washed 3× with 400 μL of 8 M urea to remove the remaining detergent and 3× with 400 μL of trypsin digestion buffer containing 2 M urea and 10 mM CaCl2, each wash being centrifuged for 15 min at 14,000g and 20 °C. Proteins were then digested in the filter with modified porcine trypsin in trypsin digestion buffer at 37 °C for 12 h using a 1:50 enzyme-to-substrate ratio. Digested peptides were eluted by centrifugation at 14,000g and 20 °C for 15 min. One hundred microliters of pH 11 Britton Robinson universal buffer acid mixture at 0.1 M ionic strength (Ricca Chemicals, Arlington, TX) was added to the filter, which was centrifuged again for 15 min at 14,000g and 20 °C. The eluted peptides were then saved for StageTip fractionation.

StageTip fractionation using anion exchange chromatography was adapted from previously described methods with modifications.40,47,48 A Kel-F Hub, point style 3, gauge 16 (1.19 mm) needle was used to cut out six Empore anion extraction disks (3M, St. Paul, MN), and a 100 μL plunger was used to release and stack the disks in a 200 μL pipet tip (Eppendorf, Hauppauge, NY) that was cut to fit a syringe tip. StageTips were conditioned with 100 μL of methanol. Manual elution from the anion StageTip was performed using a 10 mL syringe fitted onto the top of the StageTip. The StageTips were subsequently washed with 100 μL of 1 M NaOH and then twice with 100 μL of pH 11 Britton Robinson buffer. Peptides in 100 μL of pH 11 buffer were loaded onto the anion StageTips and passed over the column using the 10 mL syringe; the eluent was collected in a microcentrifuge tube and set aside. Peptides were then eluted again using 100 μL of pH 11 Britton Robinson buffer. Additional peptide elutions from the anion StageTip were performed with Britton Robinson buffers at pH 8, 6, 5, 4, and 3 into individual StageTips. A pH 3 buffer containing 0.25 M NaCl was added to ensure elution of all peptides from the anion StageTips. The fractions were then dried and stored at −80 °C until analysis, when they were reconstituted in 50 μL of 1% formic acid containing 0.001% zwittergent 3-16 (Calbiochem, La Jolla, CA).

Sample Preparation for Targeted Analysis

Crude xylem protein extraction was performed as described previously.36 Total protein was quantified using a Coomassie Plus Bradford assay,49 and 100 μg of protein from each sample was prepared in duplicate using FASP under conditions optimized for absolute quantification.44 All sample preparation solvents contained 50 mM Tris−HCl, pH 7. Samples were denatured and reduced using 8 M urea and 50 mM DTT at 56 °C for 30 min and then alkylated using 200 mM iodoacetamide for 1 h at 37 °C. Samples were washed 3× with 400 μL of trypsin digestion buffer containing 2 M urea and 10 mM CaCl2, with centrifugation at 14,000g and 20 °C for 15 min. Stable isotope-labeled (SIL) peptides were resuspended in 0.001% zwittergent 3-16, and 5 μL (1 pmol of each peptide) was added to 45 μL of digestion buffer. Samples were digested for 8 h using modified porcine trypsin at a 1:5 enzyme-to-protein ratio. Digestion was quenched using 1% formic acid in 0.001% zwittergent 3-16. Samples were dried and stored at −80 °C prior to LC−MS analysis. An additional aliquot of each sample was processed separately using the previously developed workflow for absolute quantification of lignin proteins.36

Synthetic Peptide Standards

High-purity (>95%) synthetic peptide standards obtained from New England Peptide (Gardner, MA) were dissolved in 5% acetonitrile and 0.1% formic acid. C-terminal lysine or arginine residues were 13C6-labeled, and all peptides were quantified by the vendor using amino acid analysis (AAA).

LC−MS/MS Analysis

StageTip-fractionated nuclear SDX protein was analyzed in triplicate using a Thermo Scientific EASY nLC II (Thermo Scientific, San Jose, CA) operated in a trap-and-elute configuration and directly coupled to a quadrupole-Orbitrap benchtop mass spectrometer (Q Exactive, Thermo Scientific). A 5 μL injection was made onto a nanoflex Chip-LC trap column (200 μm × 0.5 mm) in line with an analytical column (75 μm × 15 cm) (Eksigent Technologies, Dublin, CA), both packed with Chrom XP C18-CL (3 μm particle size, 120 Å pore size). A 240 min elution was performed at a flow rate of 350 nL/min using a 5−30% B gradient. Mobile phases A and B were composed of water/acetonitrile/formic acid (98/2/0.2% and 2/98/0.2%, respectively). Q Exactive instrument parameters previously optimized for global proteomics were used for MS analysis.16 A full-MS, data-dependent top-12 MS/MS method was performed. Orbitrap MS and MS/MS scans were acquired at resolving powers of 70,000 and 17,500, respectively, at m/z 200. The automatic gain control (AGC) target for MS scans was set to 1E6 with a maximum ion injection time (IT) of 30 ms, and an AGC target of 2E4 with a maximum injection time of 120 ms was used for MS/MS scans. The underfill ratio was set to 1%. The m/z range was set from 400 to 1600. Microscans were set to 1 for both MS and MS/MS scans. Charge state screening was enabled, and unassigned and 1+ charge states were rejected from MS/MS isolation and activation. A normalized collision energy of 27 was used with an isolation window of 2.0 m/z. Dynamic exclusion was set to 60 s. The ambient ion at m/z 445.120025 was used as the lock mass. A capillary voltage of 2.0 kV was used, and the capillary temperature was set to 250 °C.
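The lock-mass setting above, like the 5 ppm precursor tolerance used in the database searches, rests on relative mass-error arithmetic in parts per million. The Python sketch below shows the ppm calculation and a single-point, proportional lock-mass recalibration. The proportional model is an illustrative assumption, not the Q Exactive's internal calibration routine. The function names and the example m/z values are hypothetical.

```python
LOCK_MASS_MZ = 445.120025  # ambient lock-mass ion given in the text


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def lock_mass_correct(observed_mz: float, observed_lock_mz: float,
                      lock_mz: float = LOCK_MASS_MZ) -> float:
    """Rescale an observed m/z by the ppm shift measured on the lock-mass ion.

    Illustrative single-point, proportional model (an assumption): every ion in
    the scan is assumed to carry the same relative error as the lock mass.
    """
    shift_ppm = ppm_error(observed_lock_mz, lock_mz)
    return observed_mz / (1 + shift_ppm * 1e-6)


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float = 5.0) -> bool:
    """True if the match lies inside a symmetric ppm window (e.g., a 5 ppm precursor tolerance)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical scan in which the lock mass reads 3 ppm high.
observed_lock = LOCK_MASS_MZ * (1 + 3e-6)
raw_precursor = 800.0024  # hypothetical precursor whose true m/z is 800.0
corrected = lock_mass_correct(raw_precursor, observed_lock)
print(ppm_error(raw_precursor, 800.0), corrected, within_tolerance(corrected, 800.0))
```

A symmetric ppm window scales with m/z. A 5 ppm tolerance is therefore about ±0.004 at m/z 800 but only about ±0.002 at m/z 400.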


Figure 1. An optimized discovery proteomic workflow consisting of subcellular fractionation of xylem protein by differential centrifugation, previously optimized FASP digestion, StageTip fractionation combined with triplicate analysis of each fraction in 4 h gradient runs, and a multiple search engine workflow resulted in over 9,000 protein groups being identified. Photograph courtesy of Philip Loziuk. Copyright, 2015.

Targeted LC−MS/MS analysis was performed on a TSQ Vantage triple quadrupole mass spectrometer (Thermo Scientific, San Jose, CA) equipped with an Ekspert 415 nanoLC system (AB SCIEX, Framingham, MA) in a vented column configuration. Trap and analytical columns were made from 100 μm × 5 cm IntegraFrit and 75 μm × 20 cm PicoFrit columns (New Objective, Woburn, MA), respectively, and were self-packed with Kinetex C18 2.6 μm particles (Phenomenex, Torrance, CA). Samples were reconstituted in 250 μL of 0.001% zwittergent 3-16, and 10 μL injections were analyzed in triplicate using a 20 min, 10−50% B gradient at a flow rate of 400 nL/min. For quantification of lignin proteins, samples were reconstituted in 500 μL of 0.001% zwittergent 3-16, and 5 μL injections were analyzed in duplicate. The following parameters were used for SRM analysis: capillary temperature of 250 °C, 1.6 kV spray voltage, 0.7 full width at half-maximum (fwhm) peak width, cycle time of 1.5 s, collision gas pressure of 1.5 mTorr, and a chromatography filter peak width of 30 s. Peptides were fragmented using collision energies optimized in Skyline v2.5.6079.50,51 Transitions were monitored using a scheduled SRM method with a 2 min window for each peptide. Raw files were imported into Skyline for automatic peak detection and integration, and all chromatograms were inspected for obvious coeluting contaminants. Data were then exported to Excel 2013 (Microsoft, Redmond, WA) to calculate relative abundance (RA) and to perform further quantification as described previously.36,37 On the basis of injections of internal standards in varying amounts at optimized collision energies, a transition RA tolerance of 20% was used to assess transition purity for cellulose proteins. Lignin protein quantities were assessed using previous RA calculation methods.41

Bioinformatics, Global and Targeted Data Analysis

Raw files obtained from analysis of nuclear protein were converted to ".mgf" format using Proteome Discoverer (Thermo Scientific, San Jose, CA). The resulting ".mgf" files were searched using Mascot version 1.0 against a concatenated target-reverse JGI P. trichocarpa v2.2 database (Joint Genome Institute, U.S. Department of Energy) containing 45,215 target sequences, including modified sequences recently cloned from P. trichocarpa (Supporting Information).15 The database was searched using the following parameters: trypsin as the enzyme, which performs in silico digestion of P. trichocarpa database proteins at arginine (R) and lysine (K) residues; fixed carbamidomethyl modification of cysteine residues; variable oxidation of methionine; variable deamidation of asparagine and glutamine; a maximum of 2 missed cleavages; 5 ppm precursor tolerance; and 0.02 Da MS/MS tolerance. The Mascot search results (.dat files) for the nuclear proteins were imported into ProteoIQ v2.3.08 (Premier Biosoft, Palo Alto, CA) to obtain a final peptide/protein identification list at a protein false discovery rate (FDR) of 1%. The same .dat files were also imported into Skyline to build a spectral library in which peptides were filtered at a peptide probability of 0.9; from this library, Skyline selects the best representative spectrum for each peptide.52 In parallel, raw files were searched directly using Proteome Discoverer v1.4 in conjunction with Sequest, Sequest HT, and Mascot to maximize proteome coverage. Search parameters were the same, except that a target-only database was used. A precursor ion area detector node with default parameters was used to perform label-free quantification of proteins with Sequest HT. The peak area for each protein group was calculated by averaging the 3 highest responding peptides/groups across StageTip fractions. All Proteome Discoverer data were filtered using Percolator at a 1% peptide FDR, and the strict maximum parsimony principle was applied for protein grouping, which ensured that each protein group was distinguished by a unique peptide. Data from Proteome Discoverer and Mascot, in conjunction with Skyline, were used to perform surrogate peptide selection.36 Target proteins were examined individually and considered identified only if a unique peptide belonging to the protein was identified. All other proteins that could not be uniquely identified and that shared at least one peptide sequence were labeled as a protein group. Finally, all raw data were also processed in the same manner using Sequest, with the results annotated against the P. trichocarpa UniProt reference database containing 44,350 sequences. These processed data were then used to perform cellular compartmentalization analysis using STRAP v1.5 software for rapid analysis of protein annotation.53
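The 1% FDR thresholds above depend on the target-decoy strategy made possible by the concatenated target-reverse database. The Python sketch below uses one common estimator, decoys divided by targets at or above a score cutoff. It assumes each match carries a single score where higher is better. It is a simplified illustration, not the ProteoIQ or Percolator algorithm, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One scored identification (PSM, peptide, or protein) from a target-decoy search."""
    name: str
    score: float    # higher = better (assumption)
    is_decoy: bool  # True if the hit came from the reversed (decoy) sequences


def filter_at_fdr(matches: list[Match], max_fdr: float = 0.01) -> list[Match]:
    """Return target matches above the deepest score cutoff whose estimated FDR <= max_fdr.

    FDR at each cutoff is estimated as (#decoys) / (#targets) among matches scoring at
    or above it. Keeping everything down to the deepest passing cutoff is equivalent to
    keeping matches with q-value <= max_fdr. Score ties are not treated specially here.
    """
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    decoys = targets = 0
    last_passing = -1
    for i, match in enumerate(ranked):
        if match.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            last_passing = i
    # Decoys inside the accepted prefix are counted for the estimate but never reported.
    return [m for m in ranked[: last_passing + 1] if not m.is_decoy]


# Hypothetical scored hits.
hits = [Match("A", 10, False), Match("B", 9, False), Match("C", 8, True),
        Match("D", 7, False), Match("E", 6, True), Match("F", 5, True)]
print([m.name for m in filter_at_fdr(hits, max_fdr=0.34)])
```

The same procedure applies at the protein level when it is run on protein scores instead of peptide scores. Real search results contain many thousands of matches, so a 1% threshold is meaningful there, unlike in this six-hit toy example.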
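In PC-IDMS, absolute amounts come from the ratio of endogenous (light) to SIL (heavy) signal multiplied by the known amount of spiked standard. The Python sketch below makes several assumptions. It assumes 1 pmol (1,000 fmol) of each SIL peptide spiked into a 100 μg protein digest, as described under Sample Preparation for Targeted Analysis. It assumes complete digestion, so that surrogate peptide amount equals protein amount. It also adopts one plausible reading of the 20% RA tolerance: the relative difference between a transition's light and heavy RA. The spreadsheet calculations of the cited work may differ, and the transition labels and peak areas are hypothetical.

```python
def relative_abundance(areas: dict[str, float]) -> dict[str, float]:
    """Fraction of the summed peak area contributed by each transition."""
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}


def pure_transitions(light: dict[str, float], heavy: dict[str, float],
                     tolerance: float = 0.20) -> list[str]:
    """Transitions whose light RA lies within `tolerance` (relative) of the heavy-standard RA."""
    shared = {t: light[t] for t in heavy if t in light}
    ra_light = relative_abundance(shared)
    ra_heavy = relative_abundance({t: heavy[t] for t in shared})
    return [t for t in shared if abs(ra_light[t] - ra_heavy[t]) / ra_heavy[t] <= tolerance]


def fmol_per_ug(light: dict[str, float], heavy: dict[str, float], spike_fmol: float = 1000.0,
                protein_ug: float = 100.0, tolerance: float = 0.20) -> float:
    """Protein amount (fmol per ug total protein) from summed light/heavy areas of pure transitions."""
    keep = pure_transitions(light, heavy, tolerance)
    if not keep:
        raise ValueError("no transition passed the RA purity check")
    light_to_heavy = sum(light[t] for t in keep) / sum(heavy[t] for t in keep)
    return light_to_heavy * spike_fmol / protein_ug


# Hypothetical peak areas; y8 in the light channel carries a coeluting interference.
heavy = {"y4": 2e5, "y5": 2e5, "y6": 2e5, "y7": 2e5, "y8": 2e5}
light = {"y4": 1e5, "y5": 1e5, "y6": 1e5, "y7": 1e5, "y8": 1.3e5}
print(pure_transitions(light, heavy), fmol_per_ug(light, heavy))
```

In this example the interfered y8 transition deviates by about 23% in RA and is excluded. The remaining four transitions give a light/heavy ratio of 0.5, which corresponds to 5 fmol/μg. Because an interference inflates the summed light area, it also slightly shifts the RA of every other transition. For this reason the sketch uses a purity check rather than trusting every transition.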




RESULTS AND DISCUSSION

Shotgun Proteomic Analysis of Nuclear Stem Differentiating Xylem

Implementing an integrated strategy for shotgun proteomic analysis of nuclear xylem protein to target transcription factors involved in secondary wall biosynthesis (Figure 1) resulted in the identification of 9,146 protein groups (1% FDR) based on 42,723 unique peptide sequences. LC−MS/MS data were first examined with the data quality assessment tool RawMeat 2.0, downloaded from http://vastscientific.com/rawmeat (Figure S1). A consistent number of MS and MS/MS scans was acquired across fractions, and the Top-N distribution, which indicates how frequently a given number of MS/MS scans follows a survey mass spectrum, was similar across fractions. Additionally, the infrequent occurrence of top-12 events confirmed that the proteome being measured was not undersampled and that the gradient provided adequate sequencing time for the instrument. Further assessment of the StageTip fractions revealed that although a similar number of total peptides was identified from each fraction, the proportion of unique peptides identified across most fractions was low (Figure 2A).

Figure 2. (A) The total number of peptides identified in triplicate analysis of each pH fraction shows that a similar number of peptides was identified across pH fractions. The percentage of unique peptides identified from each fraction was ∼10% for pH fractions 4−8. Fractions pH 3 and pH 11 had the highest percentage of unique peptides, with ∼30% of the peptides identified being unique. (B) The isoelectric points of identified peptides plotted across pH fractions show that the average pI was approximately the same across fractions, with a relatively wide range of peptide pI values.

The StageTip pH 11 and pH 3 fractions had the greatest proportion of unique peptides (∼30% each, 60% total). The means and distributions of isoelectric points for peptides identified across the fractions were nearly identical (Figure 2B). Thus, although our fractionation method was separating peptides, we were inadvertently diluting the same peptides across several fractions. Future discovery-based experiments could benefit from pooling specific pH fractions (e.g., pH 4, 5, 6, and 8) to optimize the use of material and instrument time. Additionally, the StageTip fractionation workflow could be optimized for increased peptide separation efficiency. Cellular compartmentalization of the P. trichocarpa UniProt database (44,350 protein sequences) and of our nuclear protein data set (7,884 protein groups) revealed a distribution of proteins in our samples very similar to that of the entire proteome (Figure S2). In comparison with a crude protein digest, the nuclear fractionated sample contained a higher percentage of nuclear proteins and a lower percentage of cytosolic proteins. Proteins unique to the nuclear fraction showed an even greater representation of nuclear proteins and fewer cytosolic proteins by percent. This suggests that we obtained a high-quality data set that was representative of the proteome while also enriched for nuclear protein compared with a crude extract. Isolation of the nuclear proteome was not the goal of this experiment; rather, the nuclear protein isolation conditions were designed to reduce the complexity of the mixture for identification of transcription factors while ensuring adequate total protein recovery for downstream sample preparation and LC−MS analysis. Successful enrichment of nuclear protein allowed for the identification of many of the target proteins involved in secondary wall biosynthesis, including all 5 target cellulose synthase proteins, which were uniquely identified (Figure 3). Additionally, for the cellulose synthase protein CesA4, the identified peptides uniquely confirmed the presence of both the CesA4 cDNA sequence and the alternate SNP form of CesA4 (Supporting Information).

The CesA18 cellulose synthase protein was also uniquely identified, along with the amino acid sequence predicted by its complementary DNA sequence, confirming the presence of both the P. trichocarpa protein sequence predicted by the genome assembly and that predicted by the cDNA sequence. Candidate surrogate peptides were identified for both glucomannan proteins, CslA1 and CslA2, as well as for 10 of the 17 target xylan protein groups. Additionally, 11 of the 16 primary xylem-specific target transcription factors involved in secondary wall biosynthesis were identified. Furthermore, 31 of the 153 other xylem-specific transcription factor targets involved in secondary wall biosynthesis were also identified, including KNAT7, a second-level negative regulator of secondary wall biosynthesis (Supporting Information).54 One transcription factor, SND2/3-A2, was uniquely identified in both its non-SNP and SNP forms. Several of the primary target transcription factors were also uniquely identified from their intron-retained forms, including SND-A1, SND-A2, SND2/3-A2, SND2/3-A2-SNP, SND2/3-L1, VND6-A1/A2, VND6-B1, and VND6-C1/C2 (Supporting Information). This provides the potential for unique quantification of full-length transcription factors, distinguishable from their isoforms.55,56 Ultimately, identifying unique peptides that distinguish between isoforms would also allow quantification of particular splice variants, providing another level of quantitative regulatory information, because the form in which these transcription factors exist may strongly affect their binding/effector capabilities.

Although many of the target proteins identified in these data are known to be activated and/or repressed by MYB transcription factors in both Arabidopsis and P. trichocarpa,9,30 none of the 26 primary target MYB transcription factors in our database were uniquely identified. MYB90/MYB167 were identified by a shared peptide that also matched another MYB transcription factor according to the Plant Transcription Factor Database (PTFDB version 2.0).57 Six MYB TFs from the PTFDB were uniquely identified (Supporting Information), which suggests that these particular MYBs may be xylem-specific and involved in the regulation of wood formation. The lack of primary target MYBs in this data set is not entirely surprising. Although a few MYB TFs are comparable in abundance to NAC TFs at the transcript level, many MYB TFs are among the lowest in abundance of all target proteins (Chiang, unpublished). The absence of MYB TFs could thus result from their being near the limit of detection or from a lack of high-responding, unique tryptic peptides for these transcription factors. It is also possible that underlying post-transcriptional/translational mechanisms are involved, as MYB TFs are known to be controlled at both the transcript and protein level.58

Figure 3. List of the cellulosic and hemicellulosic target proteins, as well as the xylem-specific transcription factors regulating secondary wall biosynthesis, identified during shotgun analysis of nuclear fractionated xylem protein.
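The difference between a uniquely identified protein and one matched only through a shared peptide, as with MYB90/MYB167, can be illustrated with an in silico tryptic digest. The Python sketch below rests on several assumptions. It uses the conventional trypsin rule (cleave after K or R, but not before P), whereas the text states only K/R cleavage. It allows up to 2 missed cleavages, as in the database searches, and applies a hypothetical 6−30 residue length window. The protein names and sequences are invented toy examples. The sketch maps each peptide to every protein that contains it and keeps only peptides found in a single protein. It does not treat isobaric leucine and isoleucine as identical, which a production uniqueness check would normally account for.

```python
import re
from collections import defaultdict

# Conventional trypsin rule (assumption): cleave C-terminal to K or R unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(sequence: str, missed_cleavages: int = 2,
                     min_len: int = 6, max_len: int = 30) -> set[str]:
    """In silico tryptic digest allowing up to `missed_cleavages` internal cleavage sites."""
    fragments = [f for f in TRYPSIN_SITE.split(sequence) if f]
    peptides = set()
    for start in range(len(fragments)):
        for end in range(start, min(start + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[start:end + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


def peptide_index(proteome: dict[str, str]) -> dict[str, set[str]]:
    """Map every in silico peptide to the set of protein accessions that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for accession, sequence in proteome.items():
        for peptide in tryptic_peptides(sequence):
            index[peptide].add(accession)
    return dict(index)


def unique_peptides(proteome: dict[str, str]) -> dict[str, set[str]]:
    """Peptides that map to exactly one protein, grouped by that protein."""
    result: dict[str, set[str]] = defaultdict(set)
    for peptide, accessions in peptide_index(proteome).items():
        if len(accessions) == 1:
            (accession,) = accessions
            result[accession].add(peptide)
    return dict(result)


# Hypothetical toy sequences that share one tryptic peptide (AGDLNPEWK).
toy = {
    "protA": "MSTLVEKAGDLNPEWKSHTAEQLIR",
    "protB": "MQRTFPGNEKAGDLNPEWKYYCTDMLR",
}
print(sorted(peptide_index(toy)["AGDLNPEWK"]))
print(sorted(unique_peptides(toy)["protA"]))
```

Identifying a protein from AGDLNPEWK alone could not distinguish protA from protB, so the two would be reported as a protein group. A peptide such as MSTLVEK, however, is sufficient on its own to identify protA uniquely.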
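As described in the Experimental Section, protein-level label-free values were obtained by averaging the three highest responding peptides per protein group, and relative standard deviations (RSDs) were reported across the 6 StageTip fractions. The Python sketch below implements one reading of that procedure. It assumes the top-3 average is taken within each fraction and that the mean and RSD are then computed across the fractions in which the protein group was detected. The data layout and peak areas are hypothetical, and Proteome Discoverer's own precursor-area calculation may differ.

```python
from statistics import mean, stdev


def top3_area(peptide_areas: dict[str, float]) -> float:
    """Mean of the three largest peptide peak areas (or of all of them, if fewer than three)."""
    if not peptide_areas:
        raise ValueError("no peptide areas supplied")
    return mean(sorted(peptide_areas.values(), reverse=True)[:3])


def protein_area_and_rsd(areas_by_fraction: dict[str, dict[str, float]]) -> tuple[float, float]:
    """Average top-3 area across fractions and its RSD in percent.

    Fractions in which the protein group has no quantified peptides are skipped.
    """
    per_fraction = [top3_area(p) for p in areas_by_fraction.values() if p]
    if not per_fraction:
        raise ValueError("protein group not quantified in any fraction")
    average = mean(per_fraction)
    rsd = stdev(per_fraction) / average * 100 if len(per_fraction) > 1 else 0.0
    return average, rsd


# Hypothetical peptide peak areas for one protein group in two StageTip fractions.
example = {
    "pH3": {"pep1": 9e6, "pep2": 6e6, "pep3": 3e6, "pep4": 1e6},
    "pH11": {"pep1": 5e6, "pep2": 4e6, "pep3": 3e6},
}
average, rsd = protein_area_and_rsd(example)
print(f"mean top-3 area {average:.3g}, RSD {rsd:.1f}%")
```

Averaging only the three strongest peptides limits the influence of weak, noisy peptides on the protein-level value. This is one reason high RSDs across fractions, as in this toy example, still leave the relative ranking of abundant proteins informative.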

Label-Free Quantification of Cellulosic Proteins and Associated Transcription Factors

Quantification using raw precursor peak areas from replicate analyses of 6 StageTip fractions yielded over 8,000 protein groups spanning a dynamic range of over 7 orders of magnitude. For the targeted cellulosic proteins identified by shotgun proteomic analysis of nuclear SDX, average peak areas of the top 3 highest-responding peptides per protein group, along with the corresponding relative standard deviations (RSD) across the 6 StageTip fractions, were calculated (Figure 4). Quantified target proteins spanned 5 orders of magnitude; the most abundant were xylan glycosyltransferase (GT) and cellulose synthase proteins, which were comparable in abundance. Because of variation in the data, we could not determine which individual cellulose/hemicellulose proteins were most abundant. Cellulose- and xylan-associated proteins are located in the plasma membrane and Golgi, respectively, both of which were likely enriched during subcellular fractionation. These proteins are also the most abundant cellulosic/hemicellulosic proteins at the transcript level.15 The glucomannan synthase proteins Csl1 and Csl2 were less abundant than the cellulose/xylan enzymes. As expected, transcription factors were typically the least abundant of the target proteins. SND NAC transcription factors were overall the most abundant transcription factors, and the secondary activators SND2/3-A2 and SND2/3-L-1 were the most abundant of the NAC family. Moreover, SND NAC transcription factors were significantly more abundant than VND NAC transcription factors, consistent with observed differences in transcript abundance.15

Absolute Quantification of Cellulosic Proteins

Absolute quantification using PC-IDMS was performed on a select set of 14 proteins involved in cellulose, xylan, and glucomannan biosynthesis (Figure 5A). All proteins were quantified across 3 complete analytical replicates with a CV < 10%, demonstrating a robust and reproducible assay (Figure 5B). Of the target proteins, those involved in cellulose biosynthesis (CesA18, CesA4, and CesA7) were the most abundant, consistent with cellulose being the most abundant polysaccharide. These three proteins are exclusively required for cellulose crystallization and glucan polymerization.6,59 Although lower in abundance, CesA17 and CesA8 were also reproducibly quantifiable. Proteins involved in xylan


Figure 4. Label-free quantification of all protein groups revealed a total quantifiable dynamic range of 8 orders of magnitude, with target cellulosic/hemicellulosic protein groups spanning 5 orders of magnitude in abundance. As expected, the xylan and cellulose proteins were the most abundant, while the glucomannan proteins and transcription factors were less abundant. NAC transcription factors were largely the most abundant of the transcription factors.
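The label-free summary statistics described for Figure 4 can be sketched in Python under stated assumptions: each target protein's raw precursor peak areas are supplied as one list of peptide areas per StageTip fraction, its abundance in a fraction is the mean of its three highest-responding peptides, RSD is the sample standard deviation of those per-fraction values divided by their mean, and dynamic range is log10(max/min) across abundances. The input values are hypothetical, and the authors' exact averaging scheme may differ.

```python
import math
import statistics


def top3_mean(areas: list[float]) -> float:
    """Mean of the three largest peptide peak areas (all of them if fewer than 3)."""
    top = sorted(areas, reverse=True)[:3]
    return sum(top) / len(top)


def summarize(fraction_areas: list[list[float]]) -> tuple[float, float]:
    """Return (mean top-3 abundance, percent RSD) across fractions that have data."""
    per_fraction = [top3_mean(a) for a in fraction_areas if a]
    mean = statistics.mean(per_fraction)
    # Sample standard deviation needs at least two fractions; report 0% otherwise.
    rsd = 100.0 * statistics.stdev(per_fraction) / mean if len(per_fraction) > 1 else 0.0
    return mean, rsd


def orders_of_magnitude(abundances: list[float]) -> float:
    """Dynamic range as log10(max / min) over positive abundances."""
    positive = [v for v in abundances if v > 0]
    return math.log10(max(positive) / min(positive))


# Hypothetical peak areas for one protein in two fractions (not study data).
mean, rsd = summarize([[3e6, 3e6, 3e6, 1e5], [1e6, 1e6, 1e6]])
print(f"mean={mean:.3g}, RSD={rsd:.1f}%")
# Hypothetical protein-level abundances spanning several orders of magnitude.
print(orders_of_magnitude([1e2, 5e4, 1e7]))
```

Restricting each protein to its top three peptides limits the influence of poorly ionizing peptides, at the cost of ignoring information from lower-responding ones.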

quantified.36 Excluding the unusually abundant lignin protein COMT2, lignin protein quantities in the same samples ranged from 1.29 to 119.11 fmol/μg of protein, with a median of 10.39 fmol/μg (Figure 5C). On average, lignin proteins are roughly 10-fold more abundant than cellulose/hemicellulose proteins. This is notable because the ratio of cellulose to lignin products in P. trichocarpa is about 3:1.3 Because lignin and cellulose proteins have similar transcript abundances,15 the differences at the protein level suggest either post-transcriptional/post-translational mechanisms or analytical bias. If cellulose proteins are indeed less abundant than lignin proteins while their products are more abundant, this could point to differences in the catalytic efficiencies of these enzymes or in the regulation of flux through their pathways, and warrants further investigation. Selective bias during extraction of membrane proteins could cause the true quantity of cellulose/hemicellulose proteins in xylem tissue to be underestimated. The more abundant lignin proteins are largely soluble cytosolic proteins (excluding

biosynthesis largely ranked second in abundance, consistent with xylan being the most abundant hemicellulose in P. trichocarpa.3 Furthermore, proteins involved in glucomannan biosynthesis (a less abundant hemicellulose) were less abundant than most cellulose and xylan proteins, which is likewise consistent with product abundance. Beyond the absolute quantities, biological variation ranged from 7% to 30% and was generally larger than the analytical variation. Analytical variation was lower than biological variation for all but one target protein, IRX141, which also appeared to have low variation between biological replicates (