Taking Advantage of Nonspecific Trypsin Cleavages for the Identification of Seed Storage Proteins in Cereals
Kjell Sergeant,*,†,# Carla Pinheiro,‡,# Jean-François Hausman,† Cândido Pinto Ricardo,‡ and Jenny Renaut†
Centre de Recherche Public-Gabriel Lippmann, Department ‘Environment and Agro-biotechnologies’ (EVA), 41, rue du Brill, 4422 Belvaux, Luxembourg, and Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Av. da República-EAN, 2780-157 Oeiras, Portugal
Received December 19, 2008
Abstract: The lack of basic amino acids in seed storage proteins has led to the proposal to use chymotrypsin in their study. A comparative study of trypsin and chymotrypsin digestion initially confirmed this preference; however, reanalysis of the trypsin data set with the specificity defined as ‘semitrypsin’ provided enough extra data to bridge the gap between the two proteases. A rationale for why numerous semitryptic peptides are observed in the study of these proteins is provided.
Keywords: seed storage proteins • trypsin • chymotrypsin • semitrypsin • maize
* To whom correspondence should be addressed. Tel.: 00352 470261 458. Fax: 00352 470264. E-mail: [email protected]. # These authors contributed equally to this work. † Centre de Recherche Public-Gabriel Lippmann. ‡ Universidade Nova de Lisboa.
3182 Journal of Proteome Research 2009, 8, 3182–3190 Published on Web 04/21/2009
Introduction
Apart from their obvious role in plant propagation, seeds are important to humans for three principal reasons: as food (either consumed directly or after processing), as feed for livestock, and as an industrial raw material. Consequently, seeds provide more than half of the world’s intake of dietary proteins and carbohydrates with a corresponding impact on human society (e.g., economic, nutritional, cultural). Although seeds from other groups of plants can also serve as stock-food (e.g., legumes) or as the basis for industrial processing (e.g., oil seeds), a majority of seeds used by modern societies are from the grass family.1 Given the importance of seeds, the study of seed storage proteins (SSPs) has attracted considerable research interest. A system for classifying seed proteins, based on their sequential extraction in a series of solvents, is now well-established. Typically, this defines four protein groups (albumins, globulins, prolamins and glutelins) known as Osborne fractions. The seeds of most plants contain proteins from all four Osborne fractions, but depending on the species, some are more abundant than others. Although Osborne classification remains a valuable tool in seed research, it does have some drawbacks (e.g., formation of insoluble aggregates, proteins being present in more than one fraction), making classification sometimes ambiguous.2 Therefore, structural and sequence similarities as well as mobility of the proteins during gel electrophoresis have been
included in more recent classification schemes. The contemporary definition of prolamins refers to proteins that are water-insoluble or insoluble in saline aqueous solutions in their native state (including Osborne prolamins and Osborne glutelins). This definition reflects a biologically and physically significant property that is important not only for the scientific classification of the group but also for their successful storage.3,4 In recent years, proteomics has emerged as a key approach in the search for answers to biological problems. Nonetheless, the use of proteomics to study cereal SSPs is hampered by properties peculiar to these proteins. Generally, they are characterized by a repetition of stretches of sequence, resulting in significant sequence redundancy.5 The analysis of SSPs is furthermore hindered by the proteins’ physical properties and, although differential extraction can partially eliminate this, the ambiguity of the extraction sometimes precludes simple and straightforward characterization. Furthermore, publicly available sequence databases contain numerous highly redundant entries of this group of proteins that are often annotated with trivial names, names based on mobility, or as unknown. The complexity of the proteome and the need to attain high throughput in proteomic studies have spurred development of a broad range of techniques for the identification and characterization of proteins.6 However, with the exception of top-down techniques,7,8 one step, enzymatic or chemical cleavage of proteins, is common to nearly all techniques. This step (usually enzymatic cleavage with trypsin9) is maintained due to the advantage conferred by uniformity and predictability of physical and chemical properties of peptides versus proteins. The preference for trypsin stems from its stability under a wide range of conditions, its high activity and high substrate specificity.
technical notes
10.1021/pr801093f CCC: $40.75 © 2009 American Chemical Society
Cleavage C-terminal to arginine and lysine by trypsin results in uniform peptides that (with the exception of the C-terminal one) contain a basic group as C-terminal amino acid. Although a model that accurately predicts the relative intensity of fragment ions is currently lacking,10,11 these C-terminal basic residues not only improve ionization in MS-analysis but also lead to more uniform fragment-ion series in MS/MS-analysis. Furthermore, through the defined mass range wherein tryptic peptides are found and the predictability of their chromatographic behavior, the use of trypsin has made possible the development of fully automated analytical platforms. The use of alternative cleavage methods has accordingly been limited to the study of specific groups of proteins or to the analysis of specific protein properties. A recently published
method for C-terminal sequence determination in the study of proteolytic processing events was based on digestion of proteins with cyanogen bromide.12 Because of solubility problems and the limited occurrence of basic residues in transmembrane helices, low-specificity proteases, such as elastase and pepsin, have been used to identify and characterize membrane proteins.13 Furthermore, sequential digestions using chemical and/or enzymatic cleavage protocols have been proposed for the complete characterization of protein primary structure.14-17 The use of chymotrypsin, which cleaves predominantly C-terminal to leucine, phenylalanine, tryptophan and tyrosine, disfavoring peptide bonds N-terminal to proline, has been proposed for the study of SSPs.18,19 This was due to the particularly low content of basic amino acids in cereal SSPs such as those of wheat (Triticum aestivum L.) and maize (Zea mays L.). Although SSPs represent a minor fraction of the cereal endosperm (typically comprising 70% starch and 8-15% protein on a dry weight basis), most of the technological properties of cereal-based products depend on their SSP characteristics.20 In wheat, prolamins are the major components of gluten, a cohesive protein-based mass that confers viscoelastic and cohesive properties allowing wheat to be processed into breads, pastas, and so forth. Gluten proteins are further classified as gliadins (soluble in alcohol) and glutenins (insoluble in alcohol).
The latter are largely responsible for gluten viscoelasticity, while gliadins act to plasticize gluten mass.4 Although the unique properties of wheat-based bread have made it a primary food source for most Western countries, the number of consumers diagnosed with celiac disease or other allergic reactions/intolerances to gluten (>1:300)21,22 and the predicted increase by a factor of 10 in the coming years have created a growing market for gluten-free cereal-based products.23 Celiac disease, the most common food-sensitive enteropathy, is an autoimmune condition of the small intestine that is induced in susceptible individuals by repeated exposure to dietary gluten, that is, SSP of wheat, barley or rye.24,25 Therefore, understanding the protein content and composition of gluten-free flours, based on, among other sources, rice, sorghum or maize, is receiving increasing attention in nutrition and food processing.1,22,26 In Portugal, maize flour is traditionally used to produce a type of cornbread (“broa”). However, the texture and taste of broa depend on a range of traits that characterize some of the traditional Portuguese maize landraces and are not found in the available commercial hybrid varieties. This might explain why these traditional maize landraces have not, in these regions, been totally replaced by high-yield, commercial varieties.27 Therefore, a multiannual effort to characterize the seed storage proteome of two of these traditional landraces, contrasting in their bread-making ability, was recently initiated. As an introductory experiment to this effort, the objective of the current study is the optimization of the mass spectrometric identification of maize SSP. To this end, a theoretical and practical comparison between the use of trypsin and chymotrypsin for protein identification of maize kernel proteins was performed.
All database entries corresponding to maize storage proteins were digested in silico and the mass distribution of the resulting peptides was mapped. Subsequently, the preference for the use of chymotrypsin over trypsin was confirmed through the identification of spots picked from 2D-gels. The prolamins of mature maize seeds were separated into alcohol-soluble prolamins (hereafter designated as Osborne prolamin fraction) and alcohol-insoluble prolamins (hereafter designated as Osborne glutelin fraction). Because the proteins in the two other Osborne fractions (globulins and albumins) do not have this lack of basic amino acids and are readily identified by standard trypsin digestions, these fractions were not used for the comparative study. Protein digestion with chymotrypsin not only increased the identification rate, but higher sequence coverage also resulted in more unique identifications, a particular difficulty with this group of highly homologous proteins. However, manual interpretation of fragmentation spectra that were not matched after trypsin digestion led to the observation that these MS-spectra contain numerous nonspecific peptides. Reanalysis of the trypsin data set revealed that changing the specificity setting during the database searches to allow semitryptic peptides increases the identification yield and the sequence coverage obtained.
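The difference between a fully specific digest and a semispecific search space can be illustrated with a minimal Python sketch. It is not the Mascot or MSDigest implementation: the function names, the `TRYPSIN` and `CHYMOTRYPSIN` tuples, and the toy sequence are hypothetical and chosen only for illustration. Trypsin is modeled simply as cleaving C-terminal to Lys and Arg, and chymotrypsin as cleaving C-terminal to Leu, Phe, Tyr and Trp unless the next residue is Pro, following the rules stated in the text. A "semitryptic" peptide is taken here, as an assumption about the search setting, to be any peptide with at least one terminus at a specific cleavage site or a protein terminus.

```python
# Illustrative enzyme rules: (residues cleaved C-terminally, blocked by a following Pro?)
TRYPSIN = ("KR", False)        # simplified: no proline rule applied
CHYMOTRYPSIN = ("LFYW", True)  # not cleaved when N-terminal to Pro


def cleavage_sites(seq, residues, blocked_by_proline):
    """Return positions i where the bond between seq[i-1] and seq[i] is cleaved."""
    sites = []
    for i in range(1, len(seq)):
        if seq[i - 1] in residues and not (blocked_by_proline and seq[i] == "P"):
            sites.append(i)
    return sites


def digest(seq, enzyme, missed_cleavages=1):
    """Fully specific in silico digest, tolerating a number of missed cleavages."""
    residues, blocked = enzyme
    bounds = [0] + cleavage_sites(seq, residues, blocked) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 1 + missed_cleavages, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def semi_digest(seq, enzyme, missed_cleavages=1):
    """Semispecific peptides: one terminus specific, the other anywhere.

    Every prefix and suffix of each fully specific peptide is included,
    so the fully specific peptides themselves are part of the result.
    """
    peptides = set()
    for pep in digest(seq, enzyme, missed_cleavages):
        for k in range(1, len(pep) + 1):
            peptides.add(pep[:k])   # specific N-terminus, ragged C-terminus
            peptides.add(pep[-k:])  # ragged N-terminus, specific C-terminus
    return peptides


toy = "AALKGFPRSLY"  # hypothetical sequence, not a real zein
print(digest(toy, TRYPSIN, missed_cleavages=0))       # ['AALK', 'GFPR', 'SLY']
print(digest(toy, CHYMOTRYPSIN, missed_cleavages=0))  # ['AAL', 'KGFPRSL', 'Y']
print(len(semi_digest(toy, TRYPSIN, missed_cleavages=0)))
```

The semispecific set is much larger than the fully specific one, which is why such searches cost more time and require stricter score thresholds, but it can match peptides such as `GFP` or `FPR` in the toy example that a strict trypsin search would never consider.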
Material and Methods
Mass Distribution of Tryptic versus Chymotryptic Peptides. All protein sequences corresponding to maize zeins, the generally accepted name for maize SSP, were retrieved from the NCBI-protein database and theoretically digested with either enzyme. To include zeins annotated as hypothetical, unknown or with a trivial name, the retrieved proteins were submitted to BLAST searches and those database entries that showed a high homology and originated from maize were also taken into account. Completely redundant sequences were eliminated by submitting this list of 210 protein sequences to DivergentSet (bmf.colorado.edu/divergentset).28 This program groups identical sequences and resulted in 123 sequences that were not completely redundant and that were used for simulated digestion. The more stringent rules for chymotrypsin cleavage were used: scission of peptide bonds C-terminal to Leu, Phe, Tyr and Trp (Met and Ala excluded) if not N-terminal to Pro. The same cleavage preferences are defined in the default settings of the Mascot software that was used throughout this study. MSDigest was used to calculate the theoretical masses of the cleavage products (prospector.ucsf.edu/cgi-bin/msform.cgi?form=msdigest). Carboxyamidomethyl-cysteine was used as fixed modification and a single missed cleavage was tolerated. Very short peptides (m/z 5000) precludes the discrimination between different isoforms/variants. This is illustrated in Supplementary Table 2. In 10 of the spots (spots 51, 52, 55 and 57-63), a significant score was obtained for this peptide, without any other peptide being matched. Likewise, identical peptides are observed from different proteins after chymotrypsin digestion; however, since several peptides can be analyzed for each protein, this bottleneck for unique identifications is much less pronounced. On the contrary, the number of very small peptides (