Anal. Chem. 2009, 81, 3738–3745
Two Dimensional Mass Mapping as a General Method of Data Representation in Comprehensive Analysis of Complex Molecular Mixtures

Konstantin A. Artemenko,† Alexander R. Zubarev,† Tatiana Yu Samgina,‡ Albert T. Lebedev,‡ Mikhail M. Savitski,† and Roman A. Zubarev*,§

Division of Molecular Biometry, Institute for Cell and Molecular Biology, Uppsala University, Uppsala, Sweden, Department of Organic Chemistry, Moscow State University, Moscow, Russia, and Department of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm, Sweden

A recent proteomics-grade (95%+ sequence reliability) high-throughput de novo sequencing method utilizes the benefits of high resolution, high mass accuracy, and the use of two complementary fragmentation techniques, collision-activated dissociation (CAD) and electron capture dissociation (ECD). With this high-fidelity sequencing approach, hundreds of peptides can be sequenced de novo in a single LC-MS/MS experiment. The high productivity of the new analysis technique has revealed a new bottleneck, which occurs in data representation. Here we suggest a new method of data analysis and visualization that presents a comprehensive picture of the peptide content, including relative abundances and grouping into families. The 2D mass mapping consists of putting the molecular masses onto a two-dimensional bubble plot, with the relative monoisotopic mass defect and isotopic shift being the axes and with the bubble area proportional to the peptide abundance. Peptides belonging to the same family form a compact group on such a plot, so that the family identity can in many cases be determined from the molecular mass alone. The performance of the method is demonstrated on the high-throughput analysis of skin secretion from three frogs, Rana ridibunda, Rana arvalis, and Rana temporaria.
Two dimensional mass maps simplify the task of global comparison between the species and make obvious the similarities and differences in the peptide contents that are obscure in traditional data presentation methods. Even the biological activity of a peptide can sometimes be inferred from its position on the plot. Two dimensional mass mapping is a general method applicable to any complex mixture, peptide and nonpeptide alike.

* Corresponding author. Roman A. Zubarev, Division of Molecular Biometry, Institute for Medical Biochemistry and Biophysics, Karolinska Institute, S-17177 Stockholm, Sweden. E-mail: [email protected]. Phone/fax: +46 18 471 7209.
† Uppsala University. ‡ Moscow State University. § Karolinska Institute.

Analytical Chemistry, Vol. 81, No. 10, May 15, 2009
10.1021/ac802532j CCC: $40.75 © 2009 American Chemical Society. Published on Web 04/21/2009

Mass spectrometry (MS) has become a common method for identification of the protein and/or peptide content of biological samples.1-3 The most popular methodology involves preparation of a cocktail of peptides via enzymatic degradation of proteins with subsequent HPLC-MS/MS analysis that yields molecular and specific fragment masses. These masses and ion abundances are then submitted to a database search, resulting in identification of peptide sequences and, through them, of the proteins present in the sample. Such an approach is known as “bottom-up” or “shotgun” analysis.4,5 It is clear that only peptides and proteins whose sequences are stored in the database can be analyzed in this manner. If one deals with previously unknown sequences, or with sequences that for some reason are omitted from the database, the above method fails. In such a case, de novo sequencing of the peptides is required. Mass spectrometry has been claiming the ability to sequence peptides de novo for a long time,6 but only recently has a proteomics-grade (reliability 95%+) high-throughput de novo sequencing MS-based approach been created.7 This approach utilizes the benefits of high resolution and high mass accuracy of Fourier transform mass spectrometry (FTMS) and uses for each peptide ion two complementary fragmentation techniques, collision-activated dissociation (CAD) and electron capture dissociation (ECD). With the above high-fidelity method, hundreds of peptides can be sequenced de novo in a single LC-MS/MS experiment.7

(1) Domon, B.; Aebersold, R. Science 2006, 312, 212–217.
(2) Kinter, M.; Sherman, E. Protein Sequencing and Identification Using Tandem Mass Spectrometry; John Wiley & Sons: New York, 2000.
(3) McHugh, L.; Arthur, J. W. PLoS Comput. Biol. 2008, 4, e12.
(4) Yates, J. R.; McCormack, A. L.; Eng, J. Anal. Chem. 1996, 68, 534A–540A.
(5) Wu, C. C.; MacCoss, M. J. Curr. Opin. Mol. Ther. 2002, 4, 242–250.
(6) Biemann, K. Annu. Rev. Biochem. 1963, 32, 755–780.
(7) Savitski, M. M.; Nielsen, M. L.; Kjeldsen, F.; Zubarev, R. A. J. Proteome Res. 2005, 4, 2348–2354.
(8) Simmaco, M.; Mignogna, G.; Barra, D. Biopolymers (Peptide Science) 1998, 47, 435–450.
(9) Conlon, J. M.; Al-Ghafari, N.; Coquet, L.; Leprince, J.; Jouenne, T.; Vaudry, H.; Davidson, C. Peptides 2006, 27, 1305–1312.

One of the areas where de novo sequencing is required is the study of mixtures of nontryptic endogenous peptides. For instance, peptides found on frog skin vary widely depending upon the species, terrain, and other factors that are not yet fully understood.8,9 These peptides demonstrate a wide variety of biological activity, from antimicrobial, antitumor, and fungicidal activities to neurological and analgesic activities.10,11 Several hundred frog skin peptides have already been sequenced using Edman degradation,12 cDNA cloning,13 and MS/MS,14,15 and many of them are listed in peptide databases.16,17 However, many more new compounds are still waiting to be found, both in previously unstudied species and in species characterized earlier with low-throughput techniques. With the latter techniques, the most laborious part was the sequence determination, and thus the analysis of a few dozen new peptide sequences used to evolve into a research project lasting several months. With the new high-throughput sequencing approaches, peptide sequences are produced de novo at a rate of a few dozen per hour, and the bottleneck is shifted elsewhere in the workflow. Rather surprisingly, the new bottleneck appears in data presentation. Previously, the characterized peptides would normally have been presented in the form of a table comprising sequences.8-17 If more than a few sequences were present, they would be grouped based on some similarity feature, normally peptide length and/or sequence motifs, the latter being established rather subjectively. Often, a chromatogram or MALDI fingerprint mass spectrum would be present as well.18 Comparison between two populations within the same species can be performed using intensities of the chromatographic peaks with similar retention times, with the underlying assumption that different populations produce the same peptides but with different abundances.19 Such an approach to peptide mixture characterization works well as long as the same species is concerned, but it is entirely inadequate for studying interspecific variability.
Indeed, different species can produce vastly different peptides, and even a combination of the chromatographic retention time and molecular mass measured with an accuracy of ±0.1 Da is insufficient for peptide identification.20 An additional complication is that frog peptide sequences are not random but often represent variations of several common themes. Such peptide families as bradykinins, brevinins, esculentins, and temporins are well-known and have been found in different Ranid frog species.21 However, each family can in principle include an astronomical number of related peptides, and thus a priori knowledge that a known peptide family may be present does not make the task of de novo sequencing significantly easier. On the contrary, the presence of peptide families is a complication in the analysis because, besides the sequence determination, each peptide needs to be attributed to a specific family. For an interspecies study, each peptide family needs to be quantified, and for each species the data set containing identities and abundances of several dozen peptides has to be presented in a clear and comprehensive manner suitable for comparison. Conventional presentation methods encounter great difficulties in performing this task. Here we present a new method of data analysis and visualization that is both comprehensive and clear and is suitable for quantitative comparison of complex peptide mixtures. The method affords good analytical resolution, so that the contribution of each individual peptide is easily discernible. Grouping peptides into families is achieved in a natural way, by exploiting the intrinsic similarity between the elemental compositions of related peptides.

(10) Apponyi, M. A.; Pukala, T. L.; Brinkworth, C. S.; Maselli, V. M.; Bowie, J. H.; Tyler, M. J.; Booker, G. W.; Wallace, J. C.; Carver, J. A.; Separovic, F.; Doyle, J.; Llewellyn, L. E. Peptides 2004, 25, 1035–1054.
(11) Pukala, T. L.; Bowie, J. H.; Maselli, V. M.; Musgrave, I. F.; Tyler, M. J. Nat. Prod. Rep. 2006, 23, 368–393.
(12) Sai, K. P.; Jagannadham, M. V.; Vairamani, M.; Raju, N. P.; Devi, A. S.; Nagaraj, R.; Sitaram, N. J. Biol. Chem. 2001, 276, 2701–2707.
(13) Chen, T.; Scott, C.; Tang, L.; Zhou, M.; Shaw, C. Regul. Pept. 2005, 128, 75–83.
(14) Samgina, T. Y.; Artemenko, K. A.; Gorshkov, V. A.; Lebedev, A. T.; Nielsen, M. L.; Savitski, M. M.; Zubarev, R. A. Eur. J. Mass Spectrom. 2007, 13, 155–163.
(15) Rozek, T.; Bowie, J. H.; Wallace, J. C.; Tyler, M. L. Rapid Commun. Mass Spectrom. 2000, 14, 2002–2011.
(16) Fälth, M.; Sköld, K.; Norrman, M.; Svensson, M.; Fenyö, D.; Andren, P. E. Mol. Cell. Proteomics 2006, 5, 998–1005.
(17) Lu, P.; Szafron, D.; Greiner, R.; Wishart, D. S.; Fyshe, A.; Pearcy, B.; Poulin, B.; Eisner, R.; Ngo, D.; Lamb, N. Nucleic Acids Res. 2005, 33, D147–D153.
(18) Conlon, J. M.; Kolodziejek, J.; Nowotny, N. Biochim. Biophys. Acta 2004, 1696, 1–14.
(19) Samgina, T. Y.; Artemenko, K. A.; Zubarev, R. A.; Lebedev, A. T. Comparative MS Study of Peptide Profiles of Frogs Rana ridibunda from Different Parts of Former USSR. In 17th International Mass Spectrometry Conference, Book of Abstracts, Prague, Czech Republic, 2006; p 242.
(20) Masselon, C. D.; Kieffer-Jaquinod, S.; Brugière, S.; Dupierris, V.; Garin, J. Rapid Commun. Mass Spectrom. 2008, 22, 986–992.
(21) Vanhoye, D.; Bruston, F.; Nicolas, P.; Amiche, M. Eur. J. Biochem. 2003, 270, 2068–2081.

2D Mass Mapping. In the mass spectrometry domain, elemental compositions manifest themselves in two measurable parameters, monoisotopic mass and average mass. These parameters are linked together and to the elemental composition by the masses and abundances of the isotopes. Derivatives of these two mass parameters, the monoisotopic mass defect (the difference between the monoisotopic and nominal, i.e., integer, mass22) and the isotopic shift (the difference between the average (chemical) mass and the monoisotopic mass23), normalized by the nominal or monoisotopic mass, are nonadditive and invariant with respect to multiplication. In other words, these derivative mass quantities are the same for a molecule as for its dimer or trimer, while ordinary additive parameters such as monoisotopic and average masses would double and triple, respectively.
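The invariance can be written out in one line. The following is only an illustrative sketch, assuming an idealized n-fold multiple of the elemental composition (no water loss on oligomerization) and a nominal mass that scales exactly with n; the symbols MM, MN, and AM denote the monoisotopic, nominal, and average masses, and any constant scale factor is omitted because it does not affect the argument.

```latex
\[
\frac{n\,\mathrm{AM} - n\,\mathrm{MM}}{n\,\mathrm{MM}}
  = \frac{\mathrm{AM} - \mathrm{MM}}{\mathrm{MM}},
\qquad
\frac{n\,\mathrm{MM} - n\,\mathrm{MN}}{n\,\mathrm{MM}}
  = \frac{\mathrm{MM} - \mathrm{MN}}{\mathrm{MM}},
\qquad n = 1, 2, 3, \ldots
\]
```

The factor n cancels in both ratios, whereas the unnormalized masses scale with n.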
As will be shown below, this property has an advantage in grouping related peptides belonging to the same family. We found that, after mapping on a two-dimensional plot created by these two mass derivatives, masses of a peptide group produce a nebula of closely localized spots, in many cases clearly separated from other peptide families. Only a change in the relative elemental composition shifts the spot. The direction of the shift is specific to the chemical nature of the change, and in some cases the modification can be correctly determined from the shift direction and its value. Since both coordinates can be quantitatively measured by mass spectrometry of an intact molecule, peptide masses can be mapped even without complete de novo sequencing. Therefore, the mapping allows one to tentatively attribute an unknown peptide to a peptide family. To add a quantitative dimension, each spot is expanded into a bubble with the area proportional to the peptide abundance. We applied this new visualization technique to the comparative analysis of the peptide content in crude skin secretion of three frogs, Rana ridibunda, Rana arvalis, and Rana temporaria. These frogs were chosen because they represent two broad classes of green and brown frogs spread worldwide. They belong to the same genus Rana and are therefore quite closely related to each other. On the other hand, they are separate species with different habitats and, consequently, different peptide profiles of skin secretions reflecting the differences in the immune systems.10 These features make the comparison of the above frogs representative for interspecies analysis.

(22) Pomerantz, S. C.; McCloskey, J. A. J. Mass Spectrom. 1987, 22, 251–253.
(23) Zubarev, R. A. Int. J. Mass Spectrom. Ion Processes 1991, 107, 17–27.

MATERIALS AND METHODS

Chemicals. All organic solvents were HPLC grade (Merck, Germany), and Milli-Q water was used in all experiments. Hydrogen peroxide (30% w/w in water), ammonium bicarbonate, dithiothreitol, and iodoacetamide were from Sigma (≥99% purity).

Skin Peptides Sampling. One individual of each frog species was caught some 100 km away from Moscow (Russia) and maintained in captivity. Skin glands were electrically stimulated according to the described procedure.24 Briefly, the skin of the animal, moistened with deionized water, was treated with a bipolar platinum electrode connected to the laboratory electrostimulator ESL-1 (Kaunas Research Institute of Radiometrical Engineering, Lithuania). The pulse parameters were as follows: voltage, 10 V; pulse duration, 5 ms; pulse frequency, 50 Hz. The skin secretion was washed out with a small amount (up to 25 mL) of deionized water and then immediately diluted with an equal volume of methanol. The mixture was then centrifuged for 15 min at 500g, filtered through a Millex HV membrane (0.45 µm), concentrated at 35 °C on a rotary evaporator to a volume of ∼1 mL, and then lyophilized. Dried peptide-containing substrate was dissolved in 200 µL of water and used as a stock solution.

Disulfide Bond Treatment. Disulfide links present a large problem for de novo sequencing, as the peptide bonds inside the internal loop are resistant to fragmentation. Two complementary methods were used to permanently open disulfide links potentially present in the secreted peptides: carboxamidomethylation, as the most common procedure in proteomics, and oxidation. For carboxamidomethylation, 5 µL of peptide stock solution, 45 µL of 0.1 M ammonium bicarbonate buffer, and 10 µL of 0.1 M dithiothreitol (DTT) were mixed and incubated at 56 °C for 1 h.
The resulting solution was cooled to room temperature, and then 10 µL of 0.5 M iodoacetamide was added. The mixture was incubated at room temperature for 30 min in the dark and used directly for MS analysis. For oxidation, performic acid was prepared by mixing 19 parts of formic acid and 1 part of 30% hydrogen peroxide solution, followed by 1 h incubation at room temperature. A total of 5 µL of peptide stock solution was then mixed with 150 µL of performic acid and incubated for 1 h at 5 °C. Finally, a triple volume excess of water was added, and the resulting solution was lyophilized. Dried samples were dissolved in 30 µL for further analysis.

(24) Tyler, M. J.; Stone, D. J. M.; Bowie, J. H. J. Pharm. Toxicol. Methods 1992, 28, 199–200.

Mass Spectrometry. Each peptide mixture was analyzed three times: with disulfide bonds intact, after carboxamidomethylation, and after oxidation. The results were combined. If the same peptide was detected more than once, the maximum abundance was given in the output. All experiments were performed on a 7 T hybrid LTQ FT mass spectrometer (ThermoFisher Scientific, Bremen, Germany) equipped with a nanoESI ion source (Proxeon Biosystems, Odense, Denmark). The high-performance liquid chromatography system used online with the mass spectrometer consisted of a solvent degasser, a nanoflow pump, and a thermostatted microautosampler (Agilent 1100 nanoflow system). A 15 cm fused silica emitter (75 µm inner diameter, 375 µm outer diameter; Proxeon Biosystems) was used as an analytical column. The emitter was packed in-house with a methanol slurry of the reversed-phase, fully end-capped Reprosil-Pur C18-AQ 3 µm resin (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) using a pressurized “packing bomb” operated at 50-60 bar (Proxeon Biosystems). The samples were dissolved in an acetonitrile/water/formic acid mixture (1:1:0.002 v/v) and injected into the reversed-phase nanocolumn. The analysis was performed using unattended data-dependent acquisition mode, in which the mass spectrometer automatically switches between a high resolution survey mass spectrum in the FT cell and consecutive ECD and CAD of the most abundant detected peptides eluting at that moment from the nano-LC column. A resolving power of 100 000 was used in the survey MS mode and 50 000 in the MS/MS modes.

De Novo Sequencing and Quantification. HPLC-MS/MS analyses produced RAW-files that were processed by BioWorks software (ThermoFisher) into two sets of dta-files corresponding to CAD and ECD MS/MS spectra of all precursor ions subjected to fragmentation. Both sets were then submitted to the de novo sequencing software tool developed in house.7 In parallel, the two sets were merged into a single set of CAD-like data25 and submitted to the Mascot search engine (Matrix Science, U.K.). In both de novo and Mascot searches, the acceptable mass deviation was set to 10 ppm for molecular masses and to 0.02 Da for MS/MS fragments. The resulting file of the de novo program was in text format and contained the monoisotopic mass of the precursor ion, its scan number in the corresponding RAW file, and the deduced sequence. The Mascot output peptide list in the htm-format was then converted to text format using an in-house written program (C++). The de novo or Mascot text files together with the original RAW file were then used to derive relative intensities and retention times of all sequenced peptides.
This was done by yet another home-written program (C++) that searched in the RAW file for the peptide ion signals using their known m/z values and scan number and performed charge and isotope deconvolution as well as integration of the chromatographic peak. The output was a table comprising peptide sequence, its retention time determined at the apex of the chromatographic peak and the relative abundance that was derived from that peak.

(25) Nielsen, M. L.; Savitski, M. M.; Zubarev, R. A. Mol. Cell. Proteomics 2005, 6, 835–845.

2D Mass Mapping. The output tables containing sequences, retention times, and relative intensities were first exported to Excel (Microsoft Office 2007). The column containing sequences was converted by a home-written peptide calculator into elemental compositions, from which theoretical monoisotopic and average molecular masses were obtained. Mass derivatives were calculated as follows: normalized monoisotopic mass defect (NMD) was calculated as

\[ \mathrm{NMD} = 1000 \times \frac{\mathrm{MM} - \mathrm{MN}}{\mathrm{MM}} \qquad (1) \]

where MM and MN are monoisotopic and nominal molecular masses, respectively. MN was calculated as

\[ \mathrm{MN} = \mathrm{INT}\left(0.5 + \mathrm{MM} \times \frac{1999.0}{2000.0}\right) \qquad (2) \]

where INT returns the integer part of the argument. The normalized isotopic shift (NIS) was calculated as

\[ \mathrm{NIS} = 1000 \times \frac{\mathrm{AM} - \mathrm{MM}}{\mathrm{MM}} \qquad (3) \]
where AM is the average molecular mass. Peptide grouping into families was performed using the following criterion: the sequence of a new family member must be derivable from any existing family member by unlimited extension/shortening of the sequence and/or by at most two amino acid replacements. Peptide families were then plotted using Excel functions. The chart type “bubble” was used, different peptide families were selected as the parent series, the NMD column was used as “series X-values”, the NIS column as “series Y-values”, and the column comprised of intensities was used as “series bubble size”. Peptides with at least two cysteines that can form disulfide bonds are marked by a double borderline of the bubble. Coordinates of the center of gravity were calculated as a weighted average (the weighting factor was the peptide abundance) of NMD and NIS. Experimentally, NMD and NIS positions were measured in the “survey” MS scans of the LC-MS runs. The scans were integrated over the duration of the chromatographic peak and charge-deconvolved using the Xtract function of the XCalibur software (ThermoFisher).

RESULTS AND DISCUSSION

Properties of 2D Mass Maps. A typical tryptic peptide has a mass of ∼1000 Da and contains 8-10 amino acid residues. Its elemental composition can be compared to that of averagine (Av), an artificial amino acid created in silico by Senko et al.26 Averagine’s elemental composition, C4.9384H7.7583N1.3577O1.4773S0.0417, reflects the average element content of residues found in native polypeptides. The size of averagine is adjusted in such a way that its mass, 111.1 Da, is close to the mass of a typical residue, and that 1 kDa of a polypeptide’s mass corresponds to nine averagine residues. Thus an average tryptic peptide is equivalent to Av9 and has a composition C44.4456H71.8247N12.2193O14.2957S0.3753, which corresponds to NMD = 0.49 and NIS = 0.63. In Figure 1, a star marks the position of the Av9 peptide on the 2D mass map.
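As a rough illustration of eqs 1-3, the sketch below computes NMD and NIS for the Av9 composition and the abundance-weighted center of gravity. It is not the authors' peptide calculator: the function names are hypothetical, the isotope masses and atomic weights are commonly tabulated values assumed here (the paper does not list the ones it used), and eq 2 is read as INT(0.5 + MM × 1999/2000). With these assumptions the Av9 coordinates come out close to the quoted values; the second decimal of NIS depends on the atomic weights chosen.

```python
# Assumed isotope data (commonly tabulated values; not taken from the paper).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}
AVERAGE = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}


def mass(composition, table):
    """Sum element masses for a composition given as {element: count}."""
    return sum(table[element] * count for element, count in composition.items())


def nominal_mass(mm):
    """Eq 2, read here as INT(0.5 + MM * 1999/2000)."""
    return int(0.5 + mm * (1999.0 / 2000.0))


def nmd(composition):
    """Eq 1: normalized monoisotopic mass defect."""
    mm = mass(composition, MONO)
    return 1000.0 * (mm - nominal_mass(mm)) / mm


def nis(composition):
    """Eq 3: normalized isotopic shift."""
    mm = mass(composition, MONO)
    am = mass(composition, AVERAGE)
    return 1000.0 * (am - mm) / mm


def center_of_gravity(points):
    """Abundance-weighted mean of (NMD, NIS, abundance) triples."""
    points = list(points)
    total = sum(weight for _, _, weight in points)
    x = sum(px * weight for px, _, weight in points) / total
    y = sum(py * weight for _, py, weight in points) / total
    return (x, y)


# Av9 composition as given in the text.
AV9 = {"C": 44.4456, "H": 71.8247, "N": 12.2193, "O": 14.2957, "S": 0.3753}
# Idealized "dimer": the same composition doubled, to illustrate scale invariance.
AV9_DIMER = {element: 2 * count for element, count in AV9.items()}

if __name__ == "__main__":
    print(f"Av9:   NMD = {nmd(AV9):.2f}, NIS = {nis(AV9):.2f}")
    print(f"dimer: NMD = {nmd(AV9_DIMER):.2f}, NIS = {nis(AV9_DIMER):.2f}")
```

Doubling the composition leaves both coordinates unchanged, which is the multiplication invariance that makes these normalized quantities suitable as map axes.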
The star position is a reference point relative to which deviations can be measured. A chemical substitution, however minor, will result in a shift relative to that reference point. The most common substitutions observed in bottom-up proteomics are deamidation and methylation, which can occur in vivo but in shotgun proteomics experiments often appear due to in vitro side reactions.27 Deamidation of Asn to Asp and of Gln to Glu, conversion of the amidated form of the C-terminus to the acidic form, and other deamidation reactions, e.g., Arg citrullination, are equivalent to replacing NH2 by OH. This (- NH + O) substitution gives only a +0.98 Da mass increment (0.1% of the total mass of Av9), but the shift on the 2D plot is quite significant, especially the negative shift along the NMD axis (Figure 1A). The reverse modification, amidation, gives a similar shift but in the opposite direction. This shift is in fact larger than that from methylation, + CH2 addition, which is 14 times bigger than deamidation in terms of the mass increment. While both deamidation and methylation preferentially act in the NMD direction, another common modification, methionine oxidation (+ O), shifts in both directions. Double oxidation (+ 2O) expectedly shifts by approximately double the value of a single oxidation, which underlines the general property of the 2D mass map shifts of being additive in the first approximation. Not surprisingly, conversion of cysteine to cysteic acid (+ 3O) follows the trend. The already mentioned property that the shift is determined by the relative change of the chemical composition is illustrated by the modest shift upon cysteine alkylation by iodoacetamide (+ CH2CONH, 57.02 Da mass increment). Of the biologically important modifications, few are as prominent as phosphorylation (+ HPO3, 79.9663 Da). It gives a large negative shift in both dimensions. Isobaric sulfation (+ SO3, 79.9568 Da) gives a distinctly different shift (Figure 1A), although the mass difference with phosphorylation is only 9.5 mDa. This example illustrates the complementarity of 2D mass mapping to the usual way of presenting mass spectrometric information. The effect of amino acid substitution of one of the averagine residues in Av9 is investigated in Figure 1B. Amino acids Gly, Thr, Gln, Ala, and Pro produce the smallest shifts in the 2D map, while S-containing Met and Cys give the largest shifts. Interestingly, aliphatic Val and Ile/Leu give a shift similar to that of the strongly basic Lys and Arg, while basic His rather belongs to the aromatic group of residues that also comprises Tyr, Phe, and Trp. Acidic Asp and Glu give a shift in the opposite direction to the strongly basic residues. Note the Gln/Lys pair: while these isobaric residues are only 36 mDa apart in terms of their masses (0.03%), on the 2D map they are far apart. The arrows in Figure 1B determine the “wind rose” of amino acid substitution. A strong “eastern” direction is aliphatic or basic, while a medium-to-strong “western” direction is acidic. A strong “north-west” is the sulfur-containing direction, while the aromatic “wind” is a weak “north-west”. A weak “south-western” direction is caused by polar residues.

Figure 1. 2D mass map of the Av9 peptide in the unmodified, modified (A), and substituted (B) forms.

(26) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 229–233.
(27) Nielsen, M. L.; Savitski, M. M.; Zubarev, R. A. Mol. Cell. Proteomics 2006, 5, 2384–2391.

Peptide Sequencing. Mascot search of all three frog samples gave 89 above-threshold identifications. As expected, de novo sequencing produced many more identifications: 273 sequences altogether, of which 70 had no gaps, 160 had a gap two residues long, and 43 had a longest gap three residues long. The order of amino acids in a gap is unknown, while their identities are known, e.g., [A + T] = (AT) or (TA). Previous evaluation of the de novo software revealed that sequences without gaps agree with their respective Mascot search identification in 96% of the cases, with one gap in 94% of the cases, and with two gaps in 73%.7 Although the last figure is below the 95% validity level acceptable in proteomics, it is suitable for our purpose of illustrating the new data presentation approach. In that approach, the peptide position on the 2D mass map is determined by its elemental composition and not by the order of amino acids or even their identity. The usual problems of MS de novo sequencing, such as Ile/Leu differentiation and (Gly + Gly) vs Asn recognition, do not concern us, because these substitutions are isomeric and thus do not change the elemental composition. As will be shown below, small deviations in elemental composition do not shift the position of the peptide too far away from its family, and thus family identity remains unaffected.
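The point about isomeric ambiguities can be checked directly from residue elemental compositions. The following is a minimal sketch, not the authors' software: the `composition` helper is a hypothetical name, only the handful of residues needed for the examples is tabulated, and the residue formulas are standard chemistry rather than values taken from the paper.

```python
from collections import Counter

# Elemental compositions of amino acid residues (amino acid minus H2O).
RESIDUE = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
}


def composition(sequence):
    """Elemental composition of a linear peptide: residues plus one water."""
    total = Counter({"H": 2, "O": 1})  # terminal H and OH
    for residue in sequence:
        total.update(RESIDUE[residue])  # adds the per-element counts
    return dict(total)


if __name__ == "__main__":
    # Isomeric alternatives share one composition, hence one map position.
    print(composition("GG") == composition("N"))
    print(composition("AT") == composition("TA"))
    # Gln and Lys are nearly isobaric but differ in composition.
    print(composition("Q") == composition("K"))
```

Because the map coordinates depend only on the composition, residue order within a gap, Ile/Leu, and (Gly + Gly) vs Asn all map to the same spot, while the nearly isobaric Gln/Lys pair does not.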
Therefore we have included in this study sequences with up to four-residue gaps. There are also many peptides that remained unsequenced, either because their signal was too weak or because they were modified. Again, for the purpose of this study this was not a handicap, because a sufficient number of peptides was sequenced to illustrate the new visualization method.

Rana arvalis is one of the most widespread species of brown frogs on the European continent. Skin peptides of this species are less described in the literature compared to both R. ridibunda and R. temporaria. Two frog skin peptides, FLPLLAASFACTVTKKC and FVPLLVSKLVCVVTKKC, are mentioned in ref 28; the newest study of Rana arvalis skin peptides was performed by us.29 Seventeen peptides were detected, most of which have a six- or seven-membered cystine loop at the C-terminus, which is typical for ranid frog skin peptides. The peptide profile of Rana arvalis is presented on a 2D mass map in Figure 2. A complete list of peptides is given in Table S1 in the Supporting Information. The most abundant detected peptide was a bradykinin, RPPGFSPFR, surrounded by a large family (black spots). This group appears largely “south-east” of the averagine, consistent with the basic nature of bradykinins (they typically contain two arginines) as well as the presence of polar residues, such as serine. Even more to the “south-east” is the group of ranatuerins lacking disulfides (white spots; most abundant representative: [N/GG]-I/L-I/L-DVVKGAAKN-I/L-I/LA). These peptides are rich in both basic and aliphatic residues. Disulfide-free brevinins are the “far-east” neighbors, the most abundant of which, PIIVSGK, similarly contains both basic lysine and aliphatic (iso)leucines. Both brevinins and ranatuerins are also found with disulfide bonds (double border lines of the spots). These molecules are found far to the “north-west” of their sulfur-free relatives. Finally, there is a “far-west” archipelago on the map whose position reveals its acidic nature. Indeed, these are acidic spacers. The most abundant representative, DEDEDSDHAKAE, occupies the “south-western” corner. To the “north-west” of this corner is another group of acidic spacers. Their position with respect to the first group reveals the presence of sulfur-containing residues. Indeed, the most abundant representative contains doubly oxidized methionines: DEDEDM(O2)M(O2)AGEAKAE. The remarkable feature of Figure 2 is that peptides identified by sequence similarities as belonging to one family cluster together on the 2D mass map. This, however, is easily explained by the fact that the amino acid substitutions in the peptides are not random. For instance, Ser is often substituted by its homologue Thr, Gly by Ala, and Asp by Glu, and vice versa. These -CH2- substitutions produce only a modest shift on the 2D map.

Figure 2. 2D mass map of the peptide profile of Rana arvalis.

(28) Samgina, T. Y.; Artemenko, K. A.; Gorshkov, V. A.; Poljakov, N. B.; Lebedev, A. T. J. Am. Soc. Mass Spectrom. 2008, 19, 479–487.
(29) Samgina, T. Y.; Artemenko, K. A.; Gorshkov, V. A.; Ogourtsov, S. V.; Zubarev, R. A.; Lebedev, A. T. Rapid Commun. Mass Spectrom. 2009, 23, 1241–1248.

The Marsh frog Rana ridibunda Pallas, 1771, and the Pool frog Rana lessonae Camerano, 1882, are parental species for the Edible frog Rana esculenta Linnaeus, 1758, which is considered by some authors to be their natural hybrid.30,31 Phylogenetic links inside the green frog complex are not completely defined and remain a matter of discussion. R. esculenta is a well-studied frog, while the peptide profile of R.
ridibunda has just recently been described by us.19,32,33 A large set of disulfide-linked peptides was described. A subset of these peptides is the same as the R. esculenta peptides, while another subset is unique to R. ridibunda. The natural habitat of the Marsh frog is extremely broad and covers several climatic zones. On the territory of the former USSR it is found in the European part of Russia (south of latitude 60° North), in Ukraine, the Caucasus, and Central Asia.34

(30) Ali, M. F.; Knoop, F. C.; Vaudry, H.; Conlon, J. M. Peptides 2003, 24, 955–961.
(31) Wang, Y.; Knoop, F. C.; Remy-Jouet, I.; Delarue, C.; Vaudry, H.; Conlon, J. M. Biochem. Biophys. Res. Commun. 1998, 253, 600–603.
(32) Artemenko, K. A.; Samgina, T. Y.; Doyle, J. R.; Llewellyn, L. E.; Bilusich, D.; Bowie, J. H.; Lebedev, A. T. Mass-Spektrom. 2007, 4, 79–88.
(33) Samgina, T. Y.; Artemenko, K. A.; Gorshkov, V. A.; Ogourtsov, S. V.; Zubarev, R. A.; Lebedev, A. T. Rapid Commun. Mass Spectrom. 2008, 22, 3517–3525.

Figure 3. 2D mass map of the peptide profile of Rana ridibunda.

The Rana ridibunda peptide map is shown in Figure 3. The same four families of peptides as in the previous example are present here, but the exact composition and the quantities are different. The most abundant peptide is now a ranatuerin with an internal S-S bond. Brevinins, especially those with disulfides, have dramatically dropped in abundance. Interestingly, ranatuerins are now found in the general direction of “north-east” from brevinins, the opposite direction compared to Figure 2, which indicates that ranatuerins from Rana ridibunda are more basic and/or aliphatic than those from Rana arvalis. As we have noted earlier, the peptide position on a 2D mass map can be determined experimentally even without knowing the peptide sequence. To illustrate this point, we have measured monoisotopic and average masses of four Rana ridibunda peptides (Figure 4). Monoisotopic masses were measured with an accuracy of a few parts per million (ppm), while average masses are difficult to measure experimentally with ppm accuracy.35 The main difficulty arises from the fact that the high-mass tail of an isotopic distribution has a low abundance and thus is disregarded in experimental measurements below a certain cutoff level. Therefore, while positions on the NMD scale are very close to the theoretical ones, the experimental NIS positions are systematically shifted downward. Still, as Figure 4 shows, the relative positions of peptide spots on the 2D map reflect their theoretical configuration in Figure 3. For example, from the proximity of the spots D and C, one can immediately deduce the close relationship of the two peptides: indeed, these two acidic spacers differ by a GA combination. R.
temporaria is the best-studied of the three frog species examined in this work. Not surprisingly, 58 of the total 89 Mascot identifications belong to R. temporaria. It is worth mentioning that 46 of these 58 peptides (81%) were also sequenced de novo by our software. The high scientific interest in R. temporaria is most likely due to the broad range of this species, which is also known as the European Common (or Brown) Frog. Rana temporaria is found throughout much of Europe, as far north as the Arctic Circle and as far east as the Urals, except for most of Iberia, southern Italy, and the southern Balkans. At least 15 skin peptides of R. temporaria are present in the SwissProt database, and most of them were discovered by the Simmaco group.8 Only a few disulfides have been described for this species, while the major part of R. temporaria skin secretion consists of acyclic temporins.
(34) Kuzmin, S. L. Amphibians of the Former USSR; Tovarischestvo Nauchnyh Izdanii KMK: Moscow, Russia, 1999.
(35) Zubarev, R. A.; Demirev, P. A.; Håkansson, P.; Sundqvist, B. U. R. Anal. Chem. 1995, 67, 3793–3798.
Figure 4. Comparison of the theoretical and experimentally measured positions on the 2D mass map of four Rana ridibunda peptides and their mass spectra: (A) DNPDENEANEGGA, (B) DNPDENEANEG, (C) K-I/L-I/L-I/L-NPKFRCoxKAAFCox, and (D) RPPGFTPFR. (E) 2D mass map of the four peptides.
Figure 5. 2D mass map of the peptide profile of Rana temporaria.
Figure 6. 2D mass map of the main classes of natural biopolymers.
Rana temporaria shows a very different peptide profile from the two previous frogs (Figure 5). Brevinins are absent, being replaced by temporins. Traditionally, temporins are separated into several types. Here, temporins 1 are the most abundant of all families. On a 2D map, these peptides occupy a position far “east” of averagine, which reflects their aliphatic-basic nature (a typical representative is LLPNLLKSL). The presence of acidic spacers is nominal. The overall center of gravity is shifted “east” and “north” of averagine. Ranatuerins are also present in large
quantities, while bradykinins have markedly lower abundance than in the other two frogs. Interestingly, the same peptide from R. temporaria skin, GLLSGLKKVGKHVAKNVAVSLMDSLKCKISGDC-OH, was described by the authors of ref 8 as brevinin-1T, while another research group36 called it ranatuerin-1T. The same problem of peptide nomenclature was raised recently by Conlon.37 These facts reflect the subjective character of conventional peptide classification, whereas the location of the peptide mass on the 2D plot may be a more effective tool for grouping peptides.
The 2D mass mapping method can be used efficiently not only for peptide mixtures but also for the analysis of any molecular mixture. This point is illustrated by Figure 6, where three different classes of biopolymers are plotted. The plot demonstrates clear separation of peptides from carbohydrates and nucleic acids, as well as separation within the classes. Because of the absence of overlap, 2D mass mapping can be used as a filter for distinguishing peptides (or any other class).
(36) Goraya, J.; Knoop, F. C.; Conlon, J. M. Peptides 1999, 20, 159–160.
(37) Conlon, G. M. Peptides 2008, 29, 1631–1632.
Figure 7. 2D mass maps of 216 bioactive frog peptides from ref 11. The antimicrobial peptides were separated into two groups according to the origin of the frog species, Australia (upper panel) and other continents (lower panel), while peptides with other types of activity were arbitrarily divided between the two panels.
Finally, we tested the ability of the new data presentation method to predict the biological activity of peptides. For this purpose, the masses of 216 amphibian peptide sequences11 with known activities were put on the 2D map (Figure 7). For clarity, the antimicrobial peptides were separated into two groups according to the origin of the frog species, Australia (upper panel) and other continents (lower panel), while peptides with other types of activity were arbitrarily divided between the two panels. In both groups, all antibiotic peptides were located “east” (right) of the line NMD = 0.53. Different types of neuropeptides were found mostly “west” (left) of that “antibiotic line”, as were some inactive peptides such as acidic spacers. Interestingly, peptides classified as nNOS inhibitors11 occupy the same area as antibacterial peptides, and many of them share large sequence similarities or even identities with antimicrobial peptides. This observation suggests the hypothesis that other antibiotic peptides in the same area may exhibit nNOS inhibition activity. Thus a researcher can hypothesize about the potential biological activity of a novel peptide from a known organism based on its position on the 2D map and the known activity of other peptides from the same or related organisms.
CONCLUSIONS
2D mass mapping affords an easy and comprehensive way of visualizing complex peptide mixtures. For a given peptide, its position relative to averagine carries information on the peptide’s relative basicity or acidity and/or the presence of aliphatic or aromatic residues. Cysteine-containing peptides are easily distinguished from their cysteine-free analogues. In many cases, the peptide family identity becomes apparent from the peptide’s position on the 2D mass map. Adding the quantitative dimension makes it very easy to compare the peptide profiles of different animal species. Since the conditions of the frogs’ natural habitat entail changes in the skin peptide profile from population to population, caused by forced adaptation to a new pathogen and predator environment, peptide mass visualization may significantly help studies of intra- and interspecific diversity. Finally, the 2D mass mapping method is rather general and can be used for any complex mixture, peptide and nonpeptide alike.
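The placement of a peptide on the 2D map can be sketched in a few lines of Python. The sketch below is a minimal illustration, not the software used in this work. It assumes that NMD and NIS are, respectively, the monoisotopic mass defect and the isotopic shift (average minus monoisotopic mass), each divided by the monoisotopic mass and multiplied by 1000; that the nominal mass is the sum of the integer mass numbers of the lightest isotopes; and that the peptides are unmodified with free termini. The function names are hypothetical, and the atomic masses are rounded standard values; the sequences and the NMD = 0.53 line are taken from the text.

```python
# Elemental composition (C, H, N, O, S) of the 20 standard amino acid residues.
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}
WATER = (0, 2, 0, 1, 0)  # H2O added for the free N- and C-termini

# Atomic masses in the order C, H, N, O, S.
MONO = (12.0, 1.00782503, 14.0030740, 15.9949146, 31.9720707)
AVERAGE = (12.0107, 1.00794, 14.0067, 15.9994, 32.065)
NOMINAL = (12, 1, 14, 16, 32)

ANTIBIOTIC_LINE = 0.53  # NMD threshold quoted in the text for Figure 7


def composition(sequence):
    """Sum the residue compositions and add one water molecule."""
    total = list(WATER)
    for residue in sequence:
        for i, count in enumerate(RESIDUES[residue]):
            total[i] += count
    return tuple(total)


def masses(sequence):
    """Return (monoisotopic, average, nominal) mass of a linear peptide."""
    comp = composition(sequence)
    mono = sum(n * m for n, m in zip(comp, MONO))
    average = sum(n * m for n, m in zip(comp, AVERAGE))
    nominal = sum(n * m for n, m in zip(comp, NOMINAL))
    return mono, average, nominal


def map_position(sequence):
    """Return (NMD, NIS) under the normalization assumed in the lead-in."""
    mono, average, nominal = masses(sequence)
    nmd = 1000.0 * (mono - nominal) / mono
    nis = 1000.0 * (average - mono) / mono
    return nmd, nis


def side_of_antibiotic_line(sequence):
    """Report whether a peptide lies 'east' or 'west' of NMD = 0.53."""
    nmd, _ = map_position(sequence)
    return "east" if nmd > ANTIBIOTIC_LINE else "west"


if __name__ == "__main__":
    for seq in ("LLPNLLKSL", "DNPDENEANEG", "DNPDENEANEGGA"):
        nmd, nis = map_position(seq)
        print(f"{seq}: NMD={nmd:.3f} NIS={nis:.3f} "
              f"({side_of_antibiotic_line(seq)} of the line)")
```

Under these assumptions the temporin-like sequence LLPNLLKSL falls east of the NMD = 0.53 line, whereas the two acidic spacers fall west of it and close to each other, in line with the qualitative picture described above. Experimental use would replace the computed average mass with a measured one, which, as discussed for Figure 4, tends to lower the NIS coordinate.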
SUPPORTING INFORMATION AVAILABLE Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.
Received for review December 1, 2008. Accepted March 20, 2009. AC802532J
Analytical Chemistry, Vol. 81, No. 10, May 15, 2009