Determination of Multimodal Isotopic Distributions: The Case of a 15N

Publication Date (Web): May 14, 2015. Copyright © 2015 American Chemical Society. *Phone: +33(0)2-35-52-29-40. Fax: +33(0)2-35-52-24-41. E-mail: carl...
Article pubs.acs.org/ac

Determination of Multimodal Isotopic Distributions: The Case of a 15N Labeled Protein Produced into Hairy Roots

Romain Trouillard,† Marie Hubert-Roux,‡ Vincent Tognetti,‡ Laure Guilhaudis,† Carole Plasson,§ Laurence Menu-Bouaouiche,§ Laurent Coquet,∥ François Guerineau,⊥ Julie Hardouin,∥ Jean-Pierre Ele Ekouna,⊥ Pascal Cosette,∥ Patrice Lerouge,§ Michèle Boitel-Conti,⊥ Carlos Afonso,*,‡ and Isabelle Ségalas-Milazzo*,†

† Normandie Université, COBRA, UMR6014 and IRIB; Université de Rouen; INSA de Rouen; CNRS, IRCOF, 1 rue Tesnière, 76821 Mont-Saint-Aignan Cedex, France
‡ Normandie Université, COBRA, UMR6014 and FR3038; Université de Rouen; INSA de Rouen; CNRS, IRCOF, 1 rue Tesnière, 76821 Mont-Saint-Aignan Cedex, France
§ Normandie Université, Glyco-MEV, EA 4358 and IRIB, Université de Rouen, 1 rue Tesnière, 76821 Mont-Saint-Aignan Cedex, France
∥ CNRS UMR 6270, PBS, Plateforme Protéomique PISSARO, IRIB, FR3038 INC3M, Normandie Université, Université de Rouen, Boulevard Maurice de Broglie, 76821 Mont-Saint-Aignan Cedex, France
⊥ Biologie des plantes et innovation (BioPI), Université de Picardie Jules Verne, 33 rue St Leu, 80039 Amiens, France

Supporting Information

ABSTRACT: Isotopic labeling is widely used in fields such as proteomics, metabolomics, and fluxomics, as well as in NMR structural studies, but it requires an efficient determination of the isotopic enrichment. Mass spectrometry is the method of choice for such analysis. However, when complex expression systems like hairy roots are used for production, multiple populations of labeled proteins may be obtained. Although the determination of isotopic incorporation is well established for unimodal distributions, multimodal distributions have scarcely been investigated. Only a few approaches allow the proportions of the different labeled populations to be determined from multimodal distributions, and they cannot be used when the number of populations and their respective isotope ratios are unknown. The present study implements a new strategy to measure the 15N labeled populations inside a multimodal distribution knowing only the peptide sequence and the peak intensities from mass spectrometry analyses. Notably, it could be applied to other elements, like carbon and hydrogen, and extended to a larger range of biomolecules.

The structural studies of glycoproteins have recently become a new challenge.1 Nevertheless, only a few complex proteins harboring glycan structures are deposited in the Protein Data Bank (3500 among the 100 000 biological macromolecules available). Actually, the heterogeneous glycan chains of glycoproteins often hinder the crystal growth, preventing the use of crystallography, the most widespread method for protein structure determination.1,2 Even if recent studies demonstrate the possibility to use X-ray crystallography with protein microcrystals, this method is little used for glycoproteins.3,4 Nuclear magnetic resonance (NMR) spectroscopy is an appealing alternative technique, but it requires proteins exhibiting 15N or 15N/13C isotopic labeling for overcoming resonance overlap and assignment difficulties.5 Usually, these labeled proteins are produced in prokaryotic expression systems, such as E. coli bacteria.6 However, these latter do not perform eukaryotic glycosylations, and, as a consequence, their impact on protein structure and dynamics cannot be investigated. Among alternative systems available for protein production, the hairy roots may constitute a good candidate for the production of labeled glycoproteins for NMR structural studies. They are obtained via plant genetic transformation and can be cultivated in vitro in a confined and controlled medium, allowing easy supply of labeled molecules.7 As for all plant systems, these transformed roots can perform complex eukaryotic glycosylation. Furthermore, they are able to secrete the protein of interest in high amount (about 120 mg L−1) in the culture media thanks to the fusion of a signal peptide, making protein purification easier to achieve.8

Received: November 14, 2014 Accepted: May 14, 2015 Published: May 14, 2015

DOI: 10.1021/acs.analchem.5b01558 Anal. Chem. 2015, 87, 5938−5946

Analytical Chemistry



In this context, we have recently initiated a program aiming at exploring the possibility of producing, in hairy roots, labeled glycoproteins for NMR structural studies. One milestone of our project consisted of checking the ability of roots to produce 15N or 15N/13C labeled proteins, and our first investigations focused on 15N labeling. For this purpose, a well-known protein that is easy to detect and already produced in large amounts by roots, the enhanced green fluorescent protein (eGFP), was chosen as a model.8 eGFP is a nonglycosylated protein of 27 kDa, composed of 239 amino acids and forming a β-barrel three-dimensional structure.9−12 A key point at this step was to assess the protein labeling ratio via the determination of the isotopic enrichment. Mass spectrometry (MS), which allows the direct detection of isotopic distributions, is a method of choice for such analysis. Sometimes, the isotopic incorporation is analyzed directly on the whole protein, as done by Ippel et al.;13 however, it is mostly determined at the peptide level, which allows more accurate measurements. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), in particular, can be used for the evaluation of the peptide labeling. MALDI as an ionization source mainly produces intact singly charged ions, which yields simplified mass spectra. Besides, even if, in the reflectron mode, the TOF instrument resolution remains insufficient to separate isobars14 (atomic or molecular species with the same nominal mass but different exact masses, like ¹²C₁₃¹³C₁¹H₁₈¹⁴N₂¹⁶O₅ and ¹²C₁₄¹H₁₈¹⁴N₁¹⁵N₁¹⁶O₅), it does allow the successful separation of isotopic forms that differ by one mass unit (like ¹²C₁₄¹H₁₈¹⁴N₂¹⁶O₅ and ¹²C₁₄¹H₁₈¹⁴N₁¹⁵N₁¹⁶O₅).15 For these reasons, the resolving power of the TOF (R ≈ 10⁴) was sufficient, and the MALDI-TOF instrument was chosen for our study. Two cases of isotopic distributions may be encountered.
The most common isotopic distributions are obtained with a fixed ratio of each isotope, leading to an isotopic envelope with a unimodal shape. In contrast, in the case of a mixture of different labeled populations, an isotopic envelope with a multimodal shape can be obtained. To date, most studies of labeled protein production by an expression system have shown unimodal distributions.16−19 In this case, peptide isotopic incorporation is determined by matching several in silico calculated distributions against the experimental one.20,21 Conversely, peptide multimodal distributions are more complex to analyze. They are often determined for proteomic quantification, where they are artificially obtained by mixing several labeled samples with different known isotope ratios.18,19 With hairy roots, multimodal isotopic distributions were expected, as these complex living organs are able to accumulate unlabeled molecules during their growth. Furthermore, the number of labeled populations and their respective isotope ratios were unknown. Although some programs, like MuxQuant or MaxQuant, are able to calculate an isotopic enrichment from peptide multimodal distributions with several known isotope ratios,22−24 none of them allows this determination when the isotopic populations are unknown. Thus, we present here a new, simple method for calculating, from peptide MALDI-TOF analyses, the different unknown populations and their labeling percentages inside experimental multimodal isotopic distributions.
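The difference between the two cases can be illustrated with a minimal Python sketch. It considers nitrogen only, for a hypothetical peptide with 15 nitrogen atoms, and the labeling levels (40%, 10%, 90%) and mixing weights are arbitrary values chosen for illustration, not data from this study. The function names are likewise hypothetical.

```python
from math import comb

def binomial_envelope(n_atoms, p_heavy):
    """Relative intensities of M+0 ... M+n for n atoms of a two-isotope element."""
    return [comb(n_atoms, k) * p_heavy ** k * (1 - p_heavy) ** (n_atoms - k)
            for k in range(n_atoms + 1)]

def mixture(envelopes, weights):
    """Weighted sum of envelopes of equal length (one envelope per population)."""
    return [sum(w * env[k] for env, w in zip(envelopes, weights))
            for k in range(len(envelopes[0]))]

def local_maxima(intensities):
    """Indices of peaks strictly higher than both neighbours (edges padded with 0)."""
    padded = [0.0] + list(intensities) + [0.0]
    return [k for k in range(len(intensities))
            if padded[k + 1] > padded[k] and padded[k + 1] > padded[k + 2]]

N_ATOMS = 15                                  # hypothetical number of nitrogen atoms
single = binomial_envelope(N_ATOMS, 0.40)     # one population, fixed 15N ratio
low = binomial_envelope(N_ATOMS, 0.10)        # weakly labeled population
high = binomial_envelope(N_ATOMS, 0.90)       # highly labeled population
mixed = mixture([low, high], [0.6, 0.4])      # mixture of the two populations

print(local_maxima(single))   # [6]: unimodal envelope
print(local_maxima(mixed))    # [1, 14]: bimodal envelope
```

A single fixed ratio gives one mode, whereas the mixture keeps one mode per population as long as the labeling levels are far enough apart.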


EXPERIMENTAL SECTION

Material. The transformed Brassica rapa hairy root clone 2M1 was used for eGFP production.8 Products for culture media and SDS-PAGE gels, as well as for MALDI-TOF calibration and matrix, were obtained from Sigma-Aldrich (St. Louis, MO). Solvents for mass spectrometry analyses were purchased from VWR (Strasbourg, France). 15N labeled nitrates (K15NO3 and 15NH415NO3) and D2O were obtained from Eurisotop (St-Aubin, France). Sterile PVDF membranes (0.22 μm) and Amicon Ultra-15 centrifugal filter units (10 kDa) were obtained from Merck-Millipore (Darmstadt, Germany). Sequencing-grade modified trypsin (V5111) was purchased from Promega (Promega France, Charbonnières-les-Bains, France).

eGFP Production. Hairy root selection, culture media preparation, and eGFP production were achieved with a protocol similar to the one published by Huet et al.,8 as described in the Supporting Information (Experimental Methods).

eGFP Trypsin Digestion. eGFP was quantified by measuring the absorbance at 280 nm using a Nanodrop ND2000C spectrometer (Thermo Scientific, Villebon-sur-Yvette, France). eGFP was concentrated at around 2.5 g L−1, and 50 μg was loaded on a 12% PAGE gel. Five microliters of 5X sample loading buffer (62.5 mM Tris HCl pH 6.8, 2% SDS, 2.5% β-mercaptoethanol, 10% glycerol, and 0.005% bromophenol blue) was added to the 20 μL sample, and the mix was loaded on a 12% polyacrylamide gel for sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) migration at 120 V for 90 min in a migration buffer (25 mM Tris, 190 mM glycine, 0.1% SDS). Proteins were fixed on the gel by a 10 min immersion in destaining buffer (ethanol/acetic acid/water 20:10:70 v/v/v). The gel was stained with Coomassie blue (H2O, ethanol, acetic acid 45:45:10 v/v/v, and 30 mg L−1 Coomassie brilliant blue G250) for a few minutes until protein bands were stained and was immediately washed with destaining buffer. Protein bands were excised and stored in destaining buffer.
For protein digestion, sequencing-grade modified trypsin (V5111) was used. After successive washings with water, bicarbonate buffer, and acetonitrile, 20 μL of trypsin at 7 ng μL−1 was added, followed after 20 min by 35 μL of 0.01 M bicarbonate buffer (pH 7.8). The digestion was performed overnight at 37 °C. Every 3 h, a volume of water was added to compensate for evaporation. Resulting peptides were eluted in three subsequent steps of 10 min with 200 μL of acetonitrile (ACN), 200 μL of H2O with 1% trifluoroacetic acid (TFA), and 200 μL of ACN. Supernatants were combined and dried in a vacuum concentrator system, and peptides were dissolved in 1% TFA.

Mass Spectrometry Analysis. MALDI-TOF and MALDI-TOF/TOF experiments were performed on an Autoflex III time-of-flight mass spectrometer (Bruker Daltonics, Bremen, Germany) equipped with a frequency-tripled Nd:YAG laser emitting at 355 nm. FlexControl (3.3) and FlexAnalysis (3.3) software packages (Bruker Daltonics, Bremen, Germany) were used for data acquisition and processing. Spectra were acquired in the positive-ion reflectron mode at a 50 Hz laser repetition rate. The acceleration voltage was set to 19 kV, and a 220 ns extraction delay time was used. One microliter of sample was mixed with 1 μL of α-cyano-4-hydroxycinnamic acid matrix (10 mg mL−1 in a methanol/acetonitrile 1:1 mixture). External calibration of the MALDI-TOF mass spectrometer was carried out using a mixture of 1 μL of leu-enkephalin at 18 pmol μL−1, 1 μL of bradykinin at 5 pmol μL−1, 1 μL of angiotensin II at 5 pmol μL−1, 1 μL of P14R at 5 pmol μL−1, 1 μL of ACTH fragment 18-39 at 10 pmol μL−1, and 5 μL of matrix. One microliter of each sample was spotted on the target plate. The mass spectra were the result of 1500 laser shots. The laser fluence was set slightly higher than the matrix desorption threshold (∼55% attenuation of maximum laser fluence). For product ion analysis in the tandem time-of-flight (TOF/TOF) mode, the recorded spectra were the result of the average of 3000 spectra in the parent mode (30% of maximum laser fluence) and 6000 spectra in the fragment mode (50% of maximum laser fluence). The precursor ion kinetic energy was 8 keV. The selection of the isotopic distribution of the m/z 804 ion was carried out with a time ion gate with a window of ±15 m/z units for the m/z 809.5 value corresponding to the center of the distribution. Product ions generated by laser-induced dissociation were further accelerated to 19 keV by the Bruker LIFT cell, which allows a full product ion spectrum to be recorded. The in silico lists of the peptides and product ions were obtained with Protein Prospector (University of California).

RESULTS AND DISCUSSION

Protein Production. The eGFP production protocol began with growing the transgenic roots for 10 days in a sucrose Gamborg medium (B5).25 Next, hairy roots were soaked in a nitrogen-free medium to deprive them of a nitrogen source and to allow the consumption of unlabeled molecules before 15N molecules were supplied. Roots were then soaked in the tested culture medium for the final production step. Different culture media were tested to induce different isotopic incorporations. Nitrogen balance has already been fixed for hairy root growth during the enhancement of protein production (2.5 g L−1 KNO3 and 0.160 g L−1 NH4NO3), and a modification of these concentrations could affect the protein synthesis.7 Thus, only the carbon source and its concentration were changed to slightly impact the 15N incorporation metabolism of hairy roots without affecting the protein production level. Three production culture media were tested. The first one, which will be referred to as the control condition, contained 30 g L−1 of sucrose and unlabeled nitrates. The second one, condition 1, contained glucose at 5 g L−1 and labeled nitrates. The last one, condition 2, contained glucose at 30 g L−1 and labeled nitrates.

Relative Quantification of 15N Labeling. The 15N isotopic incorporation of eGFP was investigated at the peptide level after proteolytic digestion.26 To this end, the hairy root culture media were concentrated and analyzed by SDS-PAGE. Gel bands corresponding to eGFP were excised, and eGFP peptides were generated by trypsin digestion. Protein characterization was carried out with the unlabeled control condition by identification of the eGFP peptides (Supporting Information Figure S-1). The experimental m/z values were compared to the m/z list of in silico trypsin digestion peptides. More than 15 eGFP peptides were identified, corresponding to an amino acid sequence coverage of about 50%. We chose to monitor four [M + H]+ ions yielding intense signals: m/z 1266 (SAMPEGYVQER), m/z 1282 (oxidized form of the m/z 1266 ion), m/z 804 (QHDFFK with formation of pyroglutamic acid), and m/z 1961 (AEVKFEGDTLVNRIELK), such that these peptide ions were scattered along the whole m/z range and contained 18 amino acids out of the 20 natural ones. By comparison with the control condition, conditions 1 and 2 gave a large multimodal isotopic distribution. This reflected heterogeneous isotopic labeling, constituted by differentially labeled protein populations. A relative quantification could be deduced from these results. For this purpose, the three experimental conditions were compared to two theoretical distributions: the isotopic distribution of the unlabeled peptide and the distribution of the 100% 15N labeled peptide (Figure 1). As expected, the peptide isotopic distributions of the control condition were in line with the theoretical unlabeled profile. In addition, condition 2 induced a more efficient labeling than condition 1. However, even if this simple relative quantification was informative, it was necessary to determine an absolute value of the isotopic incorporation level.

Figure 1. Superimposition of the two theoretical distributions, unlabeled and 100% 15N labeled, with the three experimental conditions (see text) for (A) m/z 804 ion, (B) m/z 1266 and 1282 ions, and (C) m/z 1961 ion.

Calculation of the Isotopic Population Distributions. In proteomics, methods have already been described for isotope enrichment measurement of a multimodal distribution.23,24 Notably, Palmblad et al. developed the MuxQuant software, able to determine the isotopic enrichment from a multimodal distribution artificially obtained by mixing several labeled samples with different known 14N/15N ratios.22 In these approaches, the actual 14N/15N ratio before mixture should be first determined. The proportions of each labeled population in the mixture then were calculated, one population corresponding to one sample. Here, in contrast, a labeled protein mixture was directly obtained, where the number of labeled populations and their respective 14N/15N ratios were unknown. Thus, we developed a new method for calculating, in one step, absolute values of different unknown populations coming from a multimodal distribution. The method was implemented in an Excel spreadsheet. The main steps of the method are presented in Figure 2. The peptide isotopic distribution was considered as the result of the convolution of all of the isotopic distributions of each element of the peptide. The elementary composition of the peptide ($\mathrm{C}_{C_\mathrm{tot}}\mathrm{H}_{H_\mathrm{tot}}\mathrm{O}_{O_\mathrm{tot}}\mathrm{S}_{S_\mathrm{tot}}\mathrm{N}_{N_\mathrm{tot}}$) was determined from its single-letter code amino acid sequence. Hydrogen, carbon, oxygen, and sulfur isotopic distributions then were calculated separately on the basis of the number of each element and their natural isotopic abundance. Next, a convolution of these distributions was calculated to obtain an artificial natural distribution without contribution of nitrogen. The following step consisted of the


nitrogen distribution calculation. Yet, for this latter element, the situation was more complex because its isotopic distribution results from a sum of populations with different 14N/15N ratios. Thus, the nitrogen distribution was calculated with several 14N/15N ratios, and then each of them was convoluted with the artificial natural abundance distribution without nitrogen to obtain the different labeled populations. In the last step, the proportion of each theoretical population was calculated by fitting to the experimental distribution.

Figure 2. General process for calculating the 15N labeling populations inside a multimodal isotopic distribution. In this figure, the H, N, S, and O distributions are arbitrarily truncated to M+4. The number of N distributions was determined following eq 6.

In more detail, for calculating distributions of elements that possess two stable isotopes (like C and H), a binomial distribution was used.15,23,27,28 For instance, each peak of the carbon distribution, $D_C$, was calculated according to (see also Supporting Information Figure S-2):

$$I^{C_\mathrm{tot}}_{M+{}^{13}C_n} = \binom{C_\mathrm{tot}}{{}^{13}C_n}\left(P^\mathrm{nat}_{{}^{12}C}\right)^{C_\mathrm{tot}-{}^{13}C_n}\left(1-P^\mathrm{nat}_{{}^{12}C}\right)^{{}^{13}C_n} \qquad (1)$$

where $C_\mathrm{tot}$ is the total number of carbon atoms in the peptide, ${}^{13}C_n$ is the number of 13C atoms, $P^\mathrm{nat}_{{}^{12}C}$ is the natural 12C abundance, and $I^{C_\mathrm{tot}}_{M+{}^{13}C_n}$ is the intensity of the M+${}^{13}C_n$ peak of the carbon isotopic distribution. For O and S, which possess more than two stable isotopes, a multinomial distribution had to be used for the distribution calculation. The fast Fourier transform (FFT) is an efficient and accurate method to simulate such a distribution29 and can be performed in an Excel spreadsheet through the “Data Analysis” add-in. However, other spreadsheets do not offer this option. Thus, we decided to use a method in which only additions and multiplications are required and which works with any spreadsheet. For this, two small databases were created. The first database was constituted by all possible isotopic distributions for a number of oxygen atoms varying between 0 and 100. The second database was constituted by all possible isotopic distributions of sulfur for 0−10 sulfur atoms. For the chosen X element, the database was obtained by a convolution series of its isotopic distribution. For each row of the database, a variable a, corresponding to the number of atoms of the X element in the molecule, was introduced. $a_\mathrm{lim}$ was fixed to 100 for O and 10 for S. The isotopic distribution, $D_X$, at row a, was obtained following the Punnett square like method (eq 2), by convoluting $D_X$ corresponding to 1 atom (row 1) with $D_X$ at row a − 1:

$$I^{X_a}_{M+h} = \sum_{f+g=h} I^{X_{a-1}}_{M+f}\, I^{X_1}_{M+g} \qquad (2)$$

where $I^{X_a}_{M+h}$ denotes the intensity of the M+h peak in $D_X$ for a atoms. h, f, and g are the peak positions in $D_X$ for a, a − 1, and a = 1 atoms, respectively. Thus, h, f, and g depend on a and on the number of isotopes of X, such that $h \in [0; a\Delta_X]$, $f \in [0; (a-1)\Delta_X]$, and $g \in [0; \Delta_X]$. $\Delta_X$ was calculated from eq 3, where $A_\mathrm{heavier}({}^{A}X)$ and $A_\mathrm{lighter}({}^{A}X)$ represent the mass numbers of the heaviest and the lightest isotopes of the X element, respectively.
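The per-element steps described so far (elementary composition from the sequence, eq 1, and the row-by-row convolution of eq 2) can be sketched in Python as follows. This is an illustrative sketch, not the authors' spreadsheet: the function names are hypothetical, the residue table lists standard amino acid residue compositions, and the natural abundances are rounded tabulated values that are assumed here rather than taken from the article.

```python
from math import comb

# Residue compositions (C, H, N, O, S) of the 20 standard amino acids.
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}

def composition(sequence):
    """Elementary composition of an unmodified peptide: residues plus one H2O."""
    totals = [0, 0, 0, 0, 0]
    for residue in sequence:
        for index, count in enumerate(RESIDUES[residue]):
            totals[index] += count
    totals[1] += 2   # two H from the terminal H2O
    totals[3] += 1   # one O from the terminal H2O
    return dict(zip("CHNOS", totals))

def binomial_distribution(n_atoms, p_light):
    """eq 1: intensity of the M+n peak, n being the number of heavy atoms."""
    return [comb(n_atoms, n) * p_light ** (n_atoms - n) * (1 - p_light) ** n
            for n in range(n_atoms + 1)]

def convolve(row_prev, row_one):
    """eq 2: I[M+h] is the sum over f + g = h of I_prev[M+f] * I_one[M+g]."""
    out = [0.0] * (len(row_prev) + len(row_one) - 1)
    for f, x in enumerate(row_prev):
        for g, y in enumerate(row_one):
            out[f + g] += x * y
    return out

def build_database(single_atom, a_lim):
    """Rows 0..a_lim; row a is row a-1 convoluted with the single-atom row."""
    rows = [[1.0]]                      # row 0: a single peak at M+0
    for a in range(1, a_lim + 1):
        rows.append(convolve(rows[a - 1], single_atom))
    return rows

OXYGEN = [0.99757, 0.00038, 0.00205]    # assumed abundances of 16O, 17O, 18O
CARBON_12 = 0.9893                      # assumed natural abundance of 12C

peptide = composition("SAMPEGYVQER")    # the m/z 1266 peptide
d_c = binomial_distribution(peptide["C"], CARBON_12)
o_database = build_database(OXYGEN, 100)
d_o = o_database[peptide["O"]]
print(peptide)
```

Row 4 of the oxygen database has nine peaks (M+0 to M+8), as expected for h ∈ [0; 8], and for a two-isotope element the same row-by-row convolution reproduces the binomial result of eq 1.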


$$\Delta_X = A_\mathrm{heavier}({}^{A}X) - A_\mathrm{lighter}({}^{A}X) \qquad (3)$$

For the particular a = 1 case, $I^{X_a}_{M+h} = I^{X_1}_{M+g}$ is given by the natural abundance of the isotopes. Note that for a = 0, $I^{X_a}_{M+0} = I^{X_0}_{M+0} = 1$ and the other $I^{X_0}_{M+h} = 0$. Figure 3 illustrates such an isotopic distribution calculation for row 4 of the oxygen database (molecule containing 4 oxygen atoms).

Figure 3. Principle of convolution with the Punnett square like method. Example of a convolution calculation for the oxygen isotopic distribution at row a = 4. For oxygen, $\Delta_X$ = 18 − 16 = 2; thus, h, f, and g are integer variables such that h ∈ [0; 8], f ∈ [0; 6], and g ∈ [0; 2].

To obtain the O distribution, $D_O$, of a peptide, the variable a was set to $O_\mathrm{tot}$ and the intensities $I^{O_\mathrm{tot}}_{M+p}$ were automatically extracted from the O database. The intensities $I^{S_\mathrm{tot}}_{M+q}$ of the sulfur distribution, $D_S$, were obtained in a similar way with a = $S_\mathrm{tot}$. Last, to distinguish the peak position in the distribution of the two elements, the integer variable h was named p for oxygen and q for sulfur, such that $p \in [0; O_\mathrm{tot}\Delta_O]$ and $q \in [0; S_\mathrm{tot}\Delta_S]$. For the convolution with the Punnett square like method, we considered that, except for the first and the last isotopic distribution peaks, all peaks are a sum of isobars under the used experimental conditions. Indeed, with a resolving power R > 10⁵, the isobars would be separated. For example, for 2 oxygen atoms, two peaks would be obtained for M+2, the first one corresponding to 16O18O + 18O16O and the second one corresponding to 17O17O. However, the TOF analyzer used in this work, operating in the reflectron mode, presented a resolving power of around 10 000 (FWHM). Thus, in our case, for 2 oxygen atoms, only one peak was obtained for M+2, corresponding to 16O18O + 18O16O + 17O17O. Once the distributions of C, H, O, and S atoms were simulated, the artificial natural abundance isotopic distribution without nitrogen contribution, $D_{CHOS}$, was calculated. A Punnett square like method was applied, and $D_{CHOS}$ was obtained by convoluting the oxygen, carbon, hydrogen, and sulfur distributions with one another according to eq 4:

$$I^{(CHOS)_\mathrm{tot}}_{M+\sigma} = \sum_{{}^{13}C_n+{}^{2}H_n+p+q=\sigma}\left(I^{C_\mathrm{tot}}_{M+{}^{13}C_n}\times I^{H_\mathrm{tot}}_{M+{}^{2}H_n}\times I^{O_\mathrm{tot}}_{M+p}\times I^{S_\mathrm{tot}}_{M+q}\right) \qquad (4)$$

where σ represents the peak position in $D_{CHOS}$ and is given by σ = (${}^{13}C_n$ + ${}^{2}H_n$ + p + q), $(CHOS)_\mathrm{tot} = C_\mathrm{tot}H_\mathrm{tot}O_\mathrm{tot}S_\mathrm{tot}$, and $I^{(CHOS)_\mathrm{tot}}_{M+\sigma}$ is the intensity of the peak M+σ of the $D_{CHOS}$ distribution.

For nitrogen, we have expanded the multimodal distribution as a sum of unimodal distributions, each characterized by its labeling ratio $P_i$ according to eq 5:

$$I(P_i)^{N_\mathrm{tot}}_{M+{}^{15}N_n} = \binom{N_\mathrm{tot}}{{}^{15}N_n}P_i^{\,N_\mathrm{tot}-{}^{15}N_n}\left(1-P_i\right)^{{}^{15}N_n} \qquad (5)$$

For this step, two ways were proposed in the method. First, one assumed the multimodal distribution as a sum of $N_\mathrm{tot}$ + 1 unimodal distributions, where the values of the labeling ratios, $P_i$, of each labeling population were calculated following eq 6:

$$\begin{cases} P_{i=0} = P^\mathrm{nat}_{{}^{14}N} \\ P_{i\ge 1} = \dfrac{i}{N_\mathrm{tot}} \end{cases} \qquad (6)$$

This model reflects more concretely the isotopic distribution because each 14N/15N ratio corresponds to an integer number of nitrogen. The second one considered the multimodal distribution as a sum of 11 unimodal distributions where $P_i$ were calculated following eq 7:

$$P_{0\le i\le 10} = \frac{i}{10} \qquad (7)$$

In this model, the number of distributions is fixed, avoiding too large a number of variables and long calculations. The next step of the calculations consisted of simulating the distribution of all possible isotopic labeling populations, $D_{CHOSN}(P_i)$. For a labeling population $P_i$, $D_{CHOSN}(P_i)$ was obtained by convoluting $D_N(P_i)$ with $D_{CHOS}$ following eq 8:

$$I(P_i)^{(CHOSN)_\mathrm{tot}}_{M+k} = \sum_{\sigma+{}^{15}N_n=k}\left(I^{(CHOS)_\mathrm{tot}}_{M+\sigma}\times I(P_i)^{N_\mathrm{tot}}_{M+{}^{15}N_n}\right) \qquad (8)$$

In a final step, the 15N labeling populations inside the experimental isotopic populations were determined. To this aim, the intensities $I^\mathrm{theo}_{M+k}$ of a theoretical distribution were simulated by giving a weight $w_i$ to each population following:

$$I^{(CHOSN)_\mathrm{tot}}_{M+k} = I^\mathrm{theo}_{M+k} = \sum_i\left[I(P_i)^{(CHOSN)_\mathrm{tot}}_{M+k}\times w_i\right] \qquad (9)$$

The $w_i$ coefficients were determined by minimization of the V value (see eq 10 for definition), calculated as the sum of the square differences between each calculated $I^\mathrm{theo}_{M+k}$ and experimental $I^\mathrm{exp}_{M+k}$ peak intensities (the sum of experimental peaks being normalized to 1), under the following constraints:

$$\begin{cases} V = \sum_k\left(I^\mathrm{exp}_{M+k}-I^\mathrm{theo}_{M+k}\right)^2 \\ \sum_i w_i = 1 \\ \forall i\quad 0\le w_i\le 1 \end{cases} \qquad (10)$$

This least-squares minimization was performed numerically with the Excel Solver function using an iterative method. In eq 10, $I^\mathrm{exp}$ was evaluated with peak heights. In theory, $k \in [0; \sigma + {}^{15}N_n]$, but it was impossible to calculate an isotopic distribution with too large numbers of peaks. Thus, for practical reasons, two thresholds were


introduced for the developed method. The first limit concerned the minimum intensity of the peak taken into account for the calculation. It was set to 0.01%, a value that is largely sufficient for an accuracy of around 5%. The second limit concerned the peptide elementary composition. A limit is necessary for the Punnett calculations and the database creation. The chosen elementary composition limit was set to C₂₀₀H₃₀₀N₅₀O₁₀₀S₁₀. It corresponds to a 30 amino acid peptide, a size that is rare after trypsin digestion without missed cleavage. The minimum peak number of each isotopic envelope, required for avoiding too large calculation errors due to convolution of truncated distributions, is shown in Table 1. For $D_{CHOS}$, the M+14 peak intensity value was 1.2 × 10−2%, which is very close to the fixed intensity limit of 0.01%. For N, as the isotopic distribution is variable, all possible isotopic forms without truncation were taken into account for the calculation; that is, all M+${}^{15}N_n$ peaks were used. The calculated isotopic distribution of C₂₀₀H₃₀₀N₅₀O₁₀₀S₁₀ obtained with this method was compared to the calculated distributions obtained with IsoPro 3.1, which enables one to calculate accurate unimodal distributions from the molecule elementary composition.30 Results showed negligible differences for the accuracy required for our project (Supporting Information Table S-3).

Table 1. Intensity of the Last Peaks Taken into Account for Calculation of DH, DC, DO, DS, and DCHOS

distribution    atom number         peak position    intensity in %
DH              H300                M+3              1.44 × 10−3
DC              C200                M+10             7.12 × 10−3
DO              O100                M+8              5.0 × 10−3
DS              S10                 M+9              5.5 × 10−3
DCHOS           C200H300O100S10     M+14             1.2 × 10−2

Calculation Method Tests. Before use, several parameters of the elaborated method were evaluated by monitoring the different ions (m/z 804, m/z 1266, m/z 1282, and m/z 1961) in the three experimental conditions. The first test concerned the comparison between the two proposed ways for the calculation of the several theoretical nitrogen distributions (see eqs 6 and 7). Results indicated that the values were different between the two methods, but, except for a few cases, they remained in the same order of magnitude (Supporting Information Figure S-4). These differences were due to the fact that the number of populations and their respective 14N/15N ratios are dependent on the chosen model. The first model, in which the nitrogen distribution is considered as a sum of $N_\mathrm{tot}$ + 1 labeled populations, reflects more concretely the isotopic distribution for a peptide. Actually, in this model, each 14N/15N ratio corresponds to an integer number of nitrogen. This model is ideal for the comparison of the isotopic incorporation from peptides with the same nitrogen atom number. Nevertheless, for peptides with a different number of nitrogen atoms, the number of populations and their 14N/15N ratios vary. Thus, the comparison of the isotopic incorporation between these peptides may become difficult. With the second method, regardless of the number of nitrogen atoms in the peptide, the nitrogen distribution is always considered as a sum of 11 labeled populations where each 14N/15N ratio corresponds to a multiple of 10%. Thus, the second model allows comparing isotopic distributions of peptides with a different number of nitrogen atoms.

The second test concerned the required quality of the data. The control condition was analyzed after two different dilutions to obtain three peptide concentrations (Supporting Information Table S-5). Results indicated that the peak intensities and the presence of hidden contaminants under the isotopic distribution affect the observed isotopic envelope. As the method is based on this isotopic envelope, the data quality impacts the reliability of the labeled population calculation. It is thus important to use data in which the signal-to-noise ratio is at least 10. The last test concerned the evaluation of the error between the experimental and the theoretical isotopic distributions. This error E was evaluated (see also Supporting Information Figure S-6) by

$$E = \frac{V}{\sum_k\left(I^\mathrm{exp}_{M+k}\right)^2} \qquad (11)$$

The E value reflects the difference between the calculated distribution and the experimental one, but it may also give a piece of information on the data quality. With the m/z 1961 ion as an example, the E value was between 3% and 5% for conditions 1 and 2, which is acceptable (Figure 4). However, a higher E value was obtained for the control condition (near 8%), and this was probably due to an underestimation of the M+2, M+3, and M+4 peak intensities because of their flattened shape.

Figure 4. Comparison of calculated (violet stick) and experimental (green stick) isotopic distributions from the m/z 1961 ion for (A) control condition, (B) condition 1, and (C) condition 2 using eq 6 as model. For each distribution, E was calculated according to eq 11.

Peptide Isotopic Measurement. The method was then used to calculate the isotopic incorporation of all monitored ions (m/z 804, m/z 1266, m/z 1282, and m/z 1961) in the three experimental conditions. To compare the different peptide isotopic incorporations, the second model, where each 14N/15N ratio corresponds to a multiple of 10%, was used for the calculation. As expected, no labeling was found on ions in the control condition. For conditions 1 and 2, the method allowed one to calculate several 15N labeled populations for each monitored ion. To simplify and to allow comparison of the results, the peptide labeling was classified into three groups: a first one with poor labeling, below 30%; a second one with 30−70% labeling; and a third one with high labeling, above 70% (Figure 5). Results indicated a good isotopic enrichment ratio for condition 2: more than 20% of the peptides had a labeling higher than 70%, and more than 35% were labeled between 30% and 70%, for each monitored ion. For condition 1, a lower labeling yield was obtained because more than 60% of peptides

DOI: 10.1021/acs.analchem.5b01558 Anal. Chem. 2015, 87, 5938−5946


Furthermore, additional information was obtained by comparing the isotopic incorporation of the different peptides within the same condition. For condition 1, the peptides seemed to have similar isotopic incorporation (Figure 6A). In contrast, for condition 2, the peptide corresponding to the m/z 804 ion featured, compared to the others, a higher proportion of weakly labeled populations and a lower proportion of highly labeled populations (Figure 6B). This observation suggested a heterogeneous labeling of the different amino acids. We thus decided to focus on the labeling contribution of amino acids regardless of the peptide isotopic enrichment. Subsequent mass spectrometry analyses were then conducted to investigate the isotopic incorporation at the amino acid level.

Amino Acid Isotopic Measurement. Tandem mass spectrometry (MS/MS) by laser-induced dissociation (LID) was carried out, for the control condition, on the m/z 804, m/z 1266, m/z 1282, and m/z 1961 ions. The generated y″ and b product ions were identified by comparing the experimental m/z list of the monoisotopic peaks with an in silico list. Next, tandem mass spectrometry was carried out for condition 2. Product ions from the m/z 1266, m/z 1282, and m/z 1961 precursor ions were not usable: their large isotopic envelopes resulted in weak ion intensities, so that the isotopic distribution signals were buried in the background noise. However, from the m/z 804 ion, most of the y″ and b fragment ions were obtained for condition 2. Thus, the calculation method was tested on these fragment ions using eq 7 as a model (Table 2). Each fragment ion presented a different isotopic incorporation, which suggests that the isotopic incorporation depends on the type of amino acid. In addition, the y1″ ion obtained from the m/z 804 precursor ion allowed the analysis of the isotopic labeling distribution of lysines, which appeared to be differently labeled.
These variations indicated the presence of a heterogeneous labeling within an amino acid presenting several nitrogen atoms. We assumed that these heterogeneities resulted from the production of amino acid pools in different hairy root cell compartments.31−34 However, the information obtained from the y″ fragment ions did not perfectly match the information obtained from the b fragment ions. Indeed, when we calculated the average labeling of the three amino acids F1, F2, and D of the sequence (gHDF1F2K), 47%, 46%, and 49% were, respectively, obtained from the y″ ions, whereas 54%, 69%, and 51% were obtained from the b ions. The observed differences may be explained by the accumulation of errors when the average labeling of each amino acid is calculated successively, with a maximal error on F2 for the b ions and on D for the y″

Figure 5. Isotopic labeling population, after classification into three groups, of control condition (blue), condition 1 (orange), and condition 2 (brown) for (A) m/z 804 ion, (B) m/z 1266 ion, (C) m/z 1282 ion, and (D) m/z 1961 ion.

Figure 6. Isotopic labeling population, after classification into three groups, of m/z 804 ion (green), m/z 1266 ion (blue), m/z 1282 ion (orange), and m/z 1961 ion (brown) for (A) condition 1 and (B) condition 2.

were weakly labeled (under 30%) and the remainder were mainly between 30% and 70% of labeling. These results indicated that a higher isotopic incorporation ratio was obtained for the peptides in condition 2 (30 g L−1 glucose) than for the peptides in condition 1 (5 g L−1 glucose). This allowed us to deduce that a decrease in glucose concentration lowers the 15N isotopic incorporation into the protein. This is probably due to a growth delay of the hairy roots in condition 1. These first results also indicated that the hairy root system is able to produce a labeled protein with a relatively high labeling without any optimization. An average labeling ratio was also calculated for each peptide by multiplying the Pi value of each labeled population by its proportion wi and summing the products. For condition 1, 29%, 27%, 27%, and 25% were obtained from the m/z 804, m/z 1266, m/z 1282, and m/z 1961 ions, respectively, and 51%, 53%, 52%, and 51% for condition 2 (Supporting Information Table S-7). These peptide average labeling ratios allowed us to estimate an average isotopic incorporation of the whole protein of about 27% for condition 1 and 50% for condition 2. This result was confirmed by a preliminary NMR analysis of eGFP (Supporting Information Figure S-8).
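As a worked illustration of the averaging described above, the per-peptide average and the whole-protein estimate can be written as follows. The symbols P_i and w_i come from the text; the normalization of the w_i to 1 and the use of an unweighted mean over the four peptides are assumptions, because the text does not state how the whole-protein value was derived from the peptide values.

```latex
\bar{P}_{\text{peptide}} = \sum_{i} w_i \, P_i ,
\qquad \sum_{i} w_i = 1

\bar{P}_{\text{condition 1}} \approx \frac{29 + 27 + 27 + 25}{4}\,\% = 27\,\%

\bar{P}_{\text{condition 2}} \approx \frac{51 + 53 + 52 + 51}{4}\,\% = 51.75\,\%
```

Under these assumptions, both values are consistent with the reported estimates of about 27% and about 50%.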

Table 2. Proportion (%) of Each Labeled Population Calculated Following Equation 7

fragment    0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%   E value (%)   average labeling (%)
y1″         37     8     8     7     7     6     6     6     5     5      5          4.90                     31
y2″          9    14    15    15    13    10     7     4     3     4      7          0.70                     38
y3″          8    13    13    12    10    10    10    11    10     4      0          1.60                     41
y4″         10     0     0    25    28    15     2     4    16     0      0          5.60                     43
b2           0     0    49    10     0     0     0     0    40     1      0         10.60                     46
b3           5     0    12    27    17     0     0    10    29     0      0          4.90                     47
b4           8     0     0    25    24     9     0     7    22     6      0          7.50                     49
b5           5     0     0    25    16     5     8    17    16     1      6         11.90                     53
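The average labeling column of Table 2 can be cross-checked from the population proportions, and the same proportions can be grouped into the three classes used for the peptides (below 30%, 30−70%, above 70%). The sketch below is illustrative only: the names are hypothetical, the numbers are copied from Table 2, and dividing by the row sum is an assumption made because the rounded proportions add up to between 99 and 101.

```python
# 15N enrichment levels of the 11 populations of the second model (in %).
LEVELS = list(range(0, 101, 10))

# Proportions (%) of each population per fragment ion, copied from Table 2.
# Keys drop the double-prime of the y ions for convenience.
TABLE2 = {
    "y1": [37, 8, 8, 7, 7, 6, 6, 6, 5, 5, 5],
    "y2": [9, 14, 15, 15, 13, 10, 7, 4, 3, 4, 7],
    "y3": [8, 13, 13, 12, 10, 10, 10, 11, 10, 4, 0],
    "y4": [10, 0, 0, 25, 28, 15, 2, 4, 16, 0, 0],
    "b2": [0, 0, 49, 10, 0, 0, 0, 0, 40, 1, 0],
    "b3": [5, 0, 12, 27, 17, 0, 0, 10, 29, 0, 0],
    "b4": [8, 0, 0, 25, 24, 9, 0, 7, 22, 6, 0],
    "b5": [5, 0, 0, 25, 16, 5, 8, 17, 16, 1, 6],
}

# Average labeling (%) as reported in the last column of Table 2.
REPORTED_AVERAGE = {
    "y1": 31, "y2": 38, "y3": 41, "y4": 43,
    "b2": 46, "b3": 47, "b4": 49, "b5": 53,
}


def average_labeling(weights):
    """Weighted mean of the enrichment levels, normalized by the row sum."""
    return sum(w * p for w, p in zip(weights, LEVELS)) / sum(weights)


def three_groups(weights):
    """Share (%) of populations below 30%, within 30-70%, and above 70%."""
    total = sum(weights)
    low = sum(w for w, p in zip(weights, LEVELS) if p < 30)
    mid = sum(w for w, p in zip(weights, LEVELS) if 30 <= p <= 70)
    high = sum(w for w, p in zip(weights, LEVELS) if p > 70)
    return (100 * low / total, 100 * mid / total, 100 * high / total)


if __name__ == "__main__":
    for name, weights in TABLE2.items():
        print(name, round(average_labeling(weights), 1), REPORTED_AVERAGE[name])
```

The recomputed averages stay within one percentage point of the reported ones, which is the agreement to be expected from proportions rounded to integers. For the y1″ ion, the grouping gives 53% weakly labeled, 32% intermediate, and 15% highly labeled populations.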


ions. In addition, the E values for the b ions were very high, indicating large differences between the calculated and the experimental distributions. This is probably due to overlapping isotopic distributions and/or signal intensities that were too low. Furthermore, as the fragmentation efficiency depends on the peptide sequence, some cleavages might not be favorable, and the isotopic labeling contribution of several amino acids might therefore be missing. Nevertheless, our method provided information on amino acid labeling. Using another mass spectrometer coupled to liquid chromatography should greatly enhance the quality of the fragment isotopic envelopes and should allow our calculation method to be applied.

England, program 4230-PeReNE, the Centre National de la Recherche Scientifique (CNRS - interdisciplinar program “Soutien à la prise de risque”), and the Labex SynOrg (ANR11-LABX-0029) for financial support. V.T. thanks the CNRS for a “chaire d’excellence” at the University of Rouen.





CONCLUSION

The validation of hairy roots as an efficient alternative production system of proteins for NMR structural studies requires monitoring the isotopic enrichment of labeled proteins exhibiting multimodal isotopic distributions. In this work, we have developed a strategy to calculate easily and quickly (a few minutes) peptide isotopic distributions from different heterogeneously labeled populations. The method was successfully applied to monitor the labeling of a protein produced under two distinct experimental conditions. In addition, this method was used to obtain information about the variation of isotopic incorporation at the amino acid level. These isotopic incorporation measurements allowed us to demonstrate that B. rapa hairy roots are able to efficiently produce a labeled protein with a high ratio of isotopic incorporation. Furthermore, the determination of the different labeled populations, at both the peptide and the amino acid levels, gave us valuable information for enhancing protein labeling in hairy roots. Moreover, even though we have focused here on nitrogen labeling, this strategy could also be applied to other elements, like carbon and hydrogen, which are used for NMR or metabolomic studies. Finally, this strategy is not limited to proteins. Elements other than the five present in amino acids could be added to target a larger range of small biomolecules. This strategy could therefore be extended to the fields of metabolomics and fluxomics.



ASSOCIATED CONTENT

Supporting Information

Additional information as noted in text. The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.5b01558.



REFERENCES

(1) Nettleship, J. In Glycosylation; Petrescu, S., Ed.; InTech, 2012.
(2) Davis, S. J.; Crispin, M. In Functional and Structural Proteomics of Glycoproteins; Owens, R., Nettleship, J., Eds.; Springer Netherlands: Dordrecht, 2010; pp 127−158.
(3) Hausmann, J.; Christodoulou, E.; Kasiem, M.; De Marco, V.; van Meeteren, L. A.; Moolenaar, W. H.; Axford, D.; Owen, R. L.; Evans, G.; Perrakis, A. Acta Crystallogr., Sect. F 2010, 66, 1130−1135.
(4) Merkur’eva, R. V. Lab. Delo 1966, 12, 712−716.
(5) NMR of Macromolecules: A Practical Approach; Roberts, G. C. K., Ed.; The Practical Approach Series; IRL Press at Oxford University Press: Oxford; New York, 1993.
(6) Mondal, S.; Shet, D.; Prasanna, C.; Atreya, H. S. Adv. Biosci. Biotechnol. 2013, 4, 751−767.
(7) Mairet, F.; Sierra, J.; Glorian, V.; Villon, P.; Shakourzadeh, K.; Boitel-Conti, M. Bioprocess Biosyst. Eng. 2009, 32, 257−265.
(8) Huet, Y.; Ekouna, J.-P. E.; Caron, A.; Mezreb, K.; Boitel-Conti, M.; Guerineau, F. Biotechnol. Lett. 2013, 36, 181−190.
(9) Tsien, R. Y. Annu. Rev. Biochem. 1998, 67, 509−544.
(10) Kent, K. P.; Childs, W.; Boxer, S. G. J. Am. Chem. Soc. 2008, 130, 9664−9665.
(11) Khan, F.; Stott, K.; Jackson, S. J. Biomol. NMR 2003, 26, 281−282.
(12) Ormö, M.; Cubitt, A. B.; Kallio, K.; Gross, L. A.; Tsien, R. Y.; Remington, S. J. Science 1996, 273, 1392−1395.
(13) Ippel, J. H.; Pouvreau, L.; Kroef, T.; Gruppen, H.; Versteeg, G.; van den Putten, P.; Struik, P. C.; van Mierlo, C. P. M. Proteomics 2004, 4, 226−234.
(14) Murray, K. K.; Boyd, R. K.; Eberlin, M. N.; Langley, G. J.; Li, L.; Naito, Y. Pure Appl. Chem. 2013, 85, 1515−1609.
(15) Millard, P.; Massou, S.; Portais, J.-C.; Letisse, F. Anal. Chem. 2014, 86, 10288−10295.
(16) Ullmann-Zeunert, L.; Muck, A.; Wielsch, N.; Hufsky, F.; Stanton, M. A.; Bartram, S.; Böcker, S.; Baldwin, I. T.; Groten, K.; Svatoš, A. J. Proteome Res. 2012, 11, 4947−4960.
(17) Jehmlich, N.; Schmidt, F.; Hartwich, M.; von Bergen, M.; Richnow, H.-H.; Vogt, C. Rapid Commun. Mass Spectrom. 2008, 22, 2889−2897.
(18) Snijders, A. P. L.; de Vos, M. G. J.; Wright, P. C. J. Proteome Res. 2005, 4, 578−585.
(19) Pan, C.; Fischer, C. R.; Hyatt, D.; Bowen, B. P.; Hettich, R. L.; Banfield, J. F. Mol. Cell. Proteomics 2011, 10, M110.006049.
(20) MacCoss, M. J.; Wu, C. C.; Matthews, D. E.; Yates, J. R., III. Anal. Chem. 2005, 77, 7646−7653.
(21) Taubert, M.; Jehmlich, N.; Vogt, C.; Richnow, H. H.; Schmidt, F.; von Bergen, M.; Seifert, J. Proteomics 2011, 11, 2265−2274.
(22) Palmblad, M.; Mills, D. J.; Bindschedler, L. V. J. Proteome Res. 2008, 7, 780−785.
(23) Rockwood, A. L.; Palmblad, M. In Mass Spectrometry Data Analysis in Proteomics; Matthiesen, R., Ed.; Humana Press: Totowa, NJ, 2013; Vol. 1007, pp 65−99.
(24) Tyanova, S.; Mann, M.; Cox, J. Methods Mol. Biol. 2014, 1188, 351−364.
(25) Gamborg, O. L.; Miller, R. A.; Ojima, K. Exp. Cell Res. 1968, 50, 151−158.
(26) Gundry, R. L.; White, M. Y.; Murray, C. I.; Kane, L. A.; Fu, Q.; Stanley, B. A.; Van Eyk, J. E. In Current Protocols in Molecular Biology; Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., Struhl, K., Eds.; John Wiley & Sons, Inc.: Hoboken, NJ, 2009.
(27) Valkenborg, D.; Mertens, I.; Lemière, F.; Witters, E.; Burzykowski, T. Mass Spectrom. Rev. 2012, 31, 96−109.

AUTHOR INFORMATION

Corresponding Authors

*Phone: +33(0)2-35-52-29-40. Fax: +33(0)2-35-52-24-41. Email: [email protected].
*Phone: +33(0)2-35-52-29-48. Fax: +33(0)2-35-52-24-41. Email: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

We gratefully acknowledge the Région Haute-Normandie (IRIB network), the European Regional Development Fund (ERDF 31708 and 32975), INTERREG IV France (Channel)


(28) Bocker, S.; Letzel, M. C.; Liptak, Z.; Pervukhin, A. Bioinformatics 2008, 25, 218−224.
(29) Rockwood, A. L.; Haimi, P. J. Am. Soc. Mass Spectrom. 2006, 17, 415−419.
(30) Yergey, J. A. Int. J. Mass Spectrom. Ion Phys. 1983, 52, 337−349.
(31) Verslues, P. E.; Sharma, S. Arab. Book 2010, 8, e0140.
(32) Hell, R.; Wirtz, M. Arab. Book 2011, 9, e0154.
(33) Allen, D. K.; Laclair, R. W.; Ohlrogge, J. B.; Shachar-Hill, Y. Plant, Cell Environ. 2012, 35, 1232−1244.
(34) Allen, D. K.; Goldford, J.; Gierse, J. K.; Mandy, D.; Diepenbrock, C.; Libourel, I. G. L. Anal. Chem. 2014, 1894−1901.
