Targeted Quantitative Screening of Chromosome ... - ACS Publications

Jul 26, 2016 - ABSTRACT: This work was aimed at estimating the concentrations of proteins encoded by human chromosome 18 (Chr 18) in plasma samples ...
Article pubs.acs.org/jpr

Targeted Quantitative Screening of Chromosome 18 Encoded Proteome in Plasma Samples of Astronaut Candidates

Artur T. Kopylov,† Ekaterina V. Ilgisonis,† Alexander A. Moysa,† Olga V. Tikhonova,† Maria G. Zavialova,† Svetlana E. Novikova,† Andrey V. Lisitsa,† Elena A. Ponomarenko,† Sergei A. Moshkovskii,† Andrey A. Markin,‡ Anatoly I. Grigoriev,‡ Victor G. Zgoda,*,† and Alexander I. Archakov†

† Institute of Biomedical Chemistry, 119121, Moscow, Russia
‡ Institute of Medico-Biological Problems, Russian Academy of Sciences, 123007, Moscow, Russia



Supporting Information

ABSTRACT: This work aimed to estimate the concentrations of proteins encoded by human chromosome 18 (Chr 18) in plasma samples of 54 healthy male volunteers (aged 20−47). These volunteers were certified by the medical evaluation board as healthy subjects ready for space flight training. Over 260 stable isotope-labeled peptide standards (SIS) were synthesized to measure the proteins encoded by Chr 18. Selected reaction monitoring (SRM) with SIS allowed the levels of 84 of the 276 proteins encoded by Chr 18 to be estimated. These proteins were quantified in whole and depleted plasma samples. Concentrations of the detected proteins varied from 10⁻⁶ M (transthyretin, P02766) to 10⁻¹¹ M (P4-ATPase, O43861). A minority of the proteins (mostly intracellular proteins) showed extremely high interindividual variation. The results provide a background for studies of potential biomarkers in plasma among proteins encoded by Chr 18. The SRM raw data are available in the ProteomeXchange repository (PXD004374).

KEYWORDS: plasma proteome, targeted proteomics, selected reaction monitoring (SRM), C-HPP, human chromosome 18 (Chr 18)



INTRODUCTION

Plasma is a valuable biological material that can be easily obtained from human subjects. Many proteins constituting the plasma proteome are either secreted proteins released into plasma or tissue leakage proteins that appear in circulation as a result of cell death or damage. Analysis of the plasma proteome within the framework of the Human Plasma Proteome Project was one of the early activities under the guidance of the Human Proteome Organization.1 During the project’s pilot phase, 889 proteins were reliably identified in plasma or serum.2 Since the early efforts in plasma proteomics, researchers have encountered significant technical challenges in this field, which have prevented its rapid adoption in medicine. One of them is the extremely wide concentration range of plasma proteins. The most abundant proteins (∼150) are characterized by a concentration range from 10⁻³ to 10⁻⁷ M.3 Detection of proteins at concentrations below 10⁻¹² M requires their enrichment by affinity reagents or depletion of major plasma proteins (or extensive plasma fractionation).4 To date, various technical solutions have been proposed for protein and peptide enrichment.4,5 Targeted proteomics was launched a decade ago with the use of selected reaction monitoring (SRM) in a triple quadrupole detector.6 SRM is currently considered an alternative to conventional immunoassays.7 Since SRM allows multiplexing, this method may be used for simultaneous quantification of dozens or hundreds of proteins in many samples.8

Large-scale quantification of plasma proteins is an important task for the development of new biomarkers. In this context, reference concentrations are known only for selected plasma proteins, and most of them have been studied by means of routine clinical biochemistry. Approximate reference levels have been published for 150 proteins,3 representing less than 1% of the putative plasma proteome. If some proteins are found in a disease and are suggested as potential biomarkers, one should know the concentration range of such proteins in a healthy person to predict their applicability.

In this study we determined the concentration ranges of plasma proteins encoded by genes of Chr 18 in plasma samples of healthy volunteers. All measurements were performed using SRM with SIS added to each individual plasma sample. The results allowed us to measure the number of proteins identified in each individual sample, the protein diversity between the samples (proteome width), and the protein contents (proteome depth)5 of the detectable Chr 18 proteins in the plasma of healthy volunteers.

Special Issue: Chromosome-Centric Human Proteome Project 2016
Received: May 4, 2016


DOI: 10.1021/acs.jproteome.6b00384 J. Proteome Res. XXXX, XXX, XXX−XXX


Journal of Proteome Research



EXPERIMENTAL PROCEDURES

Internal Standard Production

The required peptides were obtained by solid-phase peptide synthesis on Overture (Protein Technologies, USA) or Hamilton Microlab STAR devices according to the published method.11 Isotope-labeled leucine (Fmoc-Leu-OH-¹³C₆,¹⁵N), arginine (¹³C₆,¹⁵N₄), lysine (¹³C₆,¹⁵N₂), or serine (¹³C₃,¹⁵N₁) was used for isotope-labeled peptide synthesis in place of the corresponding unlabeled amino acid. Concentrations of the synthesized peptides were measured by amino acid analysis with fluorescence detection of the amino acids obtained after acidic hydrolysis of the peptides.12
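The labeling scheme above fixes the mass offset between each SIS peptide and its endogenous counterpart, which in turn sets the offset between the heavy and light precursor m/z values monitored in SRM. The following is a minimal sketch of that arithmetic, assuming exactly one labeled residue per peptide (the text does not state this) and using standard isotope mass differences; the function and constant names are illustrative and do not come from the paper.

```python
# Standard mass differences between heavy and light isotopes, in Da.
DELTA_13C = 1.0033548  # 13C minus 12C
DELTA_15N = 0.9970349  # 15N minus 14N

# Heavy atoms per labeled residue as listed in the text: (number of 13C, number of 15N).
HEAVY_ATOMS = {
    "L": (6, 1),  # leucine, 13C6 15N
    "R": (6, 4),  # arginine, 13C6 15N4
    "K": (6, 2),  # lysine, 13C6 15N2
    "S": (3, 1),  # serine, 13C3 15N1
}


def label_mass_shift(residue: str) -> float:
    """Monoisotopic mass added by one labeled residue, in Da."""
    n_13c, n_15n = HEAVY_ATOMS[residue]
    return n_13c * DELTA_13C + n_15n * DELTA_15N


def precursor_mz_shift(residue: str, charge: int) -> float:
    """Offset between heavy and light precursor m/z for a given charge state.

    Assumption: the peptide carries exactly one labeled residue.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return label_mass_shift(residue) / charge


if __name__ == "__main__":
    for residue in HEAVY_ATOMS:
        print(residue, round(label_mass_shift(residue), 4),
              round(precursor_mz_shift(residue, 2), 4))
```

For a doubly charged precursor the m/z offset is half the mass shift, so a lysine-labeled peptide would be monitored about 4 m/z units above its light form under these assumptions.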

Subjects

Fifty-four male subjects (aged 20−47, mean age 26) were examined at the Institute of Medico-Biological Problems (Moscow, Russia) by the medical evaluation board, which specializes in space biology and medicine.9 On the basis of their health characteristics, all subjects were approved for space-related simulations and experiments. They tested negative for HIV and viral hepatitis B and C and had no history of cancer. Routine biochemical and blood parameters of these volunteers were measured using standard automatic analyzers. Most of the parameter values fell within the normal intervals used in clinical laboratory practice (Supplementary Table). All participants gave informed consent to participate in this study. Human-related procedures were performed according to the guidelines of the local ethical committees.

LC-SRM Analysis

Peptides from digested plasma were separated using an Agilent 1290 UPLC system including a pump and an autosampler. The sample was loaded onto the analytical column Eclipse Plus SBC-18, 2.1 mm × 100 mm, 1.8 μm, 100 Å. Peptides were eluted with a mixture of solvents A and B. Solvent A was HPLC grade water with 0.1% (v/v) formic acid, and solvent B was 80% (v/v) HPLC grade acetonitrile/water with 0.1% (v/v) formic acid. The separations were performed by applying a linear gradient from 3% to 32% solvent B over 50 min, then from 32% to 53% solvent B over 3 min at 300 μL/min, followed by a washing step (5 min at 90% solvent B) and an equilibration step (5 min at 3% solvent B). Ten microliters of each sample were injected onto the chromatographic column. The quantitative analysis was performed using an Agilent 6495 Triple Quadrupole instrument (Agilent, USA) equipped with the Jet Stream ionization source. The following parameters were used for the Agilent Jet Stream ionization source: drying gas temperature, 280 °C; nebulizer pressure, 18 psi; drying gas flow, 14 L/min; and capillary voltage, 3000 V.
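The gradient described above can be written down as a piecewise-linear programme, which makes the total run time and the solvent composition at any time point explicit. This is only a sketch: it assumes that the steps to 90% B and back to 3% B are instantaneous (no ramp times are given in the text), and the names used are illustrative.

```python
# Gradient segments taken from the text: (start_min, end_min, %B at start, %B at end).
# Assumption: the changes to the wash and equilibration compositions are instantaneous.
SEGMENTS = [
    (0.0, 50.0, 3.0, 32.0),    # linear gradient, 3% to 32% B over 50 min
    (50.0, 53.0, 32.0, 53.0),  # linear gradient, 32% to 53% B over 3 min
    (53.0, 58.0, 90.0, 90.0),  # washing step, 5 min at 90% B
    (58.0, 63.0, 3.0, 3.0),    # equilibration step, 5 min at 3% B
]

TOTAL_RUN_MIN = SEGMENTS[-1][1]


def percent_b(t_min: float) -> float:
    """Percentage of solvent B at time t_min after injection."""
    if not 0.0 <= t_min <= TOTAL_RUN_MIN:
        raise ValueError("time is outside the gradient programme")
    for start, end, b_start, b_end in SEGMENTS:
        if start <= t_min < end:
            fraction = (t_min - start) / (end - start)
            return b_start + (b_end - b_start) * fraction
    return SEGMENTS[-1][3]  # t_min equals the end of the run


if __name__ == "__main__":
    print("total run time (min):", TOTAL_RUN_MIN)
    for t in (0, 25, 50, 53, 60):
        print(t, "min:", percent_b(t), "% B")
```

Under these assumptions one injection occupies 63 min of instrument time, which is useful when planning three technical runs for each of 54 samples.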

Sample Preparation for LC−SRM

Venous blood was collected from the volunteers into EDTA Vacutainer plasma tubes (BD, USA). The blood samples were processed according to the tube manufacturer’s instructions. The plasma supernatant was filtered through 0.22 μm cellulose−acetate filters (Whatman, NJ, USA) and stored at −80 °C. The plasma samples were depleted using a MARS (Multi-Affinity Removal System) Hu-14 column (10 mm × 100 mm) according to the manufacturer’s protocol (Agilent, USA). For each sample, 40 μL of plasma was used as the starting material for depletion; it was diluted fourfold with PBS and injected in two consecutive runs. The collected fractions containing unbound proteins were desalted using cellulose−acetate 5K MWCO spin columns (Agilent, USA) and concentrated to a final volume of 50 μL. The protein concentration was determined using the Micro BCA protein assay (Thermo Scientific, Rockford, IL, USA). A plasma aliquot containing 100 μg of protein was supplemented with lysis buffer (4% SDS in 0.1 M Tris-HCl, pH 8.5) to a volume of 30 μL and placed on Microcon YM-10 filters (Millipore, USA) for tryptic digestion according to the FASP protocol.10 Briefly, the samples were supplemented with 100 μL of reducing solution (0.1 M 1,4-dithiothreitol (DTT) in 0.1 M Tris-HCl, pH 8.5), incubated for 40 min at 56 °C, and centrifuged at 10 000g for 15 min at 20 °C. The samples were then washed twice with 200 μL of 8 M urea in 0.1 M Tris-HCl, pH 8.5, followed by centrifugation at 10 000g at 20 °C for 15 min. Carbamidomethylation was carried out in 100 μL of 50 mM iodoacetamide (Sigma-Aldrich) in 0.1 M Tris-HCl, pH 8.5, for 30 min at room temperature. The samples were then washed twice with 200 μL of 8 M urea in 0.1 M Tris-HCl, pH 8.5, followed by centrifugation at 10 000g at 20 °C for 15 min, and three times with the buffer for tryptic cleavage (50 mM tetraethylammonium bicarbonate, pH 8.5).
To carry out the enzymatic cleavage, the sample was supplemented with 50 μL of the buffer for tryptic cleavage containing trypsin (20 ng/μL) (Promega; Madison, WI, USA). After 16 h of incubation at 37 °C, the samples were supplemented with 50 μL of 30% formic acid containing all isotope-labeled standards at various concentrations and centrifuged at 10 000g for 10 min. Pass-through peptide fractions were dried in a centrifugal vacuum concentrator (Eppendorf), dissolved in 100 μL of 3% (v/v) formic acid, and used for further SRM analysis.
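The amounts given in this protocol imply an enzyme-to-protein ratio and an upper bound on the peptide load per injection. The sketch below works through that arithmetic; it assumes no material is lost during filtration and drying, which the text does not claim, and combines the 100 μL reconstitution volume with the 10 μL injection volume stated in the LC-SRM Analysis section. All names are illustrative.

```python
def trypsin_amount_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Mass of trypsin added, in micrograms."""
    return conc_ng_per_ul * volume_ul / 1000.0


def enzyme_to_protein_ratio(trypsin_ug: float, protein_ug: float) -> float:
    """Weight ratio of enzyme to substrate protein."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return trypsin_ug / protein_ug


def on_column_load_ug(protein_ug: float, final_volume_ul: float,
                      injection_ul: float) -> float:
    """Upper bound on the digest amount injected, assuming no losses."""
    return protein_ug * injection_ul / final_volume_ul


# Values taken from the text.
trypsin_ug = trypsin_amount_ug(conc_ng_per_ul=20.0, volume_ul=50.0)       # 1 ug
ratio = enzyme_to_protein_ratio(trypsin_ug, protein_ug=100.0)            # 1:100 (w/w)
load_ug = on_column_load_ug(protein_ug=100.0, final_volume_ul=100.0,
                            injection_ul=10.0)                            # at most 10 ug

if __name__ == "__main__":
    print(f"trypsin: {trypsin_ug} ug, ratio 1:{round(1 / ratio)}, load <= {load_ug} ug")
```

The resulting 1:100 (w/w) ratio follows directly from the stated volumes and concentrations; the 10 μg figure is only a ceiling under the no-loss assumption.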

Data Processing

Targeted quantitative screening for 267 proteins encoded by chromosome 18 was performed in the digested plasma samples. Manual selection of the unique proteotypic peptides and the most intense transitions was based on the results of the SRM scouting of chromosome 18.6 A final list of peptides for the SRM assay was compiled according to the criteria described in the Supplementary Table. For each protein, one standard peptide with three transitions was used. The selected peptides were arranged into one SRM assay. The information on the SIS and the target peptides (m/z of precursors, m/z of transition ions, CE values, b and y ion types of the transition ions, and retention times of the peptides) is given in the Supplementary Table. Each SRM experiment was repeated in three technical runs. The results were manually inspected using the Skyline software13 to find transitions that were similar to those in the target peptides. For interference screening we applied the criteria described in Percy et al.14 Briefly, a peptide was considered detected in a run if the differences between the relative intensities of the three transitions of the endogenous and isotopically labeled peptides did not exceed 25% in the run, and the transition chromatographic profiles of the endogenous peptide were identical to the corresponding transitions of the stable isotope-labeled peptide. Calibration curves were obtained for each of the detected peptides using mixtures of purified synthetic native peptides in the concentration range of 10⁻⁸ to 10⁻¹³ M, with their isotopically labeled analogues added at concentrations of 10⁻⁹ M to 10⁻¹² M. All calibration curves were linear in the range of 10⁻⁸ to 10⁻¹³ M and showed a coefficient of linear regression equal to 0.95. Prior to sample processing, the performance of the LC−SRM platforms used was validated by obtaining the calibration curves of the corresponding set of SIS and synthetic natural peptides. After five LC−SRM runs we verified the relevance of calibration by analyzing one of the calibration peptide solutions at 10⁻¹⁰ M. The detection limit was defined as the lowest concentration determined on the linear part of the calibration curve. It varied for different peptides in the range from 10⁻¹³ M to 10⁻¹¹ M. Labeled/unlabeled peptide peak area ratios were used to calculate the concentration of the targeted peptide in a sample.

Figure 1. Number of the proteins detected in each of 54 individual whole plasma samples of healthy individuals (A) and the total number of the identified proteins pooled from individual whole plasma samples (B).
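The interference criterion described in the Data Processing section (relative transition intensities of the endogenous and labeled peptides differing by no more than 25%) can be illustrated with a short check. This is a sketch under stated assumptions, not the authors' procedure: the text does not say whether the 25% limit is a relative or an absolute difference, so a relative deviation from the SIS value is assumed here, and the peak areas in the example are invented for illustration. The chromatographic profile comparison is not modeled.

```python
from typing import Sequence


def relative_intensities(areas: Sequence[float]) -> list[float]:
    """Fraction of the summed peak area contributed by each transition."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("summed peak area must be positive")
    return [area / total for area in areas]


def passes_interference_screen(endogenous_areas: Sequence[float],
                               sis_areas: Sequence[float],
                               max_relative_diff: float = 0.25) -> bool:
    """True if every transition's relative intensity agrees with the SIS within the limit.

    Assumption: the limit is applied as a relative deviation from the SIS value.
    """
    if len(endogenous_areas) != len(sis_areas):
        raise ValueError("both peptides need the same number of transitions")
    endogenous = relative_intensities(endogenous_areas)
    standard = relative_intensities(sis_areas)
    return all(abs(e - s) / s <= max_relative_diff
               for e, s in zip(endogenous, standard))


if __name__ == "__main__":
    sis = [10000.0, 5200.0, 2400.0]      # hypothetical areas of three SIS transitions
    clean = [1000.0, 500.0, 250.0]       # endogenous peptide with a matching pattern
    interfered = [1000.0, 500.0, 900.0]  # third transition inflated by an interference
    print(passes_interference_screen(clean, sis))
    print(passes_interference_screen(interfered, sis))
```

Comparing normalized intensities rather than raw areas makes the check independent of how much endogenous peptide is present relative to the spiked standard.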

Q3SY89, Q6ZTR6, Q8NG57, Q9BYG7) or lack of trypsin cleavage sites (Q9HC47). Rules of proteotypic peptide selection are presented in the Supplementary Table. Selection of the unique Chr 18 proteotypic peptides and the most intense transitions was performed on the basis of the SRM scouting results.6 More than a thousand peptides derived from tryptic digests of Chr 18 proteins were synthesized and tested for their applicability to the SRM assay. As a result, 267 stable isotope-labeled standard peptides as well as the corresponding nonlabeled proteotypic peptides were selected to perform the measurements. All experiments were performed using whole and depleted plasma samples (see the Experimental Procedures section). In total, we recorded 267 primary SRM traces for targeted peptide precursor pairs (natural and SIS) with 3 corresponding transitions for each pair. QC of the obtained SRM data was performed using the criteria described in the Experimental Procedures section. First, the chromatographic profiles of an endogenous peptide must be identical to those of the corresponding stable isotope-labeled peptide. Second, the differences between the relative intensities of the three transitions of the endogenous and isotopically labeled peptides should not exceed 25% in the run. Third, the concentration measured within technical runs should have a coefficient of variation (CV) < 20% (see recommendations for Tier 2 “research use assay” in Carr et al.15). The assay performance data are summarized in the Supplementary Table. As a result of QC, only 84 peptides passed

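$$C_{\mathrm{pept}} = C_{\mathrm{lab}} \, \frac{S_{\mathrm{pept}}}{S_{\mathrm{lab}}}$$

where $C_{\mathrm{pept}}$ is the target peptide concentration, $C_{\mathrm{lab}}$ is the labeled peptide concentration, $S_{\mathrm{pept}}$ is the area of the target peptide peak, and $S_{\mathrm{lab}}$ is the area of the labeled peptide peak.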
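The single-point quantification formula above, combined with the CV < 20% requirement for technical runs, can be sketched as follows. The peak areas and the spiked SIS concentration are hypothetical values chosen for illustration, and the use of the sample standard deviation (n − 1 denominator) for the CV is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev
from typing import Sequence


def peptide_concentration(c_lab: float, s_pept: float, s_lab: float) -> float:
    """C_pept = C_lab * S_pept / S_lab, in the units of c_lab."""
    if s_lab <= 0:
        raise ValueError("labeled peptide peak area must be positive")
    return c_lab * s_pept / s_lab


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)


# Hypothetical example: SIS spiked at 1e-10 M, three technical runs.
C_LAB = 1e-10
runs = [            # (endogenous peak area, SIS peak area), invented values
    (2.0e5, 1.00e6),
    (2.2e5, 1.05e6),
    (1.9e5, 0.98e6),
]
concentrations = [peptide_concentration(C_LAB, s_pept, s_lab)
                  for s_pept, s_lab in runs]
cv = coefficient_of_variation(concentrations)
passes_cv = cv < 0.20  # threshold quoted in the text for technical runs

if __name__ == "__main__":
    print([f"{c:.2e}" for c in concentrations], f"CV = {cv:.1%}", passes_cv)
```

Because the SIS and endogenous peptides co-elute and ionize alike, the area ratio cancels most run-to-run variation in injection and ionization, which is why a single spiked concentration can serve as the reference.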



RESULTS AND DISCUSSION

Primary Data Quality Control (QC)

Targeted quantitative screening was performed for 267 of 276 proteins encoded by Chr 18 in the plasma samples from 54 certified healthy male individuals. Nine proteins were excluded from the analysis due to the absence of appropriate proteotypic peptides (C9JCN9, J3KSC0, O14950, P19105,

Table 1. Medical Relevance of Proteins Encoded by Chr 18 Based on Literature Survey

| no. | Uniprot AC | protein name | disease |
|---|---|---|---|
| 1 | P02766^a | transthyretin | amyloidosis, transthyretin-related (ref 26); hyperthyroxinemia, dystransthyretinemic (ref 27); carpal tunnel syndrome (ref 28) |
| 2 | Q9C0F0 | Putative Polycomb group protein ASXL3 | Bainbridge−Ropers syndrome (ref 29) |
| 3 | O15105 | Mothers against decapentaplegic homologue 7 | colorectal cancer (ref 17) |
| 4 | P38405^a | Guanine nucleotide-binding protein G(olf) subunit alpha | dystonia 25 (ref 30) |
| 5 | Q16787^a | Laminin subunit alpha-3 | epidermolysis bullosa, junctional, Herlitz type (ref 31); laryngoonychocutaneous syndrome (ref 32) |
| 6 | P49257^a | Protein ERGIC-53 | factor V and factor VIII combined deficiency (ref 33) |
| 7 | Q15583 | Homeobox protein TGIF1 | holoprosencephaly 4 (ref 34); esophageal carcinoma (ref 35) |
| 8 | Q8J025 | Protein APCDD1 | hypotrichosis 1 (ref 36) |
| 9 | Q86YT6^a | E3 ubiquitin-protein ligase MIB1 | left ventricular noncompaction 7 (ref 37) |
| 10 | Q92539 | phosphatidate phosphatase LPIN2 | Majeed syndrome (ref 38) |
| 11 | Q9Y2V3 | retinal homeobox protein Rx | microphthalmia (ref 39) |
| 12 | O95427 | PI ethanolamine phosphate transferase 1 | multiple congenital anomalies-hypotonia-seizures syndrome 1 (ref 40) |
| 13 | Q719H9 | BTB/POZ domain-containing protein KCTD1 | scalp−ear−nipple syndrome (ref 41) |
| 14 | Q92750 | transcription initiation factor TFIID subunit 4B | spermatogenic failure 13 (ref 42) |
| 15 | Q9Y4W6^a | AFG3-like protein 2 | spinocerebellar ataxia 28 (ref 43); spastic ataxia 5, autosomal recessive (ref 44) |
| 16 | Q9HCE0 | Ectopic P granules protein 5 homologue | Vici syndrome (ref 45) |
| 17 | Q15796^a | Mothers against decapentaplegic homologue 2 | colorectal cancer (ref 17) |
| 18 | Q5U5Q3^a | RNA-binding E3 ubiquitin-protein ligase MEX3C | essential hypertension (ref 46) |
| 19 | P19404 | NADH dehydrogenase [ubiquinone] flavoprotein 2, mitochondrial | Parkinson disease (ref 20), schizophrenia (ref 18), bipolar disorder (ref 19) |
| 20 | O15165 | low-density lipoprotein receptor class A domain-containing protein 4 | schizophrenia (ref 47) |
| 21 | Q15532 | Protein SSXT | synovial sarcoma (ref 48) |

^a Proteins detected in this study.

manual curation according to the above-mentioned criteria. Among them, 56 and 59 proteins were quantified in whole and depleted plasma samples, respectively (see Supplementary Table). The SRM raw data are available in the ProteomeXchange repository (PXD004374).

Width (Diversity) of Chr 18 Encoded Proteome

First, the qualitative distribution of the target proteins in the plasma samples was estimated, that is, how many samples of the whole cohort contained detectable levels of each protein. The data on the number of the proteins detected in each individual plasma sample of healthy individuals are presented in Figure 1. Pooling the results of protein identification in plasma samples of healthy donors (n = 54), we detected 84 proteins in total (31% of proteins encoded by the Chr 18 genes). As can be seen from Figure 1A, the number of the detected proteins varies from 12 to 45, with the average number of the proteins identified across 54 samples equal to 30 ± 8 and 25 ± 15 for the whole and depleted plasma samples, respectively. Among the proteins that passed QC, only 2 proteins (P02766, transthyretin; and P22830, ferrochelatase) were observed in all 54 samples (marked in bold font in the Supplementary Table). Figure 1B shows the dependence of the number of proteins detected on the number of analyzed specimens. The total number of the identified proteins (pooled from individual samples) linearly increased with the number of the samples. However, starting from the fifth sample, the list of detected proteins remained unchanged. In reality, we saw signals for the detected peptides in almost all samples. In our work we followed the criteria for SRM defined by Carr et al.15 and by Percy et al.14 In most of the “gaps”, the tested peptides could not pass such criteria as CV < 20% or a difference between the relative intensities of the transition traces < 25%, or showed a slight difference in the retention time of one of the traces. Thus, there is a clear need to reevaluate such strict SRM criteria (at least in research). In the present project our objective was to detect as many Chr 18 proteins as possible in one multiplexed SRM assay. For the limited number of Chr 18 proteins with high medical value, further optimization of the LC−SRM can be performed to detect the proteins in a more reliable manner.

Medical relevance for some detected proteins is presented in Table 1. According to the survey and meta-analysis of published studies (Table 1), over 20 proteins are relevant to diseases. For example, SMAD2 (Q15796) is a known member of the TGF-β signaling pathway, which provides growth-inhibitory signals in the normal intestinal epithelium. Mutations in such proteins or deletions of the protein-coding chromosomal region 18q21 are associated with colorectal cancer.16,17 The gene encoding NADH dehydrogenase flavoprotein 2 (P19404) is considered a candidate gene for several neuronal diseases, such as schizophrenia, bipolar disorder, and Parkinson’s disease.18−20

Depth of Chr 18 Encoded Proteome

Quantitative estimations for 84 proteins were obtained by SRM in a dynamic range of 5 orders of magnitude, from 10⁻⁶ M to 10⁻¹¹ M, using SIS added to whole and depleted plasma (Figure 2A). According to the Figure 2 data, the protein concentrations decrease exponentially with the increase in the number of the proteins investigated, with a steady-state concentration value close to the method detection limit. Most of the detected proteins were low-abundance proteins, with the median at 15 × 10⁻¹¹ M. The least abundant protein, O43861 (P4-ATPase, a putative aminophospholipid translocase), was detected in five samples, and its concentration was measured at 2.1 × 10⁻¹¹ M in depleted plasma. According to Proteinatlas (http://www.proteinatlas.org/), this protein is widely distributed across tissues. It can be detected at low to moderate expression levels in most tissues. P4-ATPase is involved in protein transport in the secretory pathways and can be secreted into blood as a component of exosomes.21 The most abundant protein in depleted plasma is Q06136 (3-ketodihydrosphingosine reductase). It is present at concentrations of 0.7 × 10⁻⁹ M and 5 × 10⁻⁹ M in whole and depleted plasma, respectively. The endoplasmic reticulum is the preferred subcellular localization of this protein. It is widely distributed across all tissues.

Figure 2. Quantitative distribution of 56 proteins detected in whole plasma (A) and 59 proteins detected in depleted plasma (B). Data are expressed as the median and the 100th, 75th, 25th, and 0th percentiles (for details see the inset in panel A).

Several high-quality studies provided a basis for modern SRM methods designed for protein quantification in complex biological matrices such as plasma. In general, quantitation of proteins across 5 to 6 orders of magnitude is accomplished in nondepleted plasma, as shown in Percy et al.4 That paper demonstrates the ability to discover and quantify potential biomarkers at concentrations ranging from 18 ng/mL (peroxiredoxin-2) up to 31 mg/mL (serum albumin). Of the proteins quantified in the aforementioned papers, only transthyretin (Uniprot ID) is encoded by Chr 18. Its concentration (79 μg/mL in Percy et al.22 and 210 μg/mL in Hortin et al.3) is consistent with the data obtained in the present work (113 μg/mL).
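The literature values above are mass concentrations, whereas most concentrations in this work are molar, so a unit conversion helps when comparing them. The following is a minimal sketch; the molar mass used for transthyretin (about 13.8 kDa for the mature monomer) is an assumption taken from general knowledge and is not stated in the text, and the function name is illustrative.

```python
def ug_per_ml_to_molar(conc_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration in ug/mL to mol/L."""
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    grams_per_litre = conc_ug_per_ml * 1e-3  # 1 ug/mL equals 1 mg/L, i.e. 1e-3 g/L
    return grams_per_litre / molar_mass_g_per_mol


# Assumed molar mass of the mature transthyretin monomer (not given in the text).
TTR_MONOMER_G_PER_MOL = 13_761.0

if __name__ == "__main__":
    reported = [("Percy et al.", 79.0), ("this work", 113.0), ("Hortin et al.", 210.0)]
    for label, value in reported:
        molar = ug_per_ml_to_molar(value, TTR_MONOMER_G_PER_MOL)
        print(f"{label}: {value} ug/mL is about {molar:.1e} M")
```

Under this assumption all three values fall in the micromolar to low tens-of-micromolar range, in line with the order of magnitude reported for transthyretin in the abstract.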

by immunoaffinity removal of 14 highly abundant proteins using the MARS Hu-14 system allowed the identification of 28 proteins. On the other hand, we identified 25 unique proteins in whole plasma. The absence of those 25 proteins from the list of proteins registered in depleted plasma is obviously a result of nonspecific adsorption during immunoaffinity removal of major proteins, such as serum albumin, antibodies, etc. In this context, Yadav et al. demonstrated the coprecipitation of 101 proteins in a depletion system similar to that used in the present work (MARS Hu-14).24 Similar results were obtained in a proteomic study of the removable fraction.25 Despite a decrease in the dynamic range of concentrations upon depletion, O43861 is the least abundant protein among the registered ones. This fact indicates that in our measurements the limiting factor of the SRM detection of proteins is the sensitivity of the system rather than the dynamic range of measured concentrations. This allowed us to estimate the depletion rate of these proteins. Figure 5 shows the data on changes of

Interindividual Variance in Plasma Levels of Chromosome 18 Proteins

Figure 3. Distribution of interindividual coefficient of variation values for all detected Chr 18-encoded proteins that passed QC in whole (black) and depleted (gray) plasma.

Interindividual variance of target protein levels was estimated by calculating the interindividual coefficient of variation (iCV). As can be seen from Figure 3, the median iCV of the studied protein concentrations is about 40%, and only three proteins (Q53F39, Q9Y5B0, and P29508) have high (>100%) variation. In the case of the 59 proteins identified in 54 samples of depleted plasma, the median iCV value shifted down to 35%, and no protein had a variation higher than 100% (Figure 3). A number of low-copied proteins (