Letter

Intact-Mass Analysis Facilitates the Identification of Large Human Heart Proteoforms
Leah V. Schaffer, Trisha Tucholski, Michael R. Shortreed, Ying Ge, and Lloyd M. Smith
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.9b02343 • Publication Date (Web): 08 Aug 2019

Analytical Chemistry

Intact-Mass Analysis Facilitates the Identification of Large Human Heart Proteoforms

Leah V. Schaffer1,§, Trisha Tucholski1,§, Michael R. Shortreed1, Ying Ge1,2,3,*, Lloyd M. Smith1,*

1 Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
2 Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI 53705, USA
3 Human Proteomics Program, University of Wisconsin-Madison, Madison, WI 53705, USA

§ LVS and TT contributed equally

* To whom correspondence should be addressed: Dr. Lloyd Smith, Chemistry 4209A, 1101 University Ave., Madison, WI 53706. Tel: 608-263-2594. Fax: 608-265-6780. Dr. Ying Ge, WIMR II 8551, 1111 Highland Ave., Madison, WI 53705. Tel: 608-265-4744. Fax: 608-265-8745.

ACS Paragon Plus Environment

KEYWORDS top-down proteomics; proteoform; proteoform family; large proteoforms; heart

ABSTRACT Proteoforms, the primary effectors of biological processes, are the different forms of proteins that arise from molecular processing events such as alternative splicing and post-translational modifications. Heart diseases exhibit changes in proteoform levels, motivating a deeper understanding of the heart proteoform landscape. Our recently developed two-dimensional top-down proteomics platform coupling serial size exclusion chromatography (sSEC) to reversed-phase chromatography (RPC) expanded coverage of the human heart proteome and allowed observation of high-molecular-weight proteoforms. However, most of these observed proteoforms were not identified because of the difficulty of obtaining high-quality tandem mass spectrometry (MS2) fragmentation data for large proteoforms from complex biological mixtures on a chromatographic timescale. Herein, we sought to identify human heart proteoforms in this dataset using an enhanced version of Proteoform Suite, which identifies proteoforms by intact mass alone. Specifically, we added a new feature to Proteoform Suite to determine candidate identifications for isotopically unresolved proteoforms larger than 50 kDa, enabling subsequent MS2 identification of important high-molecular-weight human heart proteoforms such as lamin A (72 kDa) and trifunctional enzyme subunit α (79 kDa). With this new workflow for large proteoform identification, endogenous human cardiac myosin binding protein C (140 kDa) was identified for the first time. This study demonstrates the integration of the sSEC-RPC-MS proteomics platform with intact-mass analysis through Proteoform Suite to create a catalog of human heart proteoforms and facilitate the identification of large proteoforms in complex systems.

Proteoforms are the different forms of proteins that arise from sources such as genetic variation, RNA editing and splicing, and post-translational modifications (PTMs).1 A proteoform family consists of the different proteoforms from a single gene.2 The identification and quantification of proteoforms are important to understanding biological systems because different proteoforms exhibit distinct biological functions.3-7 Altered proteoforms have been observed in heart disease, the leading cause of death worldwide, suggesting the importance of proteoform analysis to understanding cardiac dysfunction.4, 8-9

Mass spectrometry-based top-down proteomics is the most powerful available tool for identification and quantification of proteoforms. In a typical top-down proteomic analysis, precursor mass spectra (MS1) of intact proteoforms are acquired, the most abundant peaks are selected for fragmentation, and tandem mass spectra (MS2) of the fragment ions are acquired.10 Top-down proteomic analysis of the human proteome is technically challenging because of its wide dynamic range and high complexity. Analysis of high-molecular-weight (MW) proteoforms is particularly difficult because the MS signal-to-noise ratio (S/N) decreases exponentially with increasing MW.11 Signal suppression caused by co-eluting, highly abundant low-MW proteoforms poses further challenges for the analysis of high-MW proteoforms from complex mixtures. Thus, size-based separations are critical to observing and identifying high-MW proteoforms. The Ge laboratory recently developed serial size exclusion chromatography (sSEC), which uses MS-compatible solvents for high-resolution size-based fractionation of complex protein mixtures.12-13 sSEC fractionation was combined with online reversed-phase chromatography (RPC) in a two-dimensional (2D) separation platform, and top-down proteomic analysis of human heart tissue lysate was performed
on a quadrupole-time-of-flight (Q-TOF) mass spectrometer.12 Our 2D sSEC-RPC analysis resulted in the detection of 5360 proteoforms, with 47 proteoforms larger than 60 kDa. In this previous study, predominantly MS1 data were acquired to profile the intact human heart proteome and to determine high-MW candidates for subsequent targeted MS2 analysis, and targeted broadband MS2 data were acquired for a select set of high-MW proteoforms. Despite the availability of MS2 data for 18 unique masses over 30 kDa, only two proteoforms, creatine kinase (43 kDa) and trifunctional enzyme subunit beta (47 kDa), were identified using the top-down search algorithm MS-Align+. Notably, these two proteoforms were isotopically resolved by the Q-TOF mass spectrometer, so their monoisotopic masses could be used for database searching. No proteoforms larger than 50 kDa were identified in the previous study, despite the availability of acquired targeted MS2 data.12 Obtaining sufficient fragment ions to identify proteins larger than 50 kDa is a recognized challenge in online LC-MS/MS top-down proteomics.10, 14 Indeed, several issues hindered successful top-down proteomic analysis of larger proteoforms. Although sSEC-RPC separation allowed MS1 detection of larger heart proteoforms, proteoforms larger than 50 kDa were not isotopically resolved by the Q-TOF mass analyzer, which prevented the use of monoisotopic masses in top-down search algorithms. Furthermore, MS2 of larger proteoforms on an LC-MS timescale often yields sparse fragmentation and low S/N for fragment ions. Co-isolation of more than one charge state of a parent ion during targeted MS2 acquisition was used to improve fragmentation efficiency and fragment ion S/N.15 However, this strategy also increased MS2 spectral complexity, because these spectra included fragment ions from multiple co-eluting proteoforms. Confident identification of large proteoforms
typically also requires manual validation of the fragment ions in the MS2 spectra because of the complexity of the data. However, this process requires a candidate sequence to query. Together, these challenges prevented identification of the proteoforms >50 kDa in the original study. The Smith laboratory recently developed the freely available, open-source software program Proteoform Suite (https://smith-chem-wisc.github.io/ProteoformSuite/), which uses MS1 intact-mass measurements to identify proteoforms by comparing the observed experimental proteoform masses to theoretical masses derived from a database and to co-eluting experimental proteoform masses.16-17 Proteoform Suite has been used to analyze proteomic datasets derived from S. cerevisiae,16-17 E. coli,18 and murine mitochondria19 acquired on an Orbitrap mass spectrometer. Although MS1 measurements do not provide PTM localization and the false discovery rate is typically higher than for MS2 analyses, Proteoform Suite enables MS1 intact-mass identification of proteoforms that were either not selected for MS2 or could not be identified by MS2 analysis.17, 19
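The comparison of co-eluting experimental masses can be illustrated with a short sketch. This is our simplified illustration, not Proteoform Suite's implementation: it assumes isotopically resolved monoisotopic masses, a hypothetical `ExpProteoform` record, a small table of monoisotopic mass shifts, and illustrative tolerances (0.02 Da, 1 min) that are not the program's defaults. Two observed masses are paired when they co-elute and their difference matches a known modification, which is how a phosphorylation ladder within one proteoform family can surface even before any member is identified.

```python
from dataclasses import dataclass
from itertools import combinations

# Monoisotopic mass shifts (Da) for a few common modifications.
MASS_SHIFTS = {
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
    "oxidation": 15.994915,
}


@dataclass(frozen=True)
class ExpProteoform:
    """Hypothetical record for one deconvoluted experimental mass."""
    mass: float            # monoisotopic mass, Da
    retention_time: float  # minutes
    intensity: float


def exp_exp_pairs(proteoforms, mass_tol=0.02, rt_window=1.0):
    """Return (lighter, heavier, modification) for co-eluting pairs whose
    mass difference matches a known shift within mass_tol Da.

    mass_tol and rt_window are illustrative values, not Proteoform Suite defaults.
    """
    pairs = []
    for a, b in combinations(proteoforms, 2):
        if abs(a.retention_time - b.retention_time) > rt_window:
            continue  # not co-eluting, so not compared
        light, heavy = (a, b) if a.mass <= b.mass else (b, a)
        delta = heavy.mass - light.mass
        for name, shift in MASS_SHIFTS.items():
            if abs(delta - shift) <= mass_tol:
                pairs.append((light, heavy, name))
    return pairs


# Invented example masses: an unmodified form, +1 and +2 phosphates,
# and a same-mass species that elutes elsewhere and should not pair.
observed = [
    ExpProteoform(12000.000, 30.1, 5e5),
    ExpProteoform(12079.967, 30.3, 3e5),
    ExpProteoform(12159.933, 30.4, 1e5),
    ExpProteoform(12079.967, 45.0, 2e5),
]
for light, heavy, mod in exp_exp_pairs(observed):
    print(f"{light.mass:.3f} -> {heavy.mass:.3f}: {mod}")
```

Chaining the two reported pairs links all three co-eluting masses into one candidate family; the all-pairs loop is quadratic, which is acceptable for a toy list but would need sorting or binning for a full dataset.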

For deconvolution results, Proteoform Suite accepts any .tsv file with mass, intensity, and retention time columns, so it can be used with any instrument or deconvolution method. We used Proteoform Suite here to identify human heart proteoforms in the top-down proteomic dataset described above. This study, which integrated MS-compatible size-based fractionation and intact-mass analysis, identified 409 proteoforms [...] 50 kDa were not isotopically resolved on the Q-TOF mass spectrometer; therefore, deconvolution could not provide the monoisotopic mass. Instead, the mass at the apex of the charge-state deconvoluted spectrum was reported, which is close to the average mass of the proteoform.20 The average mass of each proteoform in the theoretical database was calculated from its chemical formula. We enabled Proteoform Suite to identify candidates for high-MW proteoforms by implementing a notch search21 against the theoretical database using the average mass with a 2 Da mass tolerance. The 2 Da tolerance was chosen to be wide enough to capture the apex of the observed unresolved isotopic envelopes, yet narrow enough to reject most theoretical proteoform matches.

In the analysis of proteoforms [...] 50 kDa analysis was quite large, 82.7%, because of the wide 2 Da search tolerance, which results in a high chance of matching a decoy proteoform mass. Given this high FDR, we do not consider these experimental proteoforms to have been identified, but rather to be candidates for subsequent analysis of the available targeted MS2 data. The proteoform candidates determined in this manner were used to guide MS2 data analysis to identify proteoforms. For example, a co-eluting group of 65 kDa and 72 kDa proteoforms (Figure 1A) had previously been targeted by broadband co-isolation and collisionally activated dissociation but could not be identified with a typical database search in MS-Align+, likely because of the use of the average precursor mass in place of the monoisotopic mass and the MS2 spectral complexity arising from fragment ions originating from multiple parent ions. By intact-mass analysis, Proteoform Suite determined the theoretical candidate for the group of 72 kDa proteoforms to be acetylated lamin A (gene LMNA), including both mono- and bis-phosphorylated proteoforms. Proteoforms in the LMNA gene family are intermediate filament proteins, which make up
the nuclear lamina.22 Because changes in LMNA phosphorylation have been linked to cardiomyopathies, it is important to be able to detect and quantify these specific proteoforms in the heart.22-23 We used MASH Suite Pro to query the MS2 fragments against the LMNA sequence determined by Proteoform Suite, and we confirmed the identity of the acetylated lamin A proteoform with both N-terminal and C-terminal sequence tags that were manually validated in the raw data, as previously described24 (Figure 1B). The lamin A identifications led us to hypothesize that the three co-eluting 65 kDa proteoforms were three proteoforms of the lamin C isoform of gene LMNA: acetylated; acetylated and mono-phosphorylated; and acetylated and bis-phosphorylated. Using the same MS2 spectrum, we likewise confirmed the identification of the lamin C isoform with a C-terminal sequence tag (Figure 1C, Supporting Figure S-6). Lamin C had not been identified as a candidate for the 65 kDa proteoforms in Proteoform Suite because it was absent from the theoretical database downloaded from UniProt, which contained only canonical protein sequences. This example demonstrates how intact-mass analysis in Proteoform Suite can guide MS2 analysis to enable the identification of isotopically unresolved proteoforms >50 kDa. A full list of validated MS2 fragment ions for lamin A and lamin C is provided in Supporting Tables S-4.1 to S-4.4. Proteoform Suite determined that two observed 79 kDa proteoforms closely matched the unmodified and succinylated forms of trifunctional enzyme subunit α (gene HADHA). Similarly, we used a previously acquired targeted MS2 spectrum to confirm this candidate identification (Supporting Figures S-7B and S-7D). Proteoform Suite also identified trifunctional enzyme subunit β in the [...] 50 kDa proteoform identifications if a manually validated sequence tag was obtained. Analysis in Proteoform Suite is more automated and amenable to large-scale analyses, but intact-mass analysis does not provide identifications with as much confidence as MS2 data, particularly for isotopically unresolved data and in complex systems such as human heart tissue.
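Why a wide average-mass window yields a high FDR can be illustrated with a toy target-decoy calculation. The sketch below is an assumption-laden illustration, not Proteoform Suite's FDR procedure: it counts observed masses having at least one database mass within ±tol Da, does this for the target database and for each of several decoy databases, and estimates FDR as the mean decoy count divided by the target count. All masses are invented for illustration and are not data from this study.

```python
from bisect import bisect_left, bisect_right
from statistics import mean


def count_matched(observed, database, tol):
    """Number of observed masses with at least one database mass within +/- tol Da."""
    db = sorted(database)
    n = 0
    for m in observed:
        lo = bisect_left(db, m - tol)   # first entry >= m - tol
        hi = bisect_right(db, m + tol)  # first entry > m + tol
        if hi > lo:
            n += 1
    return n


def estimate_fdr(observed, target_db, decoy_dbs, tol):
    """Target-decoy estimate: mean decoy match count / target match count."""
    targets = count_matched(observed, target_db, tol)
    decoys = mean(count_matched(observed, d, tol) for d in decoy_dbs)
    return decoys / targets if targets else float("nan")


# Invented toy masses (Da).
observed = [50000.0, 60000.0, 70000.0, 80000.0]
target_db = [50000.3, 60001.5, 70000.1, 80010.0]
decoy_dbs = [
    [49998.8, 60010.0, 70003.0, 79999.0],
    [50012.0, 59998.9, 70001.9, 80030.0],
]

for tol in (0.5, 2.0):
    print(f"tol = {tol} Da: FDR = {estimate_fdr(observed, target_db, decoy_dbs, tol):.2f}")
```

In this toy case no decoy falls within 0.5 Da of an observed mass, but at 2 Da each decoy database matches two of the four observed masses; widening the window admits chance decoy matches faster than it adds genuine target matches, which is the same qualitative effect behind the high FDR reported for the 2 Da average-mass search.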

However, as identified proteoforms are increasingly catalogued in repositories such as the Consortium for Top-Down Proteomics Proteoform Atlas (http://atlas.topdownproteomics.org/), the ability to automatically match intact masses with candidate theoretical proteoforms will facilitate both qualitative and quantitative proteoform analyses. We also note that a major benefit of intact-mass analysis and proteoform family construction is its potential to guide targeted MS2 data acquisition. If an intact mass is matched to a candidate proteoform with biologically
interesting PTMs or from a gene of interest, subsequent targeted MS2 analysis can be performed on the sample. Additionally, the experimental-experimental comparison can reveal unidentified proteoform families of interest (e.g., those with multiple phosphorylation events), which can be followed up with targeted MS2 analysis.

CONCLUSIONS We used the open-source and freely available software program Proteoform Suite to identify proteoforms by intact mass in a dataset of serial size exclusion chromatography-separated fractions of human heart tissue lysate. Proteoform Suite identified 409 unique proteoforms [...] 50 kDa to be observed in a previous top-down study, the high-MW candidates that were targeted for MS2 in that work were left unidentified by a conventional top-down search algorithm because of either excessive spectral complexity or inefficient fragmentation. Proteoform Suite determined theoretical candidates for many of these proteoforms, some of which were subsequently confirmed with previously acquired targeted MS2 data. The integration of the MS-compatible sSEC proteomics platform with intact-mass analysis through Proteoform Suite enabled the identification of many important human heart proteoforms and proteoform families. Proteoform Suite can identify proteoform candidates for subsequent targeted MS2 analysis and proteoform quantification, making it a valuable tool for top-down proteomics.

FIGURES

Figure 1. A) Charge-state deconvoluted MS1 spectra for the lamin A and lamin C isoforms of LMNA (fraction 5, RT 28-29.3 min). Proteoform Suite identified three candidate proteoforms of the lamin A isoform: with acetylation alone, with both acetylation and mono-phosphorylation, and with both acetylation and bis-phosphorylation. Three additional co-eluting masses were also observed in the same retention window. Data from two MS2 experiments (15 eV and 20 eV CAD energies) were combined to generate
sequence tables shown in panels B) and C). B) Lamin A isoform sequence with matching MS2 fragments and highlighted C-terminal sequence tag (pink). Zoom-in of a representative MS2 spectrum (15 eV CAD) from 910-970 m/z shows y-ions corresponding to the lamin A C-terminal sequence tag. C) Lamin C isoform sequence with matching MS2 fragments and highlighted C-terminal sequence tag (blue). Zoom-in of a representative MS2 spectrum (15 eV CAD) from 870-910 m/z shows y-ions corresponding to the lamin C C-terminal sequence tag. Additional zoom-ins of the MS2 spectrum for the highlighted fragment ions are shown in Supporting Figure S-6.

Figure 2. A) Original and charge-state deconvoluted MS1 spectra for the cMyBP-C proteoforms (fraction 4, RT 33-35 min). B) MS2 spectrum acquired by co-isolating all charge states in the 700-800 m/z range over the protein elution window (CAD energy 18 eV). C) Sequence of cMyBP-C with the N-terminal methionine cleaved, showing matching b- and y-fragment ions and the highlighted (green) N-terminal sequence tag. Zoom-in of the MS2 spectrum between 520-650 m/z shows b-ions corresponding to the highlighted N-terminal sequence tag (green) and other ions in the zoom-in spectrum that match the proteoform sequence (grey).
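Co-isolating an entire m/z range, as in panel B, pools many charge states of one large proteoform into a single MS2 event, which is how co-isolation raises fragment-ion S/N. The number of charge states involved can be estimated from m/z = (M + z·m_p)/z for the [M+zH]z+ series. The sketch below assumes a nominal neutral mass of 140,000 Da for cMyBP-C (a round illustrative value, not the measured mass) and the proton mass m_p = 1.007276 Da; the helper names are hypothetical.

```python
import math

PROTON = 1.007276  # proton mass, Da


def mz(neutral_mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + z * PROTON) / z


def charge_states_in_window(neutral_mass, mz_low, mz_high):
    """All positive charge states whose m/z falls within [mz_low, mz_high]."""
    # Since m/z = M/z + PROTON, valid z lie between M/(mz_high - PROTON)
    # and M/(mz_low - PROTON); the final filter guards the boundaries.
    z_min = max(1, math.ceil(neutral_mass / (mz_high - PROTON)))
    z_max = math.floor(neutral_mass / (mz_low - PROTON))
    return [z for z in range(z_min, z_max + 1)
            if mz_low <= mz(neutral_mass, z) <= mz_high]


zs = charge_states_in_window(140000.0, 700.0, 800.0)
print(f"{len(zs)} charge states, z = {zs[0]} to {zs[-1]}")
```

For this nominal mass the 700-800 m/z window spans z = 176 to 200, so a single isolation event sums ions from 25 charge states; the same window also admits any other proteoform co-eluting in that m/z range, which is the source of the added MS2 spectral complexity noted in the text.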

SUPPORTING INFORMATION. Experimental Methods; Proteoform Suite Identifications [...] 50 kDa; Supporting Figure S-2: Summary of Proteoforms and Proteoform Families; Supporting Figure S-3: Human heart proteoform families