Gaussian Process Modeling of Protein Turnover - Journal of Proteome

May 27, 2016 - The resulting stochastic process is a Gaussian processes with Ornstein–Uhlenbeck covariance matrix. We applied the stochastic model t...
0 downloads 10 Views 540KB Size
Subscriber access provided by UCL Library Services

Article

Gaussian Process Modeling of Protein Turnover Mahbubur Rahman, Stephen F Previs, Takhar Kasumov, and Rovshan G. Sadygov J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.5b00990 • Publication Date (Web): 27 May 2016 Downloaded from http://pubs.acs.org on May 30, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Gaussian Process Modeling of Protein Turnover Mahbubur Rahman1,2, Stephen F. Previs3, Takhar Kasumov4,5, and Rovshan G. Sadygov1,2,* 1

Department of Biochemistry and Molecular Biology, 2Sealy Center for Molecular

Medicine, The University of Texas Medical Branch, Galveston, Texas 77555 3

Merck Research Laboratories 2015 Galloping Hill Road Kenilworth, NJ 07033

4

Department of Gastroenterology & Hepatology, Cleveland Clinic Cleveland, OH 44195 5

Department of Pharmaceutical Sciences

School of Pharmacy, Northeast Ohio Medical University Rootstown, OH 44225 *To whom correspondence should be addressed: Tel. 409-772-3287; Fax 409772-9670; email: [email protected] Keywords: dynamic proteome, protein degradation rate constant, protein turnover rate constant, Ornstein-Uhlenbeck process, Gaussian Process, stochastic differential equation for protein turnover rate constant, mass spectrometry, stable isotope labeling.

Abbreviations: GP, Gaussian Process; SSM, subsarcolemmal

mitochondria; LC-MS, liquid-

chromatography-mass spectrometry; ODE, ordinary differential equation; OU, OrnsteinUhlenbeck.

1 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract

We describe a stochastic model to compute in vivo protein turnover rate constants from stable isotope labeling and high-throughput liquid-chromatography-mass spectrometry experiments. We show that the often used one- and two-compartment non-stochastic models allow explicit solutions from the corresponding stochastic differential equations. The resulting stochastic process is a Gaussian Processes with Ornstein-Uhlenbeck covariance matrix. We applied the stochastic model to a large scale data set from

15

N labeling and compared its performance metrics with that of the non-

stochastic curve fitting. The comparison showed that for more than 99% of proteins the stochastic model produced better fits to the experimental data (based on residual sum of squares). The model was used for extracting protein decay rate constants from mouse brain (slow turnover) and liver (fast turnover) samples. We found that the most affected (compared to two-exponent curve fitting) results were those for liver proteins. The ratio of the median of degradation rate constants of liver proteins to those of brain proteins increased by four-fold in stochastic modeling compared to the two-exponent fitting. Stochastic modeling predicted stronger differences of protein turnover processes between mouse liver and brain than previously estimated. The model is independent of the labeling isotope. To show this we also applied the model to a protein turnover studied in induced heart failure in rats where metabolic labeling was achieved by administering heavy water. No changes in the model were necessary for adapting to heavy water labeling. The approach has been implemented in a freely available R code.

2 ACS Paragon Plus Environment

Page 2 of 30

Page 3 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Introduction

The continuous degradation and synthesis of proteins plays an important role in maintaining cellular homeostasis1. The dynamic equilibrium of cellular protein abundances can change due to, for example, external stimuli, developmental programs, onset of diseases, or ageing. The new equilibrium is achieved by balancing between alterations in the rate of synthesis and/or the rate of degradation. The combined process of synthesis and degradation termed as protein turnover refers to the movement of a protein through its pool and is often expressed as a first-order rate constant2. It is thought that protein degradation is a stochastic process – all proteins (newly synthesized and old) have the same probability of being degraded2. Labeling of proteins with radioactive amino acids is considered to be a “golden standard” for studying proteome dynamics. In this, “pulse-chase” method, proteins of interest are radioactively labeled over a brief period of time (the pulse), and then the decay in radioactivity is measured over time (the chase) and is used to determine protein half-life3. This assay is accurate and minimally perturbative but requires a specific antibody for each protein which makes it difficult for scaling to multiple proteins. Fluorophore tagging of proteins have also been used for proteome dynamics studies4,5. While these methods are real-time, it has been argued that the tagging may affect protein turnover properties, both through changing the protein structure and by altering their abundance and stoichiometry relative to interaction partners6. Mass spectrometry based proteomics is a widely used, high-throughput technology to identify and quantify proteins7. Most of the quantitative proteomics

3 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 30

applications so far have been used for measuring the net change in protein abundance8. The technology has recently been enhanced and applied9 in studies aimed at

measuring the rate of protein turnover both in vivo10-15 and in vitro16-19. Protein

turnover

studies

using

metabolic

labeling

and

Liquid-Chromatography

Mass

Spectrometry (LC-MS) analyses have a complex design that includes a critical bioinformatics component to extract protein decay rate constants2. The work discussed herein focuses on the application of a stochastic model to extract protein turnover rate constants from in vivo protein turnover studies. Briefly, the metabolic incorporation of stable isotope and LC-MS measurements of time-course peptide/protein labeling provide data on protein kinetics. Traditionally, exponential decay curves have been used to fit the experimental data and infer the protein turnover rate constants11,18,20,21. Single- (single exponential) and multi-compartment (first order synthesis, and multi-exponential21,22) models have been employed. Time-dependent ordinary differential equations (ODE) have also been used to model the stable isotope incorporation in metabolic labeling22,23. However, the general applicability of these models to in vivo studies are limited, as they may result in negative decay rates (for multi-compartment systems) and are not able to explain fluctuations in the experimental data21. Stochastic models were recently applied to high-throughput time-course data in proteome studies24,25. SORAD24 used a Gaussian Process (GP) to describe time-course data of phosphor-proteins from enzyme-linked immunosorbent assay experiments. Similar models have been used for chromatographic alignment25, and beta-binomial process modeling of high throughput sequencing26. The stochastic processes add the

4 ACS Paragon Plus Environment

Page 5 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

flexibility of accounting for fluctuations in the data. Here we describe a non-parametric, data-driven stochastic approach - GP for modeling time-course data from metabolic labeling experiments. We show that the GP with the Ornstein-Uhlenbeck (OU) kernel is a natural solution to the protein turnover kinetics in one- or two-compartment models. This approach produces fits that are highly improved as compared to the traditional curve fitting models, it overcomes problems with negative decay rates and random fluctuations in decay rates. Last, the model is isotope independent (i.e., labeling).

The

R

code

implementing

the

model

is

15

N or 2H

available

at

https://ispace.utmb.edu/users/rgsadygo/Proteomics/ProteinKinetics.

Methods Protein turnover studies using metabolic labeling, mass spectrometry and bioinformatics have a complex design and comprise multiple stages27. There are some variations dependent on what type of labeling (e.g.

15

N, 2H,

13

C) is used and how a

tracer is administered. For studies that rely on heavy water labeling, for example, the animals are provided heavy water using a “primed-infusion” logic. For example, animals are given an initial loading bolus in order to instantaneously raise the enrichment of body water to a desired level. They are simultaneously given labeled water in the drinking bottles to maintain the enrichment. At pre-determined time points, the animals are sacrificed and the organs or tissues are collected. LC-MS/MS is used to identify proteins and measure the level of incorporation of heavy isotopes28-30. The time-course data of the heavy isotope incorporations is then modeled to extract the protein turnover rate constant, Figure 1. The last stage is the focus of this study. In the following text, we

5 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

will first review the current models and then describe our stochastic model to extract protein turnover rate constants from heavy isotope labeling data. Ordinary differential equations have previously been used to model the incorporation of the heavy isotope labeled species in metabolic labeling and LC-MS21,23. As a prototypical study, Guan and colleagues21 have used one- and two-compartment models to determine protein turnover rate constants from experiments using 15N-labeled algae diet (produced from

15

N-enriched NaNO3) and mass spectrometry. In brief, the

one-compartment model assumes that proteins are synthesized at a constant rate ksyn and degraded proportional to their abundance levels with the degradation rate constant of kdeg. With the additional assumption of the steady-state for a protein concentration, it was shown that the fraction of the heavy isotope labeled proteins, β(t) (characterized by the relative isotope fraction, RIF), is described by a first order deterministic equation:  = k  − k  βt βt (1) where t is the time, and the initial value condition is: β(0) = 0(the baseline, before the tracer administration). The solution of this equation is a growth curve with a single exponent: βt = 1 − e The details of all formula derivations are provided in the Supplementary Information. We note that under the steady-state assumption (which is used in this work), protein degradation rate constant is equal to the fractional protein synthesis rate (ratio of the synthesis rate over the protein concentration). Thus, throughout the paper, we refer to the protein turnover rate constant and degradation rate constant interchangeably.

6 ACS Paragon Plus Environment

Page 6 of 30

Page 7 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Several other publications11,12,18,20,31 modeling protein turnover from stable isotope labeling experiments have used equations similar to Eq. (1). However, when applied to the data sets from mouse protein turnover studies that used

15

N labeling, the fit to the

data was often complicated (in particular for fast protein turnover)21. Better results were achieved by using a two-compartment model. In this model, the first compartment considers the kinetics of amino acids, while the second compartment accounts for the amino acids incorporation into proteins. Protein synthesis is assumed to be a first order process. It is modeled to depend on the concentrations of heavy isotope labeled amino acids (precursor pool). The model leads to a system of equations for the fraction of heavy isotope labeled amino acids, α(t), and β(t): 

 = k  − k  αt αt  = k  αt − k  βt βt

(2) where ksyn is the rate constant of protein synthesis from the precursor amino acids. Eq. (2) was derived under the assumption of the steady-state for total (labeled and unlabeled) protein level. The amino acids are synthesized with the constant rate, k0, and degraded proportional to their abundances with the rate constant, ksyn. Protein degradation is characterized with the rate constant, kdeg. The equation for α(t) is solved first, with the initial value for stable isotope labeled amino acids, α(0) set equal to zero: αt = 1 − e which leads to the following equation for the β(t):  = k  1 − e  − k  βt βt (3)

7 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The solution of Eq. (3) (β(0) = 0) is the two-exponent growth curve: βt = 1 − k  e − k  e /k  − k   The solutions for α(t) and β(t) were obtained with the assumption that the heavy isotope labeling is asymptotically complete. This assumption, while somewhat reasonable for incorporation of

15

N-labeled amino acids into proteins, will not be generalizable for

incorporation of deuteriums during administration of heavy water. This is because, in contrast to fully

15

N-labeled diet, the metabolic labeling with heavy water involves

administration of only ~5% 2H2O in the drinking water. Metabolic labeling with 100% 15N in the diet may lead to the complete labeling of all nitrogen atoms in a protein, but using 5% heavy water will only result in partial labeling of the proteins. The incomplete labeling was also observed6 in labeling by SILAC32. It was observed that unless the cell medium is regularly renewed by labeled amino acids, complete labeling of proteins will not be achieved. In animal experiments it is not feasible to renew the cellular environment. To account for the incomplete labeling, we introduced a new parameter, heavy isotope enrichment fraction at the asymptote, λ. With this parameter, the solution for β(t) becomes: βt = λ − λk  e − k  e /k  − k   (4) A more descriptive text on the one- and two-compartment models and their solutions is provided in the Supporting Information section. Even though, the two-compartment model produced significantly improved fits to the experimental data, it was observed that the model was not able to account for the stochastic fluctuations in the

8 ACS Paragon Plus Environment

Page 8 of 30

Page 9 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

experimental data21. It has subsequently lead to high synthesis rate constants in the two-compartment model and/or negative decay rate constants (for the expanded, threecompartment model). Here, we will show that stochastic differential equations based on either Eqs. (1) or (2) allow exact solution in the form of a GP with OU kernel. The stochastic equation uses a model fluctuation parameter that allows one to account for and estimate the effects of stochastic fluctuations in experimental data. The stochastic analog of Eq. (3) for β(t) (with the inclusion of the incomplete isotope labeling at the asymptote) is: dXt = k  λ1 − e  − Xtdt + Γk  , k  , t, X, λ, σ' dBt; X0 = 0 (5) where X(t) is a stochastic variable denoting the fraction of the heavy isotope enriched protein (β(t) in the deterministic model), Γk  , k  , t, X, λ, σ'  is the volatility which, in general is dependent, on the model parameters ksyn, kdeg, λ, X(t), and the variance of the model fluctuations, σγ; B(t) is the Brownian motion (White Gaussian noise). If we assume that in addition to the model fluctuations there are also measurement errors that are described by a White Gaussian noise, ε (with standard deviation σε), the final model is: Yt = Xt + ε; ε~N0, σ0/ . The model depends on the stochastic variable, X(t), which is the solution to Eq. (5). We assume a homoscedastic variance which allows for an explicit solution to Eq. (5). Thus if the volatility, Γ, depends only on the model fluctuations, σγ, the Eq. (5) becomes: dXt = k  λ1 − e  − Xtdt + σ' dBt (6) 9 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 30

Eq. (6) can be solved (using the assumption of the Itô’s process33,34 for X(t) and the transformation g(X,t) = X(t)*exp(kdegt). The details of the derivation are provided in Supplementary Information section). The solution of Eq. (6) is: 

Xt = λ − λk  e − k  e /k  − k   + σ' 23 e dBs (7) Note that the solution, (7), is the OU process34,35 with a mean value, β(t), in Eq. (4), and covariance matrix, K(t,s) (also called Gram matrix), expressed as: Ks, t = σ0' 6 || /k 

where t and s are time points. The result allows direct application of the OU process to the proteome turnover dynamics. Since the solution is exact, it is not only methodologically preferable but also offers practical advantages to the Gaussian kernel model that was previously used for the first order equation24, this stems from the fact that for the exact solution there is one less parameter. The scaling factor in the Gaussian kernel is an independent parameter that needs to be estimated from the data. In the exact solution, the scaling parameter in OU kernel is the degradation rate constant, kdeg. In addition, it is natural that for the processes that behave exponentially, Eq. (7), the covariance terms between the time points will also be exponential (depending on the absolute value of the time difference), rather than Gaussian. Thus, our final model for the heavy isotope labeled protein proportions becomes: 89, μ 8Y9~MVNμ 89, Σ; Y 89 ∈ R ; 8 where MVN stands for multivariate normal distribution, n is the number of data points (time points at which heavy isotope levels have been measured), µ is equal to β which is determined in (4), and the covariance matrix, ∑ is an n by n matrix defined as: 10 ACS Paragon Plus Environment

Page 11 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

∑s, t = Ks, t + σ0/ δs, t (9) where δ is the Kronecker’s delta. The model (8) with the covariance matrix (9) is a GP with OU kernel and numerical methods exist36,37 for its solution to obtain kdeg and ksyn. The approaches to extract the parameters from the data use the posterior distributions or likelihood function and find parameter values that maximize them. In total there are five hyperparameters in our model, θ = k  , k  , σ' , σ/ , λ. The log-likelihood function is: logℒθ = −0.5 logdetΣ − 0.5y 89 − μ 89J Σ K y 89 − μ 89 − 0.5nlog2π

(10)

The posterior distribution, pθ|y, of the parameters is proportional to the product of the

likelihood function and the prior distributions, πθ. The log of the posterior distribution of the hyperparameters is: logpθ|y ∝ logℒθ + log πθ The hyperparameters can be estimated from the data either by

maximizing the

posterior distribution with respect to the hyperparameters36, or by sampling from the posterior distribution with Markov chain Monte Carlo methods25. We used a quasiNewton approach, Broyden–Fletcher–Goldfarb–Shanno algorithm38 available in R39, to maximize the posterior distribution function and obtain the hyperparameters. For the prior distributions (except for λ) we chose exponential distributions with the prior decay rates estimated from the non-stochastic model. A uniform prior was chosen for the asymptotic enrichment level, λ. Note that the derivative of the log-likelihood function, Eq. (10), with respect to the every hyperparameter is in a closed form and is used in the

11 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 30

optimizations. The corresponding formulas are shown in the Supplementary Information section. The approach is implemented in an R39 code and is available in the Supplementary Materials.

Results Performance metrics using 15N labeling data. The protein turnover experiments using

15

N labeling were used to test and compare

the performance of the stochastic model with the results from exponential curve fitting. The 15N labeling data is freely available on-line21. The data consists of two parts, a brain and a liver proteome. For every protein, RIF values of all peptides (for the protein) have been averaged to obtain protein RIF. We used these values to test our model. On average, protein turnover rates in brain are longer than those in the liver. For the nonstochastic model, fitting of liver data has been more difficult and required three exponential curves, some of which produced negative rate constants21. As a goodness of fit metric we chose the residual sum of squares (RSS) for each model: QRR = S



yT − yUT 0

TVK

where WX are the observed values and WUX are the corresponding theoretical predictions. In Figure 2, we show the comparison of the RSS values between the stochastic modeling (x axis) and the two-exponent curve fitting (y-axis) for mouse liver proteome (the red line is the line of unity). For the data points above the line of unity, RSSs of the two-exponent curves are larger than the corresponding RSSs from GP. The fits obtained by the GPs resulted in smaller RSS’s for 794 out of 797 proteins. Unlike the 12 ACS Paragon Plus Environment

Page 13 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

exponential curve fitting, GP allows for correlations between the measured RIFs in the time-course data. It is a desirable feature as the data points that are nearby in the time domain are expected to be correlated and ideally a model will be able to account for the correlation. In the case of the GP, the time correlations are provided in the structure of the Gram matrix. The diagonal elements of the matrix are the sum of the white noise variance and amplitude of the OU kernel. The non-diagonal elements of the matrix are the covariance between the different time points. The corresponding figure for brain proteome is shown in the Supplementary Materials Figure S1. Another goodness-of-fit measure in time course models is the Pearson correlation between the predicted values and experimental observations along the time axis. In Figures S2 (mouse liver) and S3 (mouse brain) of the Supplementary Materials Section, we show the scatter plot of the Pearson correlation coefficients for each protein obtained using predictions from twoexponent fitting and GP modeling. Similar to RSS, GP showed substantially improved correlation coefficients compared to the two-exponent fitting. To explicitly compare the fits from the two models we have chosen a liver mitochondrial protein, trifunctional protein subunit α (Q8BMS1). The experimental data (empty circles) and fits (green line – GP, blue line – exponential curves) are shown in Figure 3. The GP results in a better fit (approximately, 70 times smaller RSS value compared to the two-compartment exponential curve fit). The ksyn rate constant computed in the exponential curve fit is very large, in the order of 1012 (day-1), while the GP produced 1.4 day-1. The degradation rate constant, kdeg, in GP was 0.15 day-1 which was twice higher than the corresponding value from the two-exponent curve fit. The 95% confidence interval of the GP predicted fit is shown in Figure S4.

13 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 30

Comparison with the ODE results using 15N labeling data. The scatter plot of the computed decay rate constants using a GP model and twoexponent curve fitting for mouse liver proteins is shown in Figure 4. The relevant data for mouse brain are shown in Supplementary Materials Figure S5. In general, the GP model produced decay rate constants that are larger than those from the two-exponent curve fitting. The correlation between the rate constants was 0.81. The density plot of the differences between kdeg values produced by the GP and two-exponent curve fit are shown in Supplementary Figure S6. The distribution peaks at about 0.18 day-1 and extends up to 1 day-1. For liver proteins, the median degradation rate constants were 0. 091 day-1 and 0.228 day-1 for two-exponent curve fit and GP, respectively. Thus, GP modeling, on average, predicts more than two fold faster degradation rate constants for liver proteins. For the mouse brain proteins the effects were not so drastic. The median rate constants for brain proteins were 0.040 day-1 and 0.054 day-1 in two-exponent curve fit and GP modeling, respectively. It has been noted previously that the exponent curve fitting models were less accurate for modeling faster protein turnover (liver proteins)21. The ratio of the median turnover rate constants of liver proteins to brain proteins increased from 2.25 in two-exponent fitting to 4.23 in GP modeling. Thus the GP modeling predicts a larger difference between the protein turnovers in liver and brain. The ratio of the medians of turnover rate constants predicted by the GP modeling compares well to earlier studies that estimated the rate of incorporation of [14C]tyrosine in mouse liver (2.8% per hour) and brain (0.6% per hour)40. Note that in these studies the calculation of the rate constants are based on the tracer incorporation rate and the steady-state assumption, i.e. the fractional synthesis rate of proteins are equal to the fractional degradation rate. Under the steady-state assumption, the degradation rate 14 ACS Paragon Plus Environment

Page 15 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

constant is linearly dependent on the synthesis rate as can be seen from the onecompartment model (solution of Eq. (1b) in Supplementary materials). Figure S7 summarizes decay rate constants of brain and liver proteomes computed using twoexponent curve fitting and GP modeling. Stronger differences were observed in cases where one estimates the synthesis rate constants, ksyn, Figure 5. Unlike the two-exponent curve fit, there are no ksyn values by the GP that are larger than 10 day-1. For the two-exponent curve fitting ksyn values as large as 1014 day-1 have been produced. The median values of ksyn for two-exponent curve and GP were 9.12*109 day-1 and 1 day-1, respectively. We note that in the labeling experiments where the first isotope distribution pattern is measured after about nine hours of metabolic labeling, accurate determination of the rate constants of this much higher magnitude are unrealistic. The GP on the other hand yielded a maximum synthesis rate constant less than 10 day-1. In addition, as it is shown in Figures 2 and S1-S3, the goodness of fit is substantially improved for the GP. The synthesis and degradation rate constants as computed by GP and two-exponent curve fitting are provided in supplementary materials Tables S1 and S2 for liver and brain proteins, respectively. The stochastic model produces a volatility standard deviation, σγ, for each protein rate constant estimate. The higher volatility standard deviation means a stronger random fluctuation effects. In Figure S8 of Supplementary Materials we show the distribution of model standard deviations for the liver proteins. The mode of the standard deviation is non-zero at 0.018, the highest values are at 0.08.

15 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 30

As it was mentioned above one way to improve the model fitting to the data has been to add more compartments. However, we believe that one should consider the development of a rationale solution and interpretation, e.g. make sure the modeling is done properly (as Guan et al21 have shown) but that it is supported by some strong physiological underpinnings. It would appear that the addition of more compartments solved one problem (i.e. gave a better fit of the data) but failed to address what these compartments would represent from a biochemical perspective. Our approach simply considers another way to align the biology and the modeling. We suspect that the approach described herein provides a better fit than previous ODE methods, in part, because of instability in the 15N-precursor precursor labeling (note that a major assumption is a constant precursor labeling). For example, if the amino acid enrichment is constant one would expect that the use of an ODE method should produce a good fit of the apparent protein decay rate constant. Presumably the use of 15

N-labeling (via the diet) results in pulsatile (or oscillatory) changes in

proteogenic amino acids. One would expect a maximum

15

N-labeling of

15

N-enrichment immediately

following consumption of the food and a lower degree of labeling at times when food is not consumed (since during those periods

14

N would be appearing via endogenous

protein degradation). In addition, one needs to consider that there are gradients between plasma amino acid labeling and the labeling of free amino acids in different tissues. Effectively, the “free amino acid” compartment is further compartmentalized and the “mixing” between different tissues and variable plasma labeling is not necessarily equal across sites. We suspect that this is what led Guan and colleagues21 to use models of increasing complexity when fitting the tissue-specific protein labeling. The

16 ACS Paragon Plus Environment

Page 17 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

observations would be expected to play a greater role in cases where protein turnover is faster (e.g. liver vs brain) and therein require more complex models (e.g. three- vs twocompartment) to get a better fit.

Kinetics of heart mitochondrial Carnitine Palmitoyltransferase 1 In a recent study41 we used heavy water labeling to determine the changes in halflives of mitochondrial proteins in cardiomyocytes due to the induced heart failure in rats. One of the proteins that we studied was carnitine palmitoyltransferase (CPT) 1. The experimental data (crosses – normal heart, triangles – induced heart failure) and corresponding GP fits (black line normal heart, blue line – induced heart failure) are shown in Figure 6 (the data is for the CPT 1 from subsarcolemmal mitochondria (SSM) of cardiomyocytes). The computed turnover rate constants for the normal and induced heart failure samples were practically the same and equal to 0.09 day-1. The result is in agreement with the finding that turnover rates for CPT 1 in SSM were not affected by the induced heart failure. The ksyn rate constant was substantially larger in heart failure induced SSM CPT 1 than in the protein of the normal heart: 2.2 day-1 vs. 0.09 day-1.

Conclusions and Future Work We have applied a GP modeling to extract protein turnover rates from time-course data of isotope incorporation in stable isotope labeling and LC-MS experiments. It was shown that an exact solution to the stochastic differential equation yields a GP solution with OU kernel. The approach is preferable to the more widely used GP with Gaussian kernel, as the OU solution is exact, and the number of parameters is less. We then used

17 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 30

this approach to analyze a large scale data set obtained from mice fed a diet containing 15

N-labeled algae. The results have demonstrated that GP modeling substantially

improves the fit to the experimental data, particularly for protein turnover in liver, which in a previous model produced negative rate constants. We also applied the model to carnitine palmitoyltransferase of SSM of cardiomyocytes that were studied to infer effects of the induced heart failure. This experiment had used heavy water labeling. In future studies we plan to extend the model to include relative isotope fractions for each peptide of a protein. A similar approach has employed a mixed-effects modeling for growth curves resulting from the Gompertz model42. Using each peptide data separately rather than averaging peptide data will allow to estimate the variability due to the heavy isotope labeling of peptides. In addition, we will extend the model to describe the turnover of proteins that are not in a steady-state condition43.

Acknowledgements

Research reported in this publication was supported by the National Institute Of General Medical Sciences of the National Institutes of Health under Award Number R01GM112044.

Supporting Information Available: Supplementary materials include “Derivation of the GP Model from the Stochastic Differential Equation of Protein Turnover Kinetics”; Diagram D1 − one- and two-compartment models of protein turnover; Figure S1 − pairwise scatter plot of the RSS for mouse brain proteins; Figures S2 and S3 − the

18 ACS Paragon Plus Environment

Page 19 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

pairwise scatter plots of the Pearson correlations obtained from GP and two-exponent curve fitting for mouse liver and brain proteins, respectively; Figure S4 − 95% confidence interval of the mean of GP model for mitochondrial trifunctional protein, subunit α (Q8BMS1); Figure S5 − the scatter plot of the decay rates constants obtained from the GP and two-exponent curve fitting for mouse brain proteins; Figure S6 − the density of the difference between degradation rate constants computed by the GP and two-exponent curve fitting; Figure S7 – the boxplots of decay rate constants computed by two-exponent curve fitting and GP model for mouse brain and liver proteomes; Figure S8 –the density of standard deviations of the model distributions for mouse liver proteins; and Tables S1 and S2 – synthesis and degradation rate constants as computed by GP and two-exponent curve fit for mouse liver and brain proteins, respectively.

19 ACS Paragon Plus Environment

Journal of Proteome Research

Figures

Peptide isotopic distribution quantification

100 100 Relative Abundance

80 80 60 40 20

2

M0 M0 M1

M1

M0 M1

60

40

Proteins kinetics calculation

M1 M0

M2

M0

M2

M2 M3 M4

M3

M3 M3

20

Stochastic Simulations

M1 M 2

M2

M3 M4

M4

M4

M4 M5

M5

M5

Relative Isotope Fraction

Proteomics raw data

Relative Abundance

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 30

1

0

-1

0

0 20

24

28 32 Time (min)

36

40

44

0 day

3 days

10 days

20 days

0

30 days

10

20

30

Time, days

Figure 1.

Figure. 1. Modeling of protein turnover rate constants from stable-isotope labeling experiments. The time-course of heavy isotope labeling, extracted from the full scan spectra of tryptic peptides, is modeled as a GP with OU kernel.

20 ACS Paragon Plus Environment

0.3 0.2 0.1 0.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

RSS, Exponential Curves

Page 21 of 30

0.00

0.02

0.04

0.06

0.08

0.10

RSS, Gaussian Process

Figure 2. Comparison of the residual sum of squares obtained by using exponential curve fitting44 (y-axis) and GP (x-axis). The red line is the line of unity. The shown data is for the liver proteome.

21 ACS Paragon Plus Environment

0.6 0.4 0.0

0.2

RIF

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 30

0.8

Journal of Proteome Research

0

5

10

15

20

25

30

Time (day)

Figure 3. Experimental (empty circles) data and model fits (green line – GP, blue line – exponential curve fitting) for mitochondrial trifunctional protein, subunit α (Q8BMS1). The GP produced a fit with RSS about 70 times smaller than that from the two-exponent fit.

22 ACS Paragon Plus Environment

0.1

0.2

0.3

−1

Degradation Rate Constants (Two-exponent fit), day

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

0.4

Page 23 of 30

0.0

0.2

0.4

0.6

0.8

1.0

Degradation Rate Constants (Gaussian Process), day

1.2 −1

Figure 4. A scatter plot of the decay rates constants as computed by the GP (x-axis) and two-exponent curve fitting (y-axis). In general the rate constants computed by the GP model were larger (mouse liver proteins).

23 ACS Paragon Plus Environment

1e+14

2e+14

3e+14

4e+14

5e+14

6e+14

Page 24 of 30

0e+00

Synthesis Rate Constants (Two-exponent fit), day

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

−1

Journal of Proteome Research

0

2

4

6

8

Synthesis Rate Constants (Gaussian Process), day

10 −1

Figure 5. Synthesis rate constants as computed using two-exponent curves (y-axis) and GP (x-axis) for mouse liver data proteome21.

24 ACS Paragon Plus Environment

0.8 0.6 0.4 0.0

0.2

Relative Isotope Fraction

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

1.0

Page 25 of 30

0

20

40

60

80

Time (day)

Figure 6. A GP modeling of RIF for CPT protein from normal (experimental data is shown with blue triangles and the fit in blue line) and induced heart failure (the experimental data is shown with black crosses and the fit in black line) interfebrullar mitochondria41.

25 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 30

References

(1) Hinkson, I. V.; Elias, J. E. The dynamic state of protein turnover: It's about time. Trends Cell Biol 2011, 21, 293-303. (2) Claydon, A. J.; Beynon, R. Proteome dynamics: revisiting turnover with a global perspective. Mol. Cell Proteomics 2012, 11, 1551-1565. (3) Schimke, R. T.; Doyle, D. Control of enzyme levels in animal tissues. Annu Rev Biochem 1970, 39, 929-976. (4) Yen, H. C.; Xu, Q.; Chou, D. M.; Zhao, Z.; Elledge, S. J. Global protein stability profiling in mammalian cells. Science 2008, 322, 918-923. (5) Eden, E.; Geva-Zatorsky, N.; Issaeva, I.; Cohen, A.; Dekel, E.; Danon, T.; Cohen, L.; Mayo, A.; Alon, U. Proteome half-life dynamics in living human cells. Science 2011, 331, 764-768. (6) Boisvert, F. M.; Ahmad, Y.; Gierlinski, M.; Charriere, F.; Lamont, D.; Scott, M.; Barton, G.; Lamond, A. I. A quantitative spatial proteomics analysis of proteome turnover in human cells. Mol Cell Proteomics 2012, 11, M111 011429. (7) Zhang, Y.; Fonslow, B. R.; Shan, B.; Baek, M. C.; Yates, J. R., III. Protein analysis by shotgun/bottom-up proteomics. Chem. Rev 2013, 113, 2343-2394. (8) Bantscheff, M.; Lemeer, S.; Savitski, M. M.; Kuster, B. Quantitative mass spectrometry in proteomics: critical review update from 2007 to the present. Anal. Bioanal. Chem 2012, 404, 939-965. (9) Li, Q. Advances in protein turnover analysis at the global level and biological insights. Mass Spectrom. Rev 2010, 29, 717-736. (10) McClatchy, D. B.; Dong, M. Q.; Wu, C. C.; Venable, J. D.; Yates, J. R., III. 15N metabolic labeling of mammalian tissue with slow protein turnover. J. Proteome. Res 2007, 6, 2005-2010. (11) Kim, T. Y.; Wang, D.; Kim, A. K.; Lau, E.; Lin, A. J.; Liem, D. A.; Zhang, J.; Zong, N. C.; Lam, M. P.; Ping, P. Metabolic labeling reveals proteome dynamics of mouse mitochondria. Mol. Cell Proteomics 2012, 11, 1586-1594. (12) Price, J. C.; Guan, S.; Burlingame, A.; Prusiner, S. B.; Ghaemmaghami, S. Analysis of proteome dynamics in the mouse brain. Proc. Natl. Acad. Sci. U. S. A 2010, 107, 14508-14513. (13) Claydon, A. J.; Thom, M. D.; Hurst, J. L.; Beynon, R. J. Protein turnover: measurement of proteome dynamics by whole animal metabolic labelling with stable isotope labelled amino acids. Proteomics 2012, 12, 1194-1206. (14) Wu, C. C.; MacCoss, M. J.; Howell, K. E.; Matthews, D. E.; Yates, J. R., III. Metabolic labeling of mammalian organisms with stable isotopes for quantitative proteomic analysis. Anal. Chem 2004, 76, 4951-4959. (15) Savas, J. N.; Toyama, B. H.; Xu, T.; Yates, J. R., 3rd; Hetzer, M. W. Extremely long-lived nuclear pore proteins in the rat brain. Science 2012, 335, 942. (16) Schwanhausser, B.; Busse, D.; Li, N.; Dittmar, G.; Schuchhardt, J.; Wolf, J.; Chen, W.; Selbach, M. Global quantification of mammalian gene expression control. Nature 2011, 473, 337-342.

26 ACS Paragon Plus Environment

Page 27 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

(17) Boisvert, F. M.; Ahmad, Y.; Gierlinski, M.; Charriere, F.; Lamont, D.; Scott, M.; Barton, G.; Lamond, A. I. A quantitative spatial proteomics analysis of proteome turnover in human cells. Mol. Cell Proteomics 2012, 11, M111. (18) Hsieh, E. J.; Shulman, N. J.; Dai, D. F.; Vincow, E. S.; Karunadharma, P. P.; Pallanck, L.; Rabinovitch, P. S.; MacCoss, M. J. Topograph, a software platform for precursor enrichment corrected global protein turnover measurements. Mol. Cell Proteomics 2012, 11, 1468-1474. (19) Nelson, C. J.; Li, L.; Jacoby, R. P.; Millar, A. H. Degradation rate of mitochondrial proteins in Arabidopsis thaliana cells. J Proteome Res 2013, 12, 34493459. (20) Zhang, Y.; Reckow, S.; Webhofer, C.; Boehme, M.; Gormanns, P.; EggeJacobsen, W. M.; Turck, C. W. Proteome scale turnover analysis in live animals using stable isotope metabolic labeling. Anal. Chem 2011, 83, 1665-1672. (21) Guan, S.; Price, J. C.; Ghaemmaghami, S.; Prusiner, S. B.; Burlingame, A. L. Compartment modeling for mammalian protein turnover studies by stable isotope metabolic labeling. Anal. Chem 2012, 84, 4014-4021. (22) Ramakrishnan, R.; Ramakrishnan, J. D. Using mass measurements in tracer studies--a systematic approach to efficient modeling. Metabolism 2008, 57, 1078-1087. (23) Brunengraber, D. Z.; McCabe, B. J.; Kasumov, T.; Alexander, J. C.; Chandramouli, V.; Previs, S. F. Influence of diet on the modeling of adipose tissue triglycerides during growth. Am J Physiol Endocrinol Metab 2003, 285, E917-925. (24) Aijo, T.; Granberg, K.; Lahdesmaki, H. Sorad: a systems biology approach to predict and modulate dynamic signaling pathway response from phosphoproteome time-course measurements. Bioinformatics 2013, 29, 1283-1291. (25) Tsai, T. H.; Tadesse, M. G.; Di, P. C.; Pannell, L. K.; Mechref, Y.; Wang, Y.; Ressom, H. W. Multi-profile Bayesian alignment model for LC-MS data analysis with integration of internal standards. Bioinformatics 2013, 29, 2774-2780. (26) Topa, H.; Jonas, A.; Kofler, R.; Kosiol, C.; Honkela, A. Gaussian process test for high-throughput sequencing time series: application to experimental evolution. Bioinformatics 2015, 31, 1762-1770. (27) Shekar, K. C.; Li, L.; Dabkowski, E. R.; Xu, W.; Ribeiro, R. F., Jr.; Hecker, P. A.; Recchia, F. A.; Sadygov, R. G.; Willard, B.; Kasumov, T.; Stanley, W. C. Cardiac mitochondrial proteome dynamics with heavy water reveals stable rate of mitochondrial protein synthesis in heart failure despite decline in mitochondrial oxidative capacity. J Mol Cell Cardiol 2014, 75, 88-97. (28) Rachdaoui, N.; Austin, L.; Kramer, E.; Previs, M. J.; Anderson, V. E.; Kasumov, T.; Previs, S. F. Measuring proteome dynamics in vivo: as easy as adding water? Mol. Cell Proteomics 2009, 8, 2653-2663. (29) Kasumov, T.; Ilchenko, S.; Li, L.; Rachdaoui, N.; Sadygov, R. G.; Willard, B.; McCullough, A. J.; Previs, S. Measuring protein synthesis using metabolic (2)H labeling, high-resolution mass spectrometry, and an algorithm. Anal. Biochem 2011, 412, 47-55. (30) Busch, R.; Kim, Y. K.; Neese, R. A.; Schade-Serin, V.; Collins, M.; Awada, M.; Gardner, J. L.; Beysen, C.; Marino, M. E.; Misell, L. M.; Hellerstein, M. K. Measurement of protein turnover rates by heavy water labeling of nonessential amino acids. Biochim. Biophys. Acta 2006, 1760, 730-744.

27 ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 30

(31) Cambridge, S. B.; Gnad, F.; Nguyen, C.; Bermejo, J. L.; Kruger, M.; Mann, M. Systems-wide proteomic analysis in mammalian cells reveals conserved, functional protein turnover. J Proteome Res 2011, 10, 5275-5284. (32) Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell Proteomics 2002, 1, 376386. (33) ¢ksendal, B. K.: Stochastic differential equations : an introduction with applications; 6th ed.; Springer: Berlin ; New York, 2003. (34) Iacus, S. M.: Simulation and inference for stochastic differential equations with r examples. In Springer series in statistics; Springer,: New York, N. Y., 2008. (35) Huynh-Thu, V. A.; Sanguinetti, G. Combining tree-based and dynamical systems for the inference of gene regulatory networks. Bioinformatics 2015, 31, 16141622. (36) Zurauskiene, J.; Kirk, P.; Thorne, T.; Pinney, J.; Stumpf, M. Derivative processes for modelling metabolic fluxes. Bioinformatics 2014, 30, 1892-1898. (37) Neal, R. M.: Bayesian learning for neural networks; Springer: New York, 1996. (38) Byrd, R. H.; Lu, P. H.; Nocedal, J.; Zhu, C. Y. A Limited Memory Algorithm for Bound Constrained Optimization. Siam J Sci Comput 1995, 16, 1190-1208. (39) Team, R. D. C.: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2010. (40) Lajtha, A.; Latzkovits, L.; Toth, J. Comparison of turnover rates of proteins of the brain, liver and kidney in mouse in vivo following long term labeling. Biochim Biophys Acta 1976, 425, 511-520. (41) Kasumov, T.; Dabkowski, E. R.; Shekar, K. C.; Li, L.; Ribeiro, R. F., Jr.; Walsh, K.; Previs, S. F.; Sadygov, R. G.; Willard, B.; Stanley, W. C. Assessment of cardiac proteome dynamics with heavy water: slower protein synthesis rates in interfibrillar than subsarcolemmal mitochondria. Am. J. Physiol Heart Circ. Physiol 2013, 304, H1201H1214. (42) Donnet, S.; Foulley, J. L.; Samson, A. Bayesian analysis of growth curves using mixed models defined by stochastic differential equations. Biometrics 2010, 66, 733741. (43) Tuvdendorj, D.; Chinkes, D. L.; Herndon, D. N.; Zhang, X. J.; Wolfe, R. R. A novel stable isotope tracer method to measure muscle protein fractional breakdown rate during a physiological non-steady-state condition. Am. J. Physiol Endocrinol. Metab 2013, 304, E623-E630. (44) Guan, S.; Price, J. C.; Prusiner, S. B.; Ghaemmaghami, S.; Burlingame, A. L. A data processing pipeline for mammalian proteome dynamics studies using stable isotope metabolic labeling. Mol. Cell Proteomics 2011, 10, M111.

28 ACS Paragon Plus Environment

Page 29 of 30

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Table of Contents:

Synopsis: We describe a stochastic modeling of protein turnover rate constants from stable isotope labeling and LC-MS. We show that the often used one- and twocompartment non-stochastic models allow an explicit solution from the corresponding stochastic differential equations. The resulting stochastic process is a Gaussian Process with Ornstein-Uhlenbeck covariance matrix. We applied the model to large scale data sets from

15

N labeling and compared the performance metrics from the non-

stochastic curve fitting with the corresponding result from the appropriate stochastic model. The results showed that for more than 99% of proteins the stochastic model produces a better fit to the experimental data. The stochastic modeling predicted stronger differences in the protein turnover processes in the mouse liver and brain than previously estimated.

29 ACS Paragon Plus Environment

Journal of Proteome Research

0.86

0.90

0.94

0.98

For TOC only:

Pearson Correlations (Two-Exponent Fit)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 30

0.96

0.97

0.98

0.99

1.00

Pearson Correlations (Gaussian Process)

The pairwise scatter plot of the Pearson correlations obtained from protein turnover models using exponential curve fitting (y-axis) and GP (x-axis). The red line is the line of unity.

30 ACS Paragon Plus Environment