pytc: open source python software for global ... - ACS Publications

the software implements four such models: a blank titration, a single-site binding curve,1 an. N-site binding polynomial,10 and a 1:1 competitor model...
2 downloads 9 Views 3MB Size
Subscriber access provided by Warwick University Library

pytc: open source python software for global analyses of isothermal titration calorimetry data Hiranmayi Duvvuri, Lucas C Wheeler, and Michael Harms Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.7b01264 • Publication Date (Web): 05 Apr 2018 Downloaded from http://pubs.acs.org on April 8, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

pytc: open source python software for global analyses of isothermal titration calorimetry data Hiranmayi Duvvuri,‡,§ Lucas C. Wheeler,‡,§ and Michael J. Harms∗,‡,§ ‡Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, USA §Institute of Molecular Biology, University of Oregon, Eugene, OR, USA E-mail: [email protected] Phone: 541-346-9002

Running header pytc: open source python software for global analyses of isothermal titration calorimetry data

1

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract Here we describe pytc, an open-source Python-package for global fits of thermodynamic models to multiple Isothermal Titration Calorimetry experiments. Key features include simplicity, the ability to implement new thermodynamic models, a robust maximum likelihood fitter, a fast Bayesian Markov-Chain Monte Carlo sampler, rigorous implementation, extensive documentation, and full cross-platform compatibility. pytc fitting can be done using either an application program interface or via a graphical user interface. It is available for download at: https://github.com/harmslab/pytc.

Introduction Isothermal Titration Calorimetry (ITC) is a powerful technique for measuring the thermodynamics of intermolecular interactions. 1,2 It provides information about the free energy, enthalpy, entropy, and stoichiometry of binding. Combining information from ITC experiments under different conditions can reveal further properties, such as heat capacity change or the number of protons gained or lost during binding. 3–5 Extracting thermodynamic information requires fitting models to observed heats, potentially across multiple experiments. Several software packages exist for fitting these models. These include commercial software that ships with instruments. These packages are closedsource and cannot be easily extended by users who wish to analyze different or more complex models. A powerful alternative to this approach is SEDPHAT 6 (and its recent fork ITCsy 7 ). These packages allow for global fitting of exceedingly complex models. 6–8 While these packages are powerful, it remains difficult for users to extend these packages with new fitting models. They also require proprietary software: either Windows (SEDPHAT) or Matlab (ITCsy). Here we report pytc, a flexible Python-based package for fitting ITC data. It is designed for ease of use for simple fitting scenarios, but also has ready extensibility and flexibility for more complex fitting. It has a graphical user interface (GUI) for basic fitting, as well as a 2

ACS Paragon Plus Environment

Page 2 of 21

Page 3 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

straightforward object-oriented application programming interface (API). The API allows users to write new models and integrate with other analysis pipelines. Because this is a project that will undergo constant development, we strongly recommend that readers visit the project website (https://github.com/harmslab/pytc) and up-todate documentation (https://pytc.readthedocs.io) before using pytc. The program is completely open source and can be modified and redistributed at will, as it has been released under the Unlicense. • Download: https://github.com/harmslab/pytc • Software and model documentation: https://pytc.readthedocs.io/ • Tutorial and programming examples: https://github.com/harmslab/pytc-demos • Video tutorials: https://pytcgui.readthedocs.io/en/latest/how_to_img.html

Package Description The following provides a quick sketch of the features of pytc (as of version 1.2.2). The software documentation contains much more detailed descriptions of how to use the software, information regarding the thermodynamic models, and notes about the statistical tools available in pytc (https://pytc.readthedocs.io/en/latest/). The basic pytc work flow is similar when using either the API or GUI. The user provides files that hold the heat-per-injection for each experiment, determined by integrating raw power curves using the instrument software or an alternative such as NITPIC. 9 The user then specifies an individual fit model for each experimental data set. The current version of the software implements four such models: a blank titration, a single-site binding curve, 1 an N -site binding polynomial, 10 and a 1:1 competitor model. 11 Before fitting, the user can set which parameters can vary, as well as parameter bounds. These individual experiments can then be linked together into global fits. In the simplest case, one can link individual fit parameters to global parameters. For example, a user can fit a 3

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

single binding constant and enthalpy to multiple experimental replicates. More complicated linkages can also be defined. The current release of the software implements three such linkages: proton-linked binding versus buffer ionization enthalpy, 3 a van ’t Hoff analysis assuming a fixed enthalpy change, and a van ’t Hoff analysis assuming fixed heat capacity change (reviewed in 5 ). For these models, individual experiments are done as a function of buffer ionization enthalpy or temperature, which then allows calculation of the enthalpy and binding constant of each individual experiment under its specific conditions. pytc has two basic analysis modes: maximum likelihood (ML) and Bayesian MarkovChain Monte Carlo (MCMC). ML uses least squares regression to find the model parameters that minimize the difference between the model and the experimental observations. In contrast, MCMC yields a distribution of parameter values, with values weighted by their likelihood. The two methods reflect different statistical philosophies (frequentist versus Bayesian), as well as different practical considerations. ML analyses are much faster than MCMC analyses. On the other hand, ML can lead to biased parameter estimates and can be highly sensitive to initial parameter guesses. MCMC is much slower—as it requires extensive sampling of a potentially large parameter space—but also integrates over uncertainty in the fit parameters. The philosophical and practical differences between these overall approaches have been discussed extensively in the literature. 12–14 We have implemented both approaches and leave the user to decide which is more appropriate given their particular scientific question. The ML method uses the extensively tested least squares routine implemented in scipy (scipy.optimize.least_squares). 15 This routine uses the powerful "Trust Region Reflective" algorithm, 16 which is designed to robustly allow parameter convergence even with sparse data. The 95% confidence interval for each fit parameter can be estimated using the diagonal of the covariance matrix or by bootstrap sampling from the experimental uncertainty in each measured heat, following standard statistical methods. 14 The MCMC method uses the fast emcee library. 17 This package works by creating a

4

ACS Paragon Plus Environment

Page 4 of 21

Page 5 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

large number of "walkers" that explore the parameter space, in parallel. Walkers spend more time in regions of parameter space that fit the experimental data well, and less time in regions of parameter space that fit the experimental data poorly. The package keeps track of the trajectories of each walker and then uses these trajectories to create distributions of parameter estimates. (In more precise language, emcee samples the likelihood surface using multiple Markov chains, and then uses these samples and uniform priors to construct joint posterior probability distributions for the model parameters). For each parameter, pytc calculates the 95% credibility interval. This is analogous to a 95% confidence interval estimated using a ML method. 12 pytc returns a wide variety of statistical information useful for assessing model quality. This statistical information falls into two broad classes. The first class allows assessment of overall fit quality. These include fit R2 , log likelihood, and fit residuals. For example, when comparing two different fits with an equal number of parameters, the model with the highest R2 and highest log likelihood should usually be selected. Additionally, fit residuals are graphed below each fit, allowing the user to visually assess whether the residuals are non-random, which would indicate the model does not adequately describe the data. The second class of of quality assessment statistics allows comparison between two models that have different numbers of parameters. Models with more parameters will generally fit the data better than models with fewer parameters. These extra parameters may or may not be meaningful. A standard approach in model fitting is to choose the simplest model consistent with the data. A variety of statistics can be used to balance fitting the data well against the addition of many parameters. pytc returns four test statistics that penalize models based on the number of free parameters: Akaike Information, corrected Akaike Information, Bayesian Information, and the F-statistic. 14,18 Additionally, pytc implements a function that uses these test statistics to compare two or more model fits, thus helping the user select the appropriate model. Finally, one powerful aspect of using the pytc API is increased fit reproducibility. The

5

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 21

values returned by a fit can depend on the operations taken during a fit. In a GUI, however, the operations are not recorded—only the end result. For example, when fitting a model in Origin, one might enter fit guesses, perform an initial optimization by Newton’s method, and then follow up with a final optimization by Levenberg-Marquardt. If the fit does not converge, the user may try holding one parameter constant, and then allow the parameter to vary on later steps. This open-ended and nondeterministic procedure means that different users may end up with different fit results, thus undermining reproducibility. In contrast, if one writes out the fit as a short python program, the exact steps taken are explicitly recorded and can be reproduced by anyone who has the script.

Models pytc implements three basic binding models: a single-site binding model, 1 an N -site binding polynomial, 10 and a 1:1 competitor model. 11 It is relatively straightforward to add new models, as discussed in the next section. pytc is implemented with the philosophy that one should model all processes in the ITC experiment and include them in the fit. These processes include not only the binding reaction of interest, but also systematic errors in sample preparation and changes in heat arising from dilution. The latter effects are captured with nuisance parameters added to the standard thermodynamic model:

Qi,obs = Qi × α + (Dslope × [L]total + Dintercept ) + ε

(1)

Qi,obs is the observed heat at injection i and Qi is the heat at injection i arising from the binding reaction of interest. pytc models Qi using standard thermodynamic models. 1,10,11 Qi,obs differs from Qi because of systematic concentration errors (α) and the heat arising from dilution (Dslope and Dintercept ). ε is the fit residual. α scales all heats, accounting for systematic errors in the relative protein or ligand con6

ACS Paragon Plus Environment

Page 7 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

centration. This could also be viewed as an apparent stoichiometry but, following Freire et al, 10 we interpret this as "the effective amount of active protein relative to the nominal value entered as protein concentration." This term is referred to as the "fraction competent" in pytc output. Practically, this value should be near 1.0. Large deviations from 1.0 may indicate that the model used does not describe the stoichiometry of the protein or that there is a problem with the experiment. The other nuisance parameters, Dslope and Dintercept , model the heat of dilution as a linear function of titrant concentration. This is equivalent to fitting a straight line through a blank titration and then subtracting that line from an experimental titration. Rather than subtracting a blank, pytc allows a user to incorporate a blank titration into the global fit: the user fits the same Dslope and Dintercept across both an experimental and blank titration, thus explicitly modeling the heat of dilution. If a one sets the values of α to 1.0, Dslope to 0.0, and Dintercept to 0.0, and then does not allow them to vary during the fit, one recovers only the thermodynamic model—as is done in software such as Origin.

API Design Python is an excellent language for a new ITC analysis package. It has a large and vibrant scientific programming community, meaning that routines for numerical analyses, regression, and plotting are tested by thousands of users and continually improved. Further, Python is the language of choice for a large number of graduate courses introducing chemists and biochemists to scientific programming. Because pytc is written in a language that is familiar to many students, the barrier to modifying and extending the package is lower than it would be for other programming languages. pytc uses the basic python scientific computing stack: numpy, 19,20 scipy, 15 and matplotlib. 21 The GUI also requires PyQT5. The API is simple and does not require extensive programming background to use. For example, the code below shows a script that will fit a single-site binding model to an ITC 7

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

data set. In general, the user creates a GlobalFit object, adds experiments to that object, and then performs the fit. import pytc

# i n i t i a l i z e the f i t t i n g object g = pytc . G l o b a l F i t ( )

# l o a d i n t h e e x p e r i m e n t a l h e a t s and i n d i c a t e t h e y s h o u l d be # d e s c r i b e d w i t h a s i n g l e −s i t e b i n d i n g model a = pytc . ITCExperiment ( " ca−edta / t r i s −01.DH" , pytc . indiv_models . S i n g l e S i t e , s h o t _ s t a r t =2)

# Add t h e e x p e r i m e n t t o t h e f i t t e r g . add_experiment ( a )

# Perform f i t and r e t u r n r e s u l t s g . f i t () f i g , ax = g . p l o t ( ) print ( g . f i t _ a s _ c s v ) One of the key features of pytc is the ability to implement new binding models. This can be done for both models of individual ITC experiments and for global models. Implementing new models is straightforward and described in detail in the documentation. The coding required can be quite minimal. The single-site binding model included with the package is defined in 13 lines of python code. The global model linking buffer ionization enthalpy to proton gain/loss is defined in six lines. To illustrate the simplicity, the model linking ionization enthalpy to proton gain and loss is copied in full below: 8

ACS Paragon Plus Environment

Page 8 of 21

Page 9 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

import pytc

# Define a c l a s s d e s c r i b i n g changes in observed enthalpy # due t o c h a n g e s i n p r o t o n a t i o n and b u f f e r i o n i z a t i o n e n t h a l p y c l a s s NumProtons ( pytc . g l o b a l _ c o n n e c t o r s . GlobalConnector ) :

# Define r e a l i s t i c guesses f o r the g l o b a l f i t parameters param_guesses = {"num_H" : 0 . 0 , " d H _ i n t r i n s i c " : 0 . 0 }

# D e f i n e i n f o r m a t i o n t h a t must be s p e c i f i e d by t h e u s e r # when t h e y i n v o k e t h e c l a s s required_data = [ " ionization_enthalpy " ]

# D e f i n e how t h e o b s e r v e d e n t h a l p y f o r an e x p e r i m e n t r e l a t e s to # i t s i o n i z a t i o n e n t h a l p y , number o f p r o t o n s , and b i n d i n g enthalpy def dH( s e l f , e x p e r i m e n t ) : dHion = e x p e r i m e n t . i o n i z a t i o n _ e n t h a l p y return s e l f . d H _ i n t r i n s i c + s e l f .num_H∗ dHion Importantly, the requirements for new models are specific and well-defined. A complete description of how to implement new models is available at https://pytc.readthedocs. io/en/latest/writing_new_models.html.

9

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Demonstration In the following four sections, we demonstrate the use of pytc for four different fitting scenarios.

Fitting single-site models We first set out to validate our software against existing commerical software for a simple, single-site binding isotherm for Ca2+ binding to EDT A. We compared fits using pytc to fits using Origin 7.0552, which was included with our VP-ITC (GE Healthcare). We titrated 1.6 mM CaCl2 onto 0.1 mM EDT A in 100 mM T ris, pH 7.40 at 25 ◦ C using a VP-ITC (Figure 1A). All buffers were treated with CHELEX (≈ 1 g · L−1 , stirring for 30 minutes), filtered at 0.22 µm, and then thoroughly degassed. Injection volume was 1 µL for the first injection, followed by 5 µL for all following injections. Reference power was 7 µcal · s−1 ; stir speed was 633. For the Origin fit, we corrected for dilution by fitting a line to a blank titration—identical to the production titration except for having no EDTA—and then subtracting this line from the production heats. We discarded the first two injections before fitting. We then fit the “single-site” model to these results (Figure 1B). We obtained KD = 24.9 ± 0.7 nM and ∆H ◦ = −11.6 ± 0.1 kcal · mol−1 . We then repeated this analysis using pytc. We read the raw experimental and blank heats, uncorrected for dilution, and then globally fit pytc’s blank and single-site model to the two experiments using the maxiumum likelihood fitter. We used pytc’s built in bootstrap method to estimate parameter uncertainty by creating 1,000 psuedoreplicate datasets sampled from the heat for each injection with a standard deviation of 0.1 kcal · mol−1 . We obtained results identical, within uncertainty, to those from Origin: KD = 24.7 ± 0.1 nM and ∆H ◦ = −11.7 ± 0.1 kcal · mol−1 (Figure 1C). We repeated this analysis for 8 more Ca2+ : EDT A binding experiments in three different buffers (HEPES, imidazole, and Tris). In all cases, we obtained identical results within error to those obtained by Origin (Figure 1D).

10

ACS Paragon Plus Environment

Page 10 of 21

Page 11 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Figure 1: A) Raw power curve for the binding of Ca2+ to EDT A (25 mM T ris, 100 mM N aCl, pH 7.4, 25 ◦ C). B) Fit of a single-site binding model to integrated, blanked heats using Origin 7.0552. C) Fit of the same model with pytc 1.2.2. Dilution is accounted for in the fit model and then constrained by globally fitting the experiment (black) and blank (gray) titrations. D) Comparison of binding enthalpies (blue) and free energies (red) for Ca2+ : EDT A interactions extracted by Origin and pytc. Plot shows 9 experiments—with three replicates each—in Tris, HEPES, and imidazole.

11

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Global fit to extract number of protons released Global fits to multiple experiments are a powerful alternative to fitting models to individual experiments and then analyzing those fit parameters. 6,8,22 We validated the global ◦ ) fitting capability of pytc by estimating the buffer-independent binding enthalpy (∆Hindep

and the number of protons released or taken up on binding (nH ) for the Ca2+ : EDT A interaction at pH 7.4. We measured binding in three buffers with different ionization enthalpies: 4.875 kcal · mol−1 (HEPES), 8.757 kcal · mol−1 (imidazole), and 11.341 kcal · mol−1 ◦ (Tris). 23 We used the global_connectors.num_protons model—which fits KA , ∆Hindep , ◦ = and nH —to nine experiments. This global fit yielded nH = −1.09 ± 0.02 and ∆Hindep

0.66 ± 0.13 kcal · mol−1 (Figure 2A). These values are in close agreement with the previously reported binding enthalpy and proton release for the Ca2+ : EDT A interaction at pH 7.5: ∆H = −0.56 kcal · mol−1 and nH = 1.17 protons. 24 We also performed a manual fit, plotting the measured binding enthalpies for individual experiments against the ionization enthalpy of the buffers used. 3 By fitting a line to these data, we were able to extract the number of protons released nH = −1.10 ± 0.01 and the buffer-independent binding enthalpy ◦ of ∆Hindep = 0.73 ± 0.10 kcal · mol−1 (Figure 2B). These results were indistinguishable from

the global fit within the bootstrap uncertainty.

Fitting multi-site models The previous analyses used least squares regression to find maximum likelihood estimates of the model parameters. Bayesian MCMC is an alternative approach that integrates over parameter uncertainty. Bayesian and maximum likelihood statistical approaches are complementary, 13,14 relying on different assumptions and providing different information. To demonstrate pytc’s capabilities for Bayesian global analyses, we used data from a previous publication: 25 the binding of Ca2+ to human S100A5. This presents a particularly challenging fitting problem. Ca2+ binds to the protein with 2:1 stoichiometry. One site is high affinity, the other low affinity. Each has its own binding enthalpy. This leads to a 12

ACS Paragon Plus Environment

Page 12 of 21

Page 13 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Figure 2: A) Global fit of a proton binding model to nine individual titration experiments done in three different buffers (indicated on the graph). Individual colors denote individual experiments. We allowed the fraction competent to vary for each experiment, which ranged from 0.95 to 1.05. The molar ratio shown is corrected for fraction competent. B) Fit of a proton-binding model to enthalpies extracted from individual experiments versus buffer ionization enthalpy. This yields the same result as the global fit.

13

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

complex binding isotherm that is very difficult to fit in Origin. Further, it is difficult to resolve the entire curve in a single ITC titration, making extraction of parameters for the higher-affinity site particularly difficult. To circumvent this issue, we measured the binding of Ca2+ to S100A5 using four different titrant/titrate ratios (8×, 10×, 15×, and 18×) while keeping the protein concentration fixed at 50 µM across experiments (Fig 3). We performed these experiments at 25 ◦ C on our ITC-200 (GE Healthcare). All buffers were treated with CHELEX (≈ 1 g · L−1 , stirring for 30 minutes) and filtered at 0.22µm. Prior to being loaded into the instrument, samples were centrifuged at 18, 000 × g for 35 minutes at the experimental temperature. Injection volume was 0.5 µL for the first injection, followed by 2.5 µL for all following injections. Reference power was 3 µcal · s−1 , with stir speed set at 750 rpm. We then used pytc’s Bayesian MCMC sampler to estimate the thermodynamic parameters of a two-site binding polynomial using all four titration datasets simultaneously. We constrained the dilution heat to be between -3.0–0.0 kcal/mol and the dilution intercept between 0—10,000 kcal/mol/injection for all curves. We used uniform prior probabilities for all other parameters. We used the maximum likelihood estimate as a starting point, explored the likelihood surface with 100 walkers taking 20,000 steps each, and discarded the first 10% of steps as burn-in. This strategy allowed us to resolve the entire curve simultaneously by effectively stitching together the curves from the four titrant concentrations. In this way, we were able to measure binding affinities for both the high-affinity and low-affinity site: Kd,high (µM ): 0.1 ≤ 0.5 ≤ 2.7: and Kd,low (µM ): 1.9 ≤ 6.3 ≤ 35. (These triplicate values represent the lower bound for the 95% credibility interval, the mean of the parameter posterior distribution, and the upper bound for the 95% credibility interval.) These values are consistent with previously measured binding at these sites: 1.3 µM and 3.5 µM . 26

14

ACS Paragon Plus Environment

Page 14 of 21

Page 15 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Figure 3: Curves show titration of Ca2+ onto human S100A5 at four different titrant/titrate ratios: 8× (blue), 10× (purple), 15× (red), and 18× (green). We reported these experiments previously. 25 We extracted thermodynamic parameters for a two-site binding polynomial using the Bayesian MCMC sampler in pytc. Points are integrated heats with uncertainty calculated using NITPIC. 9 Lines are 100 curves drawn from the Bayesian posterior distribution. Simultaneous analysis of the four titrant concentrations allowed us to resolve the entire binding curve.

Model selection: global van ’t Hoff analysis One important question is whether one is justified adding more parameters to a model. In a binding experiment, we can frame this as asking how much thermodynamic information we can extract given our data. We will demonstrate this using a van ’t Hoff analysis. 5 The simple van ’t Hoff model assumes that the enthalpy of binding does not vary with temperature, and that the temperature-dependence of the binding constant K is given by:  −∆H ◦  1 1  VH − K(T ) = K(Tref )exp R T Tr ef

(2)

where ∆HV◦ H is the van ’t Hoff enthalpy, Tref is the reference temperature, and R is the gas constant. In some instances, the enthalpy may change as a function of temperature. This appears as a heat capacity change on binding, ∆Cp , leading to the following relationship:  −∆H ◦  1  1  ∆Cp  ref K(T ) = K(Tref )exp − + ln(T /Tref ) + T /Tref − 1 R T Tref R

15

ACS Paragon Plus Environment

(3)

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

◦ where ∆Href is the binding enthalpy at the reference temperature. The second model has an

extra parameter (∆Cp ) relative to the first model. This raises the question: can we extract ∆Cp , or would this be overfitting our data? pytc allows the user to ask this question. To demonstrate this capability, we performed a van ’t Hoff analysis for for the binding of Zn2+ ions to the human protein S100A14. Previously, we discovered that the protein human S100A14 is capable of binding Zn2+ ions with relatively low affinity (KD = 50 µM at 25 ◦ C) and low heats. 27 For this experiment, we titrated freshly prepared 2 mM ZnCl2 onto 110 µM hS100A14 at 278, 283, 290, 298, 303, and 308 K using our ITC-200. Experiments were done in 25 mM T ris, 100 mM N aCl buffer at pH 7.4 using the ITC protocol previously described. 27 We manually integrated the power curves using Origin.

Figure 4: Binding of Zn2+ ions to human S100A14 was measured using ITC. Data were collected at 278 (green), 283 (orange), 290 (red), 298 (magenta), 303 (purple), and 308 K (blue). A) We extracted thermodynamic parameters for a single-site van ’t Hoff binding model using the Bayesian MCMC sampler in pytc. Points are integrated heats extracted using Origin. Lines are 100 curves drawn from the Bayesian posterior distribution. B) Corner plot showing correlations between the posterior distributions for the two global parameters ∆HV◦ H and Kglobal . Distribution was generated by 100 walkers, each taking 5,000 steps. We globally fit each version of the van ’t Hoff model to all six datasets simultaneously using the Bayesian MCMC sampler in pytc. We set the uncertainty of each integrated heat to be ±0.1 kcal · mol−1 . We linked the ∆HV◦ H , K, and ∆Cp parameters in a global fit to all experiments simultaneously. Other parameters, such as dilution heats, were allowed to vary for each dataset separately. We used the maximum likelihood estimate as a starting guess and then explored the likelihood surface with 100 MCMC walkers, each taking 5,000 16

ACS Paragon Plus Environment

Page 16 of 21

Page 17 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

steps. We discarded the first 10% of steps as burn-in. These fits yielded global values for the ∆HV◦ H and ∆Cp parameters. The R2 for the more complex model was slightly higher than for the less complex model (R2 = 99.8 vs. 99.7). We then used an Akaike Information Criterion (AIC) test—as implemented in the pytc.util.compare_models method—to assess the support for adding the extra fit parameter. 14,18 The result of this test is the probability that a given model minimizes information loss relative to the other models. A high probability indicates higher support for that model relative to the others. We obtain P > 0.99999 for the fixed enthalpy model and < 0.00001 for the heat capacity model, indicating that we are not justified in adding the additional ∆Cp fit parameter. The global Bayesian fit with the standard van ’t Hoff model yielded an estimate of ∆HV◦ H with 95% credibility interval of −6.6 ≤ −6.2 ≤ −5.9 kcal · mol−1 . Likewise, the fit returns a global estimate of the binding constant with a 95% credibility interval of 1.4 × 104 ≤ 1.8 × 104 ≤ 2.2 × 104 M −1 , which agrees with the maximum likelihood estimate for the KD of 56 µM . The joint posterior distribution for these parameters is shown as a corner plot (easily output using pytc.corner_plot()) in Fig 4B. This van ’t Hoff analysis shows the power of the Bayesian MCMC analysis and also illustrates the utility of the pytc API for selecting between thermodynamic models.

Conclusion We anticipate that pytc will prove useful in a wide variety of ITC analyses. Our implementation of both maximum likelihood and Bayesian Markov-Chain Monte Carlo sampling, flexible global modeling, and inclusion of functions for model selection make pytc a powerful and versatile platform for analysis of ITC data. Further, we strongly support modification and extension of pytc by the scientific community. Because the code is released into the public domain and has an exposed, well-documented API, we look forward to the many im-

17

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

provements and extensions of the code that will naturally occur as users define new models and contribute to the code base.

Funding This work was supported by NIH R01GM117140 (MJH and HD) and NIH 7T32GM007759-37 (LCW).

Notes The authors declare no competing financial interests.

Acknowledgement The authors thank the members of the Harms lab for feedback and software testing.

References (1) Wiseman, T., Williston, S., Brandts, J. F., and Lin, L.-N. (1989) Analytical Biochemistry 179, 131–137. (2) Freyer, M. W., and Lewis, E. A. In Methods in Cell Biology; Biology, B. M. i. C., Ed.; Biophysical Tools for Biologists, Volume One: In Vitro Techniques; Academic Press, 2008; Vol. 84; pp 79–113. (3) Baker, B. M., and Murphy, K. P. (1996) Biophysical Journal 71, 2049–2055. (4) Horn, J. R., Russell, D., Lewis, E. A., and Murphy, K. P. (2001) Biochemistry 40, 1774–1778.

18

ACS Paragon Plus Environment

Page 18 of 21

Page 19 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

(5) Perozzo, R., Folkers, G., and Scapozza, L. (2004) Journal of Receptors and Signal Transduction 24, 1–52. (6) Houtman, J. C., Brown, P. H., Bowden, B., Yamaguchi, H., Appella, E., Samelson, L. E., and Schuck, P. (2007) Protein Science 16, 30–42. (7) Brautigam, C. A., Zhao, H., Vargas, C., Keller, S., and Schuck, P. (2016) Nature Protocols 11, 882–894. (8) Zhao, H., Piszczek, G., and Schuck, P. (2015) Methods 76, 137–148. (9) Keller, S., Vargas, C., Zhao, H., Piszczek, G., Brautigam, C. A., and Schuck, P. (2012) Anal. Chem. 84, 5066–5073. (10) Freire, E., Schön, A., and Velazquez-Campoy, A. Methods in Enzymology; Biothermodynamics, Part A; Academic Press, 2009; Vol. 455; pp 127–155. (11) Sigurskjold, B. W. (2000) Analytical Biochemistry 277, 260–266. (12) Jaynes, E. T., and Kempthorne, O. Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science; The University of Western Ontario Series in Philosophy of Science; Springer, Dordrecht, 1976; pp 175–257. (13) Bayarri, M. J., and Berger, J. O. (2004) Statistical Science 19, 58–80. (14) Wood, S. N. 2015. (15) Jones, E., Oliphant, T., and Peterson, P. 2011-. (16) Byrd, R. H., Schnabel, R. B., and Shultz, G. A. (1988) Mathematical Programming 40, 247–263. (17) Foreman-Mackey, D., Hogg, D. W., Lang, D., and Goodman, J. (2013) Publications of the Astronomical Society of the Pacific 125, 306–312.

19

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(18) Akaike, H. In Selected Papers of Hirotugu Akaike; Parzen, E., Tanabe, K., and Kitagawa, G., Eds.; Springer Series in Statistics; Springer New York, 1998; pp 199–213. (19) Millman, K. J., and Aivazis, M. (2011) Computing in Science Engineering 13, 9–12. (20) van der Walt, S., Colbert, S. C., and Varoquax, G. (2011) Computing in Science & Engineering 13, 22–30. (21) Hunter, J. (2007) Computing in Science & Engineering 9, 90–95. (22) Freiburger, L. A., Auclair, K., and Mittermaier, A. K. (2012) Thermochimica Acta 527, 148–157. (23) Goldberg, R. N., Kishor, N., and Lennen, R. M. (2002) Journal of Physical and Chemical Reference Data 31, 231–370. (24) Griko, Y. V. (1999) Biophysical Chemistry 79, 117–127. (25) Wheeler, L. C., and Harms, M. J. (2017) BMC Biophysics 10, 8. (26) Kim, I., Lee, K. O., Yun, Y.-J., Jeong, J. Y., Kim, E.-H., Cheong, H., Ryu, K.-S., Kim, N.-K., and Suh, J.-Y. (2017) Biochem. Biophys. Res. Commun. 483, 332–338. (27) Wheeler, L. C., Donor, M. T., Prell, J. S., and Harms, M. J. (2016) PLOS ONE 11, e0164740.

20

ACS Paragon Plus Environment

Page 20 of 21

Page 21 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Graphical TOC Entry

21

ACS Paragon Plus Environment