A Dynamic, Mathematical Model for Quantitative Glycoprofiling Using

Jun 27, 2016 - Mo Jiang , Kristen A. Severson , John Christopher Love , Helena Madden , Patrick Swann , Li Zang , Richard D. Braatz. Biotechnology and...
1 downloads 0 Views 2MB Size
Article pubs.acs.org/acssensors

A Dynamic, Mathematical Model for Quantitative Glycoprofiling Using Label-Free Lectin Microarrays Daniel P. Salem, Justin T. Nelson, Sojin Kim, and Michael S. Strano* Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States S Supporting Information *

ABSTRACT: The emergence of label-free lectin microarrays promises rapid and efficient glycoprofiling of complex analyte mixtures. Lectins have limited selectivity for different carbohydrate motifs necessitating relatively large array sizes to discriminate between glycoforms. Microarray technologies able to transduce the dynamics, instead of only the extent of binding, can introduce additional orthogonality in the array and therefore reduce its size. In this work, we develop a mathematical model of glycan binding dynamics to a label-free lectin sensor array, linking the matrix of observed dissociation constants, kinetics of binding, and occupancy to distinct glycoforms for identification. We introduce a matrix algebra approach that formulates the observed array dynamics in terms of a glycosylation matrix containing identifiers for each glycan chain on each protein isoform in the mixture. This formulation allows for straightforward calculation of the minimum array size necessary to distinguish a given set of glycans. As examples, we evaluate the binding of human IgG to two lectins, peanut agglutinin (PNA) and Erythrina cristagalli lectin (ECL), attached to near-infrared fluorescent single-walled carbon nanotube sensor array elements, both of which have affinities for terminal galactose residues. We demonstrate the application of both the steady state and transient model solutions to the glycan-lectin binding data, and we validate that linking microarray dynamics to glycan structure promises to significantly reduce requisite array size and complexity for rapid and efficient glycoprofiling. KEYWORDS: lectin, microarray, glycan, glycoprofiling, kinetic, model, label-free

T

either loading curves or stochastic dynamics, 15 versus equilibrium binding isotherms, can provide additional information with a smaller set of selected lectins to decrease array size. However, to date, the literature lacks a quantitative mathematical model to link the complex structures of a glycan mixture with an array of lectins to quantify glycosylation. The goal of this work is to introduce a matrix algebra approach for glycoprofiling that allows one to link glycan structure and microarray response, resulting in straightforward predictions of requisite array size, lectin requirements, and glycan composition. Traditional laboratory methods for determining glycan structures include liquid or gas chromatography (LC/GC), mass spectrometry (MS), and nuclear magnetic resonance spectroscopy (NMR).16,17 During a typical analysis, glycans are enzymatically cleaved from the glycoprotein (or glycoproteins are enzymatically digested into glycopeptides), separated, and characterized using MS.18,19 The original glycan structures are then reconstructed using software and databases of previously characterized structures.19 The purpose of this current work is

he carbohydrate structures of glycoproteins play vital roles in both structure and function by impacting protein folding,1−3 stability,4,5 and activity.6−8 Glycoproteins are central to a variety of biological processes including immunogenicity6,9 and cancer metastasis.10,11 Hence, technologies are needed to aid our understanding of the synthesis, structure, and function of glycans within biological systems. Many have noted that knowledge generation in glycobiology has been slow relative to genomics and proteomics in the mid to late 20th century12 because of technological difficulties associated with assessing glycan structure and function. Accordingly, better tools are needed to more easily and rapidly distinguish closely related glycan structures and characterize the glycosylation patterns of glycoprotein mixturesa procedure termed “glycan profiling” or “glycoprofiling.” The use of lectin microarrays (LMAs) is a promising high-throughput strategy for rapid and efficient glycoprofiling. Unlike antibodies or oligonucleotide hybridization, glycan−lectin binding is characterized as less affined and is strengthened by multivalent interactions (the cluster glycoside effect).13,14 Lectins also demonstrate extensive cross reactivity among glycan carbohydrates, necessitating large array sizes. Lectin microarrays have largely employed qualitative comparisons, using patterns of binding to distinguish between specific glycans. The use of dynamic information in the form of © XXXX American Chemical Society

Received: February 23, 2016 Accepted: June 27, 2016

A

DOI: 10.1021/acssensors.6b00121 ACS Sens. XXXX, XXX, XXX−XXX

Article

ACS Sensors

Figure 1. Example glycoprotein mixture containing five unique glycoforms and five unique glycans along with the proposed matrix algebra formulation. Entries in P indicate the number of a given glycan chain type attached to a glycoprotein isoform. Each glycan type and glycoform corresponds to a separate column and a separate row, respectively. The monosaccharide symbolic scheme is consistent with that used by the Consortium for Functional Glycomics (CFG).12



LINEAR ALGEBRA FORMULATION OF GLYCOPROFILING The analytical challenge of glycoprof iling can be described as the ability to take a mixture of glycoproteins and elucidate the glycosylation pattern and concentration of each glycoform in the mixture. A glycoform consists of a unique combination of amino acid sequence and glycosylation pattern, while the term glycoprotein will be used to describe a collection of glycoforms containing the same amino acid sequence. Consider a mixture containing k glycoproteins, i glycoforms, and j unique glycans, where i ≥ k. The concentration of all glycoforms can be assembled into a single i-component vector, CP. Furthermore, we can create an (i × j) matrix, P, whose entries Pij are integer values corresponding to the number of glycan j attached to glycoform i. Each row of P corresponds to a particular glycoform in the mixture, and each column of P corresponds to a unique glycan. We have successfully glycoprofiled the mixture if we have complete knowledge of both CP and P.

to investigate how label-free sensor arrays can enable highthroughput and reliable characterization of the glycosylation patterns of an intact glycoprotein sample. Such approaches may be important for the development of new diagnostics as well as for implementing real-time feedback and quality control during biopharmaceutical production where aberrant glycosylation patterns significantly affect drug efficacy.20,21 Such methods may also find utility in the identification of counterfeit drugs and/or biosimilars whose glycosylation patterns (or lack thereof) may render the drug ineffective or harmful.22 Since their introduction in 2005, LMAs have become a promising, high-throughput method to study protein glycosylation.23−25 LMAs are analogous to DNA and protein microarrays developed previously in which lectins and other glycan binding proteins (e.g., carbohydrate antibodies) are covalently attached to a solid substrate. In the conventional approach, a mixture of fluorescently labeled glycoproteins is added to the array, resulting in a unique fluorescence pattern that allows for a qualitative determination of the glycans present.23,25,26 As a result of the fluorescent labeling, most traditional LMAs require a washing step before measuring fluorescence intensity and do not yield equilibrium or timedependent binding data, but select for only the lowest dissociation constant (KD) binding. This fundamentally prevents the LMA from being able to quantitatively glycoprofile a glycoprotein sample. Kuno et al. avoided this limitation by using an evanescent field to specifically excite fluorophores attached to bound glycans, allowing for real-time detection of glycan-lectin binding.25 Alternative, label-free LMA platforms have also been proposed utilizing the particle plasmon light scattering of gold nanoparticles27 and the near-infrared fluorescence of single-walled carbon nanotubes (SWNTs).28 In this work, we develop a linear algebra-based mathematical formulation of the glycoprofiling problem and a binding kinetic model that can be used to describe LMA data. The results of this model provide insight into optimal LMA design (i.e., number and type of lectins) and can be used to obtain kinetic constants if glycoprotein concentrations are known. Most importantly, this model allows a user to obtain quantitative information from steady state or dynamic LMA data that can be used to glycoprofile a glycoprotein mixture.

P1,1 P1,2 ⋯ P1, j

C1 CP =

C2 ⋮ Ci

P=

P2,1 P2,2 ⋯ P2, j ⋮



⋱ ⋮

Pi ,1 Pi ,2 ⋯ Pi , j

Note that P is a sparse matrix: the sum of a given row of P must be less than or equal to the total number of possible glycosylation sites of the particular glycoprotein. Also note that the glycoform numbering scheme is irrelevant as long as the rows of P correspond to the entries in CP. A related jcomponent vector, G, contains the concentration of each unique glycan in the mixture and can be easily related to P and CP via eq 1.

G = PTCP

(1)

Figure 1 provides an example mixture of five unique glycoforms in which the glycoforms and glycans are arbitrarily labeled. Following this labeling scheme, we can assemble the matrix P. B

DOI: 10.1021/acssensors.6b00121 ACS Sens. XXXX, XXX, XXX−XXX

Article

ACS Sensors



Steady State Analysis. We first direct our attention to two questions: (1) How large does the lectin array have to be to uniquely identify G (analyzing P and CP will be considered in a later section)? (2) How can the matrix of responses, θ(t), be used to extract and compare fundamental dissociation constants, K DSj , for a system at equilibrium? These questions can be answered by examining the steady state result of eqs 4−7. At steady state,

KINETIC MODEL OF A LECTIN MICROARRAY Monovalent glycan−lectin binding events are typically weak, with dissociation constants in the millimolar to micromolar range.29,30 While the binding of glycans to lectins is enhanced by multiple simultaneous binding events, a 1:1 kinetic model can be assumed and later modified to account for multivalent interactions. Such a model has been used previously to describe experimental binding dataOlkhov and co-workers measured glycan-lectin binding events using a label-free glycan microarray and found a 1:1 binding model to fit the data for low to moderate lectin concentrations.27 Furthermore, our group used a 1:1 kinetic model to fit glycan-lectin binding data obtained using SWNT nanosensors.28 The results of such an analysis are effective kinetic parameters most likely representing multiple binding interactions. The simple glycan-lectin binding event can be described by the following reaction,

⎛ ⎜L − ⎜ S, T ⎝

kSj

k −Sj

where LS is the S th lectin, Gj is the jth glycan, θSj is the bound glycan−lectin complex, and kSj and k −Sj are the forward and reverse rate constants, respectively. As written, this reaction assumes that protein identity does not affect binding. This is a fundamental assumption of all LMA analyses and is necessary to obtain a readily applicable model. Moreover, this assumption allows for the use of kinetic parameters determined via glycan arrays, frontal affinity chromatography (FAC), and other methods used to probe glycan−lectin interactions. Given the reaction above, we can write the rate of change of each glycan− lectin complex concentration. dt

= kSj(LS)(Gj) − kSjK DSjθSj

LS, T (3)

= kSj(LS, T −

= −∑

Sj

dt

subject to typical initial conditions.

Gj(t = 0) = Gj0 =

∑ CP,i0Pij i

jmax

⎞⎛ G ⎞ j ⎟ ⎟ K ⎠⎝ DSj ⎠

∑ θSj⎟⎟⎜⎜ j=1

j

=

Gj

∑ jmax =1 K 1+

DSj

Gj j ∑ jmax =1 KD Sj

for each lectin, S (9)

(10)

In the event that the amount of glycans in solution is much greater than the amount of bound lectins, the values of Gj determined from the steady state analysis correspond approximately to the original glycan concentrations. However, if the amount of glycans and lectins are of similar magnitude, depletion of glycans in solution must be accounted for by S θ in eq 9, where α is a conversion setting Gj = Gj0 − α ∑Smax = 1 Sj factor between surface and volumetric concentration. The equations above provide an important minimal criterion for LMA design: assuming the LMA signal can be related to the total surface concentration/fraction of occupied lectin sites and the dissociation constants are known, we need at least j unique lectins to solve for the concentrations of all glycans in the mixture. Lectins X and Y are defined as unique if the corresponding

(5)

θSj(t = 0) = 0 for all S

G2 ⎞ ⎟⎟ ⎠⎝ K DS2 ⎠

θS G = for each lectin, S LS, T K DS + G

(4)

dθSj(t )

S

j=1

⎞⎛

Multiplying both sides of the above equation by the total lectin concentration solves for the total concentration of occupied lectin sites, which may be more easily related to the LMA output signal. In the case of a single glycan system (i.e., jmax = 1), eq 9 reduces to eq 10.

∑ θSj(t ))Gj(t ) − kSjK D θSj(t ) j

jmax



j

and free glycan concentrations, dt

∑ θSj⎟⎟⎜⎜

jmax

In practice, we will relate the LMA signal to the total concentration of glycans bound to each lectin. Thus, it is advantageous to add up all j equations and solve explicitly for the fraction of occupied lectin binding sites, as shown in eq 9.

After adding in a lectin balance, we obtain the final system of ordinary differential equations to describe the glycan−lectin complex concentrations,

dGj(t )

⎛ θS2= ⎜⎜LS, T − ⎝

∑ jmax θ = 1 Sj

i

dt

⎞⎛ G ⎞ θ ∑ Sj⎟⎟⎜⎜ 1 ⎟⎟ ⎠⎝ K DS1 ⎠ j=1

⎛ θSj= ⎜⎜LS, T − ⎝

(2)

∑ CP,iPij

dθSj(t )

⎛ θS1= ⎜⎜LS, T − ⎝



Note we can relate Gj to P and CP via eq 3.

Gj =

(8)

The above equation can be written for each glycan-lectin complex, resulting in (S × j) algebraic equations for the entire LMA. For a given lectin we have the following set of j equations:

LS + Gj XoooY θSj

dθSj

⎞ ∑ θSj⎟⎟(Gj) = K DSjθSj ⎠ j=1 jmax

(6)

(7)

Equations 4−7 constitute the most general form of a LMA operating in time after introduction of a glycoprotein mixture at t = 0, generating a matrix of sensor responses, θ(t), corresponding to all θSj values. C

DOI: 10.1021/acssensors.6b00121 ACS Sens. XXXX, XXX, XXX−XXX

Article

ACS Sensors vectors that define their K DSj values are not equal (within the intrinsic error of the measurement platform). Equation 9 implies three regimes, two of which are experimentally relevant. Regime 1: Gj > >K DSj for all j. In this case, eq 9 reduces to the limit of lectin saturation where all possible lectin sites are occupied. This limit is not experimentally relevant as it provides no selectivity. Regime 2: Gj < >∑S LS, T for all j, we can rewrite eq 4 assuming Gj is constant. dθSj(t ) dt

Gj K DSj

j

j

∑ jmax θ = 1 2j L 2, T

=

1 1 1 ⋯ K D2,1 K D2,2 K D2,j





∑ jmax θ = 1 Sj

1 K DS,1

j

LS, T







1 1 ⋯ K DS,2 K DS,j

Sj

(13)

j

d θS = LS, T b − A·θS dt

1 1 1 ⋯ K D1,1 K D1,2 K D1,j

L1, T

∑ θSj(t ))(Gj) − kSjK D θSj(t )

A system of j coupled ODEs can be written for each lectin. In order to obtain an analytical solution, each system must be uncoupled using a method introduced by Wei and Prater in 1962.31 Each system of j coupled ODEs can be consolidated into matrix form given by eq 14,

(11)

At this point, a single matrix equation can be written for the entire LMA. ∑ jmax θ = 1 1j

= kSj(LS, T −

G1 ·

(14)

where

G2

θS1



θS =

Gj

θS2 ⋮ θSj

(12)

kS1G1

It is evident from eq 12 that, within this regime, the KD vectors for all lectins must be linearly independent for there to be a unique solution. Note that for DNA and protein microarrays, the matrix above will be approximately diagonal. Regime 3: Gj ∼ K DSj for all j. If the glycan concentrations are of the same order of magnitude as the dissociation constants, eq 9 remains unchanged. Within this regime, it is not necessary that the lectin KD vectors be linearly independent. In response to the second question posed at the beginning of this section, eqs 9 and 11 can be used to calculate dissociation constants if glycan concentrations are known. The simplest method to solve for the dissociation constants using equilibrium LMA data would be to test a single glycan on the LMA. This procedure can be completed using glycan or glycoprotein standards. Dissociation constants can be calculated from a mixture of glycans provided that a larger set of LMA experiments is performed at different glycan concentrations. Specifically, the number of different sets of glycan concentrations would have to be greater than or equal to the number of glycans in the mixture. If operating in regime 2, the glycan concentrations must also be linearly independent. Dynamic Analysis. Most microarray and sandwich assay methods involve an attempt to attain equilibrium followed by a washing step to remove weakly bound and nonspecifically bound species in order to increase the signal-to-noise ratio. In such cases, eq 9 can be applied to describe the system at approximate equilibrium. However, additional information is available if one is able to measure the array response dynamically (i.e., measure θ(t)). Weak binding lectins, with large K DSj values, may not reach equilibrium within the array incubation time and are necessarily washed away before conventional steady state analysis. This washing step is especially problematic for lectin microarrays because they rely

b=

kS2G2 ⋮ kSjGj kS1G1 + kS1K DS1

A=

kS2G2

kS1G1



kS1G1

kS2G2 + kS2K DS2 ⋯

kS2G2





kSjGj

kSjGj





⋯ kSjGj + kSjK DSj

Note that A is a (j × j) square matrix and that eq 14 can be written for each lectin in the LMA. We now introduce a fictitious vector, ξ, such that the following equation is satisfied. θS = X·ξ

(15)

In the equation above, X is a matrix containing the eigenvectors of A. We can now rewrite the condensed matrix form where X can be taken outside the derivative. X

d ξ = LS, T b − A·(X·ξ) dt

(16)

After multiplying both sides by the inverse of X, we can take advantage of the definition, X−1AX = λ, where λ is a diagonal matrix containing the eigenvalues of A. d ξ = LS, T X−1·b − λ·ξ dt

(17)

Given that λ is a diagonal matrix, the system of ODEs is now uncoupled and each component of the vector ξ can be represented by the following analytical solution, D

DOI: 10.1021/acssensors.6b00121 ACS Sens. XXXX, XXX, XXX−XXX

Article

ACS Sensors ξi =

LS, T di λi

⎛ LS, T di ⎞ + ⎜ξi0 − ⎟exp( −λit ) λi ⎠ ⎝

have as many as six to eight CRDs.13,32 The enhancement in binding strength that accompanies multivalent interactions (referred to as the glycoside cluster effect13) is a fundamental characteristic of glycan-lectin binding events that must be captured by the kinetic model describing LMA data. An important characteristic of multivalent binding events is that the avidity of the overall interaction is dependent on the relative concentrations of the two binding partners. We recently reported this phenomenon for protein A binding to human immunoglobulin G (IgG), where the multivalent nature of protein A and IgG resulted in concentration dependent binding kinetics.33 In the context of a lectin microarray, a larger number of multivalent interactions takes place at low glycan concentrations, resulting in stronger avidity and a smaller ef fective dissociation constant when fit to a 1:1 kinetic model. However, at high glycan concentrations, the number of bound glycans will increase and the glycoproteins will crowd each other, resulting in a reduction in individual binding events per glycan. This decrease in multivalent interactions reduces the strength of binding and leads to a larger effective dissociation constant when fit to a monovalent model. As we demonstrated for protein A/IgG binding, elucidating the concentration dependence of the effective dissociation constant enables the use of a 1:1 kinetic model for describing multivalent interactions. This can be extended to the LMA model where each dissociation constant is considered a function of the glycan concentrations, KD,eff = f(G). While each KD is also dependent on the identities (i.e., KDs) of the other glycans in the system, this dependence will not be accounted for as it would significantly limit the utility of experimentally determined KD,eff functions. Incorporating concentration dependent dissociation constants does not affect the steady state and dynamic solutions derived previously due to the original assumption of nearly constant glycan concentrations. Therefore, we can rewrite eq 9 to account for this concentration dependence.

(18)

−1

where ξ(0) = ξ0, d = X ·b, and λi is the ith diagonal entry in λ. Assuming an initial condition of θS(t = 0) = 0, we can set ξ0 = 0. Once all of the components of the vector ξ have been determined, we can simply transform back to “real” concentrations via eq 15. At this point, we have obtained the vector of sensor array responses θS for a given lectin. We can repeat the above analysis for every unique lectin in the microarray. Each θS vector will contain j unknowns, corresponding to the concentration of each glycan (assuming the kinetic parameters are already known). Provided that the number of time points in the dynamic LMA binding data exceeds the number of glycans in solution, it is possible to fit all j glycan concentrations to a single lectin binding curve. While doing so may yield concentration estimations with large confidence intervals, this result indicates that the collection and analysis of dynamic binding data adds additional orthogonality into the microarray, enabling the use of a microarray containing fewer lectins that glycan structures. As was the case for the steady state analysis, we can explicitly solve for the fraction of occupied lectin sites by normalizing by the total lectin concentration. Assuming ξ0 = 0, 1 θS = X·ξ′ LS, T

ξi′ =

(19)

di (1 − exp(−λit )) λi

(20)

By summing the entries of eqs 15 or 19, we can solve for the total concentration (or total fraction) of occupied sites for lectin S . This procedure can be repeated for all lectins in the microarray. The transient analysis can be used to determine the dissociation constant and the forward and reverse reaction rate constants (kSj and k −Sj ). Similarly to the steady state analysis, the simplest way to measure these kinetic parameters would be to challenge the LMA with a single glycan at a known concentration. If a single glycan is used, the Wei−Prater decoupling procedure is no longer needed and the analysis simplifies to the general result obtained for a monovalent binding reaction. θ=

LS, T G G + K DS

j

∑ jmax =1

j

∑ jmax θ = 1 Sj LS, T

= 1+

Gj fK (G) DSj

Gj j ∑ jmax = 1 f (G) K DSj

for each lectin, S (22)

Note that a similar function can be included to describe the concentration dependence of the forward rate constant when using the dynamic solution. This function, kSj,eff = h(G), would be obtained from the same calibration data set used to determine KD,eff = f(G). In order to disregard the dependence of each KD on the other glycans in the system, the vector of glycan concentrations, G, must be simplified to a single concentration value, Cg, before being input into each KD,eff function. This can be completed in three different ways: (i) setting Cg equal to the sum of all glycan concentrations; (ii) setting Cg equal to a single glycan concentration (the glycan for which the KD,eff function is defined); and (iii) setting Cg equal to the sum of glycan concentrations that exhibit affinity (above a predefined threshold) for the given lectin. The first and second methods would generally lead to an overestimation and underestimation of KD,eff values, respectively.

(1 − e−(kSj(G) + kSj(K DS))t ) (21)

The additional orthogonality provided by the collection of dynamic binding data does enable the simultaneous fitting of kinetic parameters for a mixture of glycans. Conducting multiple binding experiments at different glycan mixture concentrations (which need not be linearly independent) will improve the confidence interval of the fitted parameters and enable characterization of a larger glycan mixture. However, this requires that the concentrations of glycans in the mixture are precisely known. Accounting for Multivalency. The currently formulated kinetic model assumes monovalent glycan−lectin binding interactions. In reality, many lectins have multiple carbohydrate recognition domains (CRDs) capable of binding to flexible, multiantennary glycan structures. C-type lectins, a large group which requires a calcium ion to maintain binding ability, can



MODEL CONSIDERATIONS AND LIMITATIONS In the preceding sections, we derived equilibrium and nonequilibrium expressions for both the total concentration E

DOI: 10.1021/acssensors.6b00121 ACS Sens. XXXX, XXX, XXX−XXX

Article

ACS Sensors

Figure 2. Fluorescence response of (A) PNA-loaded SWNT sensors and (B) ECL-loaded SWNT sensors to human IgG with a kinetic model fit assuming one dominant glycan structure (red) and two dominant glycan structures (green) and an accompanying plot of the fit residuals.

enough unique lectins. In the regime in which glycan concentrations are much smaller than the dissociation constants (i.e., Gj <