Network-Based Classification and Modeling of Amyloid Fibrils | The

May 16, 2019 - Topology of all amyloid structures can be described using a simple network framework. ... whereby the minimal repeating subunit is a pa...
1 downloads 0 Views 2MB Size
Article Cite This: J. Phys. Chem. B 2019, 123, 5452−5462

pubs.acs.org/JPCB

Network-Based Classification and Modeling of Amyloid Fibrils Gianmarc Grazioli,†,‡ Yue Yu,§,† Megha H. Unhelkar,‡ Rachel W. Martin,‡,∥ and Carter T. Butts*,†,§,⊥ †

California Institute for Telecommunications and Information Technology, ‡Department of Chemistry, §Department of Computer Science, ∥Department of Molecular Biology and Biochemistry, and ⊥Departments of Sociology, Statistics, and Electrical Engineering and Computer Science, University of California, Irvine, Irvine 92697, United States

Downloaded via KEAN UNIV on July 22, 2019 at 04:40:57 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.

S Supporting Information *

ABSTRACT: Amyloid fibrils are locally ordered protein aggregates that self-assemble under a variety of physiological and in vitro conditions. Their formation is of fundamental interest as a physical chemistry problem and plays a central role in Alzheimer’s disease, Type II diabetes, and other human diseases. As the number of known amyloid fibril structures has grown, the need has arisen for a nomenclature for describing and classifying fibril types, as well as a theoretical description of the physics that gives rise to the self-assembly of these structures. Here, we introduce a systematic nomenclature and coarse-graining methodology for describing the topology of fibrils and other protein aggregates, along with a computational methodology for simulating protein aggregation. Both have mathematical underpinnings in graph theory and statistical mechanics and are consistent with available experimental data on the fibril structure and aggregation kinetics. Our graph representation of the fibril topology enables us to define a network Hamiltonian based on connectivity patterns among monomers rather than detailed intermolecular interactions, greatly speeding up the simulation of large ensembles. Our simulation strategy is capable of recapitulating the formation of all currently known amyloid fibril topologies found in the Protein Data Bank, as well as the formation kinetics of fibrils and oligomers.



INTRODUCTION Background. More than 40 different human diseases, many of which are both fatal and incurable, are associated with the formation of amyloid fibrils.1,2 These locally ordered protein aggregates are characterized by a cross-β structure, in which β-strands composed of adjacent monomers stack in a repeating linear pattern, analogous to a crystal that grows along a single dimension rather than in all three.3 This characteristic structure is conserved across amyloid fibrils, independent of the secondary structure of the protein monomer prior to fibrillization.4 Despite the universality of this overall arrangement, the patterns of connectivity among monomers that form the repeating subunits of amyloid fibrils vary widely and are incompletely characterized. Indeed, the same fibrillizing monomer has been found to form different periodic structures in different experiments, ruling out simple sequence-based explanations: for example, the PDB structures 5KK35 and 2BEG6 exhibit markedly different periodic patterns along their respective growth axes, yet both fibrils comprised the same Aβ1−42 peptide. High-resolution structures have shown a diverse set of fibril subunit geometries displaying subtle but distinctive differences, for example, linear versus annular structures and parallel versus antiparallel β-sheet arrangements.7−10 These structural differences have demonstrated clinical relevance: for instance, they have been shown to directly correlate with toxicity and disease progression for strains of both β-amyloid11 and α-synuclein12 fibrils. The key © 2019 American Chemical Society

feature differentiating these subunit geometries from each other is the periodic pattern of noncovalent bonding between monomers. We refer to such motifs of noncovalently bonded connectivity among the protein monomers making up each fibril type as fibril topology, the characterization and modeling of which are the focus of the present work. As an initial illustration of how fibril topology can be extracted from high-resolution structures and specified using graph-theoretic formalisms, Figure 1 compares the patterns of noncovalent connectivity for human Iowa mutant β-amyloid fibrils, associated with an early-onset hereditary form of Alzheimer’s disease (PDBID: 2LNQ7), with that of a fibril formed by wild-type Aβ1−42 (PDBID: 5KK35). As detailed below, we represent the topology of each fibril by a network in which each node represents a single protein monomer, with ties indicating monomers that are noncovalently bound to one another. Application of this coarse-grained representation to solved amyloid structures demonstrates that it is sufficient to distinguish a wide range of fibril topologies while also being compact enough to serve as the basis for scalable models of fibrillization kinetics that are able to simulate the fibrillization of hundreds or thousands of protein monomers. Our examination of experimentally solved amyloid fibril structures Received: April 13, 2019 Revised: May 7, 2019 Published: May 16, 2019 5452

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B

of current atomistic methods, we have developed a mathematical formalism for a network Hamiltonian describing the physics of interactions between fibrillizing protein monomers in graph theoretic or connectivity terms rather than in atomistic detail; our approach exploits a formal connection between the statistical mechanics of the fibrillization process and a framework for network modeling (exponential family random graph models or ERGMs) originally developed for studying social networks. We also describe the simulation methodology that generates representative fibril topologies under this Hamiltonian, recapitulating the topologies of all currently known amyloid fibril structures reported in the Protein Data Bank (PDB13). In the remainder of this paper, we (1) introduce a general topological formalism for protein aggregates (fibrillar or otherwise), including a quantitative mapping from atomicresolution structures to their topological representations; (2) introduce a typology and nomenclature for amyloid fibrils that encompasses all fibril structures found in the Protein Data Bank; (3) introduce a specific family of ERGMs that can reproduce all currently known amyloid fibril topologies (as well as some not yet experimentally observed); (4) employ Markov-chain Monte Carlo (MCMC) simulations to examine the equilibrium behavior of our models; and (5) employ a dynamic extension of ERGMs based on local energy differences to probe the kinetics of fibril formation.

Figure 1. Two examples of the mapping of three-dimensional fibril structures into their equivalent graph representations, where the color coding indicates different protein monomers. Each node in panels (B,D) corresponds to a protein monomer, with ties between nodes whose monomers are noncovalently bound. Panels (A,B) show the molecular structure and graph representations, respectively, of a fibril segment formed from β-amyloid D23N (PDBID: 2LNQ7). Using the typology developed in this paper, this fibril structure is classified as a 1-ribbon. Panels (C,D) show the molecular structure and its corresponding graph representation for a segment of wild-type Aβ1−42 (PDBID: 5KK35). In our typology, this structure is classified as a 1,2 2-ribbon.



METHODS Defining Protein Aggregation States with Graphs. To capture the process of fibril formation, we require a simple representation that can accommodate protein monomers, small oligomers, larger unstructured aggregates, and the repeating units of fibrils themselves. Graph theory provides a natural language for this purpose and enables use of the ERGM formalism that has been extensively developed for modeling graphs in social network and neuroscience applications to simulate realizations of the aggregation process.14 A graph is composed of a set of nodes or vertices, together with a set of ties or edges representing pairwise interactions between nodes. Here, the nodes represent individual protein monomers, with two nodes being joined by an edge if their corresponding monomers are noncovalently bonded. We refer to such structures as aggregation graphs. Because our focus is on fibril topology, we do not distinguish among types of intermolecular interactions (e.g., hydrogen bonds, salt bridges, nonpolar interactions, etc.); in practice, protein monomers are held together by a combination of different interactions, whose combined effects are of primary interest. As Figure 1 illustrates, a topological representation is rich enough to distinguish among different fibril forms. Additional forms are shown in the examples throughout this work. In the case of fibril structure A from Figure 1, we note that the structure displays a motif in which each protein monomer is bonded exclusively to its two immediate neighbors along the fibril growth axis; the minimal repeating subunit is a single protein monomer. In contrast to this, the latter structure is characterized by two chains of linear growth, whereby the minimal repeating subunit is a pair of monomers (one from each chain and of equal fibril growth axis index) that are bonded to: (1) each other; (2) both of their neighbors along the same chain; and (3) the monomer on the opposite chain with a fibril growth axis index offset by one and opposite in sign. Although our representation is by intention

reveals several distinct and previously unrecognized topological classes of fibrils; to describe them, we introduce a systematic nomenclature for the fibril structure akin to the naming conventions for organic polymers or protein secondary structures. This system, demonstrated in Figure 2, describes all presently known fibril topologies and is extensible to others not yet found.

Figure 2. Topology of all amyloid structures can be described using a simple network framework. (A) Fundamental fibril forms: the nribbon and the n-prism. These fundamental forms are the basis for describing any fibril by either adding (chording) or deleting (nulling) edges between nodes in a repeating pattern. (B) Various chording operations to the 2-ribbon. Chords are indexed by the subunits they connect, for example, consecutive chorded subunits are labeled 1,2, while subunits that are two positions apart are labeled 1 and 3. Cis and trans indicate whether chords are between subunits occupying equivalent or different embedded 1-ribbon “backbones,” respectively.

The second motivation driving the present work is the need for models of fibril formation that are able to represent the diversity of observed structures and to capture fibrillization kinetics on experimentally relevant time scales (many hours) and system sizes (hundreds to thousands of peptide monomers). Because both the number of degrees of freedom and the time scales of these events extend far beyond the reach 5453

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B coarse-grained, it has some advantages over (and can be used in conjunction with) more fine-grained approaches. Directly modeling aggregation states via topology dramatically reduces the degrees of freedom that must be explicitly represented, allowing substantial gains in computational efficiency versus geometric representations. Modeling aggregation states in this fashion requires a different approach from, for example, atomistic methods. The abovementioned ERGM framework provides a parsimonious means of parameterizing and simulating draws from probability distributions on graphs, in this case, representing patterns of connectivity between protein monomers. Here, we exploit a property of this framework that allows us to relate the ERGM specification to a Boltzmann distribution over aggregation states (eq 5). This correspondence also provides a basis for defining Hamiltonian functions that describe the energy of formation for all graph theoretic features sufficient for recapitulating a particular aggregation state (eq 7). Finally, we employ the graph Hamiltonian to define a family of kinetic models whose equilibrium distributions correspond to a target ERGM distribution and which, we show, is able to qualitatively recapitulate behaviors seen in experimental studies. Further details regarding the mapping of aggregation states to graphs and the statistics describing their formation to ERGMs are provided below. Identifying Amyloid Fibril Structures from the Protein Data Bank. To examine the diversity of fibril topologies observed to date, we performed an exhaustive search of the PDB for all amyloid fibril structures. Although the PDB is not a random sample from nature, it is a reasonable census of the known fibril structures so far discovered by the scientific community. The fibril structures were downloaded from the Protein Data Bank (PDB; http://www.rcsb.org/pdb/ )13 after running multiple advanced searches on the website. The search criteria included finding fibrils of sufficient size that the repeating subunit could be clearly identified. Search terms used included “fibril,” “A-β,” “protein fibril,” and “lysozyme.” Any fibril attached to a ligand, metal, or macroscaffold was discarded. To be retained for analysis, a putative fibril structure was required to display (1) the cross-β sheet structure definitive of amyloid fibrils (distances between adjacent βstrands of 4.7 ± 0.4 Å with 10 ± 3 Å between β-sheets)15 and (2) a periodic pattern of connectivity between monomers along a single fibril growth axis. Structures not satisfying these conditions were discarded. In addition to the summary of our classification results shown in Figure 3, a detailed list of structures satisfying our criteria is given in Table S3. Extracting Topology from Atomistic Protein Structures. After identifying an amyloid fibril structure, we then extract its underlying topology (i.e., the pattern of bound interactions among monomers). For structures derived from Xray crystallography, this requires distinguishing between crystal contacts and the much stronger interactions that define the fibril structure itself; in the case of NMR and electron microscopy (EM) structures, the corresponding problem is distinguishing ephemeral and dynamically unstable contacts from structural ones. In both cases, we employ an energy scoring protocol to remove spurious contacts. Although we use the same scoring scheme for all structures, the definition of crystal structures in terms of a repeating asymmetric unit requires special processing. This is performed as follows: To ensure that we are working with a collection of fibril segments, we first generate supercells consisting of

Figure 3. Five unique amyloid fibril topologies found in the PDB, sorted by relative complexity. Bar heights indicate the number of structures in the PDB with the indicated topology.

multiple repeats of the reported crystal unit cell.16 We then calculate the score for the total interaction energy between each spatially adjacent monomer in the supercell (using the approach described below), using a −10 kcal/mol cutoff to filter out spurious contacts. If this yields a periodic fibril structure along a single axis (and not, e.g., a three-dimensional, sheet-like, or nonperiodic structure), the resulting ties are taken to define the fibril topology. If this does not yield such a structure, then, we lower the energy cutoff until either such a structure is observed or until the structure decomposes into independent monomers; in the latter case, the structure is considered not to meet our criteria of being composed of a repeating one-dimensional pattern of bound monomers and is rejected. Although the focus of the current work is fibril units held together by strong interactions, we note that this methodology can be extended to more complex treatments in which valued graphs are used to represent strong and weak interactions. In this case, multiple scoring thresholds would be used to classify edges into their respective classes. Because the NMR and EM structures employed here were not obtained from crystallized protein, we do not need to build supercells prior to scoring intermonomeric interactions. Instead, we simply score contacts among spatially adjacent monomers within the structure, as before beginning with a −10 kcal/mol cutoff and lowering the energy threshold until a periodic structure is obtained. For the NMR and EM structures examined here, it was not necessary to lower the default cutoff in order to obtain periodic structures. Binding Energy Scoring Protocol for Edge Determination. In order to quantitatively determine which monomer nodes share an edge in the graph representation of an aggregation state, it was necessary to establish a basic protocol for calculating a binding energy score between all adjacent monomers in a given aggregation state structure. The routine we implemented calculates an energy score for all dyads with one or more atom pairs within a 3 Å radius, using the ff14SB force field and IGB1 implicit solvent model found in AmberTools 16.17 (Although a variety of alternative implicit solvent models are available, the IGB1 model was chosen here because it has been more widely used and extensively tested over a range of applications than other models.) The scoring function implemented in our calculations is as follows Fij = ΔΔGij − ΔΔGi − ΔΔGj 5454

(1) DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B

. is here the set of all simple graphs on N vertices subject to the packing constraint that no vertex has more than 12 neighbors; while the packing constraint (based on standard geometric approximations20) can easily be relaxed, and we find that this has a little impact on the models studied here. From a physical point of view, we can immediately recognize eq 5 to be a Boltzmann distribution on . , with the familiar exponent −/(g )/(kBT ) = θ Tt(g ) with T being temperature, kB being the Boltzmann constant, and log h(g) being the entropy of g. Because the θ representation of the probability mass function of G involves temperature-dependent parameters, it is more physically useful to consider the alternative

where i and j are monomers, Sij is the energy score for the i, j dyad, and the ΔΔG values are approximate energy differences defined as follows. Let S be the entire structure, with Si being the structure with monomer i removed, and Sij being the structure with both i and j removed. We define a “baseline” energy for the structure as ΔGS = ES + ΔGsolveS

(2)

where ES is the internal energy of S, and ΔGsolveS is the solvation energy of S. Corresponding quantities for Si and Sij are defined analogously. We then approximate the change in energy associated with removing the ith monomer from the structure by ΔΔGi = ΔGSi − ΔGS

Pr(G = g |ϕ , t , T ) = exp[−/(g )/(kBT ) − log Z(ϕ)]h(g )

(3)

(6)

and the corresponding change in energy associated with removing the ith and jth monomers jointly by

where we express the Hamiltonian in terms of temperatureindependent parameters ϕ as /(g ) = ϕTt(g )T , and Z(ϕ) = ∑ g ′∈ . exp[−/(g ′)/(kBT )]h(g ′) is the partition

ΔΔGij = ΔGSij − ΔGS

(4)

function for the ensemble defined by (T , .). (In terms of this parameterization, we can recover the canonical ERGM parameters as θ = −ϕ/(kBT).) The sufficient statistics, t, can be recognized as the energy terms within the graph Hamiltonian, each expressing a network feature that contributes to the total energy of the aggregation state. Examples of such network features are described in the following section and are also shown graphically in Figure S1. Alternately, it is also possible to interpret θ directly in terms of a vector of “biases” on the distribution of the elements of t within the aggregation graph. In particular, the expectation of ti(G) is monotone increasing in θi (all other terms held constant), and hence a negative θi value for a statistic that counts, for example, two-stars represent a force inhibiting twostar formation. Such a topological two-star inhibiting force would represent the aggregate energetic cost of the conformational changes required to sustain many bonds at once. In this way, the θ representation of the ERGM distribution is analogous to the thermodynamic β representation of a conventional statistical mechanical system (up to a change of sign) and may likewise be more intuitive than the energetic, ϕ representation in some cases. Below, we demonstrate that a simple class of models defined by a small number of graph-theoretic terms (i.e., t and ϕ) is sufficient to reproduce the fibril types associated with all currently solved structures (while also predicting the possibility of novel structures not yet experimentally observed). The terms employed here are summarized as follows (see the Supporting Information for additional details). Hamiltonian for Aggregating Monomers. We account for the baseline bond energy of an aggregation state graph g by a term for the edge count, te(g). Once one monomer becomes bound to another, binding to additional partners may involve bending or other conformational changes, increasing the energetic cost of further bonds. A natural first-order approximation to this cost is an energy term that adds an adjustment to the cost of a new bond proportional to the number of bonds that each interacting monomer already has; graph-theoretically, this amounts to counting the number of two stars, t2s, that is, subgraphs involving a vertex adjacent to two others within G. The energy associated with bending pairs of bound monomers is also reflected in other local topological features. These include the formation of open 2-paths (a structure comprising three monomers where a central node is

We can thus interpret Fij as, approximately, the energy associated with desolvation of the interface between i and j, plus the energy associated with the direct interactions between i and j. Energies associated with either the direct interaction of i or j with other monomers, or of (de)solvation of surfaces not involving the i, j interface, are canceled out and should not contribute to Fij. Note that, as we generally have access only to a single conformation, we do not consider additional entropic nor kinetic contributions to the binding energy. This approximation is adequate for scoring strong connections between monomers bound within amyloid fibrils, where enthalpic terms are of primary interest. In practice, ΔGsolveS is difficult to calculate, especially on a large system; here, we must moreover perform the calculation many times. Because our focus is on scoring approximate energies to identify the strongly favorable interactions that characterize fibril structures, we employ single-point energy approximations using simple implicit solvent models. Preliminary investigation has shown that the basic generalized Born solvent model supported by sander18 works very well; it is the most extensively tested of the models available in AmberTools, and it yields scores that correspond extremely well to conclusions based on structural studies (in particular, for smaller peptides such as the monomers studied in this work19). Statistical Mechanics of the Aggregation Graph. In any fibrillization event, we have N protein monomers in aqueous solution at temperature T that spontaneously selfassemble into a fibril. In our topological representation, this corresponds to the generation of an aggregation structure with a fibrillar topology. Let G represent the aggregation graph describing this structure, with each vertex being a single monomer and edges joining monomers that are noncovalently bound to one another. In equilibrium, we may describe the probability distribution governing G in the ERGM form14 as Pr(G = g |θ , t ) =

exp(θ Tt(g )) h(g ) ∑ g ′∈ . exp(θ Tt(g ′))h(g ′)

(5)

where g is a particular graph state drawn from the set of potentially observable graph states . , is a vector of sufficient statistics encoding graph properties, θ ∈ p is a vector of parameters, T is the transpose operator, and h is the reference measure for the distribution. Given N monomers, 5455

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B

Figure 4. Network models of amyloid fibrils recapitulate all of the fibril topologies observed in the PDB using only a small number of parameters. Panels A−E show graph representations for typical realizations of the MCMC simulations (200 for each of the five fibril types), where the parameters ϕ were tuned to generate 1-ribbons (A), 2-ribbons (B), 1,2 2-ribbons (C), double 1,2 2-ribbons (D), and 3-prisms (E). Monomers occupying locally fibrillar configurations are shown in color. The box plots in panel F show the distribution of fibril yield across realizations. All five models consistently produce mixtures of large, fibrillar components and small, structured oligomers.

where the state is a union of a series of exchangable microstates. Naively treating each labeled graph as a microstate would lead to h(g) = 1; however, this neglects the contribution of unobserved degrees of freedom associated with collisions between monomers, which have a non-trivial effect on the entropy of the system (see the Supporting Information, section S2). A simple model in which intermolecular bonds can form only between spatially proximate monomers that are well mixed on the time scale of bond formation leads to the reference measure h(g) = N−te(g) at constant concentration. This ensures that the mean number of bonds per monomer will not be affected by system size in the large-N, constant concentration limit. In other words, it constrains the degree of aggregation to be asymptotically invariant to the volume of sample being simulated, an obvious physical requirement. As shown in Figure 4, we have identified choices of ϕ that can successfully recapitulate all currently known amyloid fibril topologies in the PDB; other values of ϕ predict novel topologies not yet experimentally observed (Figure S3), as well as other, nonfibrillar, aggregation states. By specifying different statistics (t) and/or parameter vectors, our framework can be used to model other aggregation patterns; it is also possible to take ϕ or θ to be functions of other variables, leading to curved exponential families.22 In particular, we conjecture that any fibril topology can be generated by appropriate choices of (t, ϕ). Parameters employed in the fibrillization simulations shown in this paper are provided in Table S1. Both the ϕ form and corresponding θ form at 310 K are listed along with the corresponding fibril or other aggregation state type. The listed parameter vectors constitute illustrative points from the regions in the parameter space that reproduce each respective fibril type (as found by a combination of optimization and manual search); however, other parameter values can also generate fibril structures, and those given here should hence be taken as examples of what the model family can produce, rather than as an exhaustive list.

connected to two nodes that are not adjacent to each other), conjoined 2-paths (forming chordless, diamond-like structures), bound pairs not embedded in triangles, bound pairs embedded in exactly one triangle, and monomers whose adjacencies form ring-like closed paths called cycles. Following standard nomenclature in the ERGM literature,21 we refer to the associated statistics as the number of single null-shared partners tNSP1, double null-shared partners tNSP2, edges without shared partners tESP0, edges with a single shared partner tESP1, and cycles of size 5, 6, and 7: tC5, tC6, and tC7, respectively (Figure S1). Jointly, we then approximate the system energy associated with topological degrees of freedom in g by the nine-parameter model ϕete(g) + ϕ2st2s(g) + ϕNSP1tNSP1(g) + ϕNSP2tNSP2(g) + ϕESP0tESP0(g) + ϕESP1tESP1(g) + ϕC5tC5(g) + ϕC6tC6(g) + ϕC7tC7(g). Although we assume that fibrillization occurs slowly enough that motional degrees of freedom are decoupled from the graph topology (see the Supporting Information), vibrational energy makes an additional contribution to the intermolecular bond energy that must be considered in order to obtain reasonable temperature scaling. Approximating each bond as a classical oscillator assumed to be in equilibrium on the time scale of fibrillization kinetics and whose internal energy is unrelated to topology, and the equipartition theorem suggests an approximate total energy contribution equal to kBTte(g). The final Hamiltonian is then /(g ) = (ϕe + kBT )te(g ) + ϕ2st 2s(g ) + ϕNSP1t NSP1(g ) + ϕNSP2t NSP2(g ) + ϕESP0t ESP0(g ) + ϕESP1t ESP1(g ) + ϕC5tC5(g ) + ϕC6tC6(g ) + ϕC7tC7(g ) (7)

The last element remaining in our model specification is the reference measure, h. The reference measure defines the baseline distribution on the support in the absence of energetic effects: it is equivalent to state multiplicity under conditions 5456

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B

Let δ = 0, G(0) = g0,τ(0) = 0, i = 1 Draw d ∼ Exp(RG(0)+) Set δ = δ + d While δ ≤ δmax Set τ(i) = δ Draw G(i) from 5G(i−1) with probability PG(i−1)j for j ∈ 5G(i−1) Draw d ∼ Exp(RG(i)+) Set δ = δ + d Set i = i + 1 Return G,τ G(0),G(1),... is then the ordered trajectory of graph states from time 0 to δmax, and τ(0),τ(1),... is the corresponding vector of transition times. For finite ϕ, T, we have that Ri+ > 0, Pij > 0 for all i, j ∈ 5i , and because . is Hamming-connected, it follows that the system is irreducible. It thus has an equilibrium distribution, which follows eq 6. The kinetic model hence recapitulates its associated ERGM in equilibrium, although it generates continuous time trajectories with locally realistic behavior based on the free energy surface implied by / and h.

Kinetic Extension of the Fibrillization ERGM. We obtain a natural kinetic extension of the ERGM framework by adapting a common model of reaction kinetics. Let us begin with a simple two-state example, where we are interested in transitions between states i and j, and for convenience, let β = 1/(kBT), /j = /(j), and sj = log h(j). If we make the standard assumption of Boltzmann statistics at equilibrium that the time-independent probability of finding our system in state j is Pj ∝ exp( − β /j + sj), then the conditional probability that we will find the system in state j, given that it is in either state i or j is Pj Pi + Pj

=

1 1 + exp[β(/j − /i) + si − sj]

= [1 + exp[β Δij/ − ΔSij ]]−1

where Δij/ represents the difference in energy going from state i to state j, and ΔSij represents the corresponding difference in the log reference measure (entropy). From this unitless expression, we can obtain an equation for the conditional rate of formation of state j by multiplying by some event frequency A of units events per time rij =

A 1 + exp[β Δij/ − ΔSij ]



RESULTS AND DISCUSSION I: A TYPOLOGY OF FIBRIL TOPOLOGY Systematic Nomenclature for Fibril Topology. In the study of individual protein structures, the paradigm of the secondary structure has provided researchers with a tool for concisely describing structural details of proteins based on commonly observed hydrogen bonding patterns. Here, we present an analogous, systematic, and standardized nomenclature system for fibril topologies that encompasses all of the diverse amyloid fibril forms currently represented in the PDB and which can straightforwardly be extended to describe forms yet to be discovered (Figure 2). The goal of the present study is to develop formalism for describing amyloid fibrils purely in network terms, using connectivity among monomers. Although it differs in the details, the general approach is a part of the rich tradition in computational chemistry of adopting coarsegrained models that capture the most salient features of the system of interest, while discarding other details that are judged to add complexity but not essential information.23,24 Such coarse-grained representations of protein systems often necessitate use of a specialized force field, whereby the coarsegrained mapping is performed in the most literal sense, that is, atomistic degrees of freedom deemed to be unnecessary for capturing the phenomena of interest are fused together to create the “coarse grains” that compose less computationally expensive models with fewer degrees of freedom.25,26 We address this objective below. Amyloid fibrils are characterized by interesting structural features at multiple scales from the atomistic details of the molecular structures all the way to the distribution of plaques in the brain.27 The former has been the subject of considerable experimental investigation, leading to details about the shapes of individual monomers within both parallel and antiparallel assemblies. Types of fibrils include S-5 (or tilde)28 and doublehorseshoe29 shapes, along with more complicated supramolecular assemblies. Our particular coarse-grained model focuses on fibril topology at the level of interacting monomers, thus opening the door to similar investigations at a higher level of the structure, including potential relationships with disease etiology (to which other features are hypothesized to be related30) and connections between atomistic and topological

. (8)

Note that if Δij/ is treated as an activation energy barrier Ea in going from state i to j, the barrier is both “uphill,” that is positive, and large enough that exp[β Δij/ − ΔSij ] ≫ 1 and ΔSij ≈ 0, eq 8 reduces to the familiar Arrhenius law for calculating chemical rate constants kij = Ae−βEa

while a “downhill” change for βΔij/ − ΔijS , that is a negative free energy difference, causes the rate of j formation in eq 8 to approach the event frequency, often called the collision frequency in the parlance of chemical kinetics. This represents the fact that movement between states is limited by the rate of microscopy interactions within the system. In the case of the aggregation graph, one can instantaneously transition from state i to j if and only if i and j differ by exactly one bond. Moreover, if the graph is currently in state i, all allowed i → j transitions represent competing reactions (the first to occur determining the next state of the aggregation graph). Following eq 8, if 5i represents the set of all Hamming neighbors of i (i.e., the graphs that can be formed by adding or removing a single edge from graph state i), then it follows immediately that

R i+ =

∑ rij j ∈ 5i

is the exit rate for state i and Pij = rij/R i +

is the probability that the next realized transition is from state i to state j, given that the system is currently in state i. Moreover, the time to transition out of state i will be exponentially distributed with expectation 1/Ri+. This suggests a straightforward simulation algorithm: 5457

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B

these known amyloid fibril topologies, as described in the following sections.

features. Because all fibrils, by definition, have a unique axis along which fibril growth occurs, we first posit that all fibrils are chains of repeating subunits that are noncovalently bonded end to end. Each unique fibril topology is identified by this onedimensional repeating pattern of linked subgraphs, as summarized in Figure 2. Our convention for visually indicating the axis of fibril growth in the graphs is to depict “stubs” (i.e., endpoints of cross-unit edges) at both ends of the fibril segments shown. In all examples, the repeating pattern of nodes, edges, and stubs that defines the repeating unit of the fibril form is enclosed with dotted lines. The subunit topologies observed to date can be divided into two categories: toroidal (i.e., cyclic) and linear. If the subunit is linear, we call it an n-ribbon, and if it is toroidal, we call it an nprism (see Figure 2). For example, if a subunit comprises three monomers i, j, and k, and the edges they share within the subunit can be represented as {i ↔ j, j ↔ k}, the fibril type is a 3-ribbon. On the other hand, if the set of edges between monomers i, j, and k in a single subunit is {i ↔ j, j ↔ k, k ↔ i}, the fibril type is a 3-prism. The simplest case, where the repeating subunit is a single monomer, is called a 1-ribbon. We can further extend the naming convention to describe ties between monomers from different unit cells. We introduce a convention for describing these additional ties of the form i, j, where i and j give the indices of the subunits between which the ties occur. Additionally, if the ties between unit cells connect nodes along the same embedded (“backbone”) 1ribbon, we add the term cis, while the term trans indicates that the tie is between nodes on different ribbon segments. For example, if we have a 2-ribbon, where an additional tie connects each unit cell to its immediate neighbor (a 1,2 tie), and the tie is between nodes that belong to the opposite constituent 1-ribbons that make up the 2-ribbon (trans), the 2ribbon becomes a trans-1,2 2-ribbon (see Figure 2). More examples are provided in Figure 2, and a detailed specification of the fibril type nomenclature is included in the Supporting Information. Characterizing Fibril Topologies Observed in Nature. In order to demonstrate that our system of nomenclature is general enough to be useful, we have categorized the complete set of amyloid fibrils in the Protein Data Bank by topology (Figure 3). The amyloid fibril type classification follows a twostep protocol (see the Methods). First, we confirm that the PDB entry possesses the characteristic cross-β sheet structure definitive of amyloid fibrils (i.e., distances of 4.7 ± 0.4 Å between strands within β-sheets and 10 ± 3 Å across βsheets15 ). We carry out this step using the distance measurement tool in PyMOL.16 Second, we apply an energy scoring criterion to identify monomer pairs whose interactions are sufficiently strong to constitute an edge in the aggregation graph. The interaction energy score between two monomers is calculated in a style similar to free energy difference scoring commonly practiced in molecular docking simulations. A conservative minimum energetic threshold of −10 kcal/mol is employed to ensure that the intermonomeric interactions included in the graph representation of the fibril are at least an order of magnitude more enthalpically favorable than a typical hydrogen bond between water molecules. For the 28 amyloid fibril structures solved to date that meet our inclusion criteria, we find five unique topological classes (Figure 3). We have identified simulation parameters for our mathematical model of fibril formation that are capable of recapitulating all five of



RESULTS AND DISCUSSION II: NETWORK-BASED MODELS OF FIBRIL FORMATION Recapitulating Fibril Structure in Equilibrium. One of the advantages of the topological representation is the ability to leverage scalable network-analytic modeling techniques to study both the equilibrium fibril structure and fibril formation kinetics. Our methods are adapted from ERGM techniques originally developed for the study of social networks.14,31 The ERGM class is a natural starting point for modeling the network structures of fibrils in equilibrium: the methodology has been well established for modeling other forms of networks,14,31 and it also admits a straightforward statistical mechanical interpretation, enabling comparisons to be made between simulation and experimental data. In addition to our implementation of equilibrium ERGM simulations, we also introduce a simple dynamic extension of our ERGM family for examining fibril formation kinetics in a manner that preserves the equilibrium behavior of the corresponding trajectories. In order to construct a purely topological physical model of protein aggregation events, it is necessary to define a Hamiltonian function whose input states are networks. The terms of the Hamiltonian then correspond to energetic gains or penalties due to specific patterns of noncovalent interactions between monomers or bonding motifs. One such Hamiltonian, which we have shown can recapitulate fibril formation for all five currently known topologies in the PDB, can be written as eq 7: , /(g ) = (ϕe + kBT )te(g ) + ϕ2st 2s(g ) + ϕNSP1t NSP1(g ) + ϕNSP2t NSP2(g ) + ϕESP0t ESP0(g ) + ϕESP1t ESP1(g ) + ϕC5tC5(g ) + ϕC6tC6(g ) + ϕC7tC7(g ) where g is the graph indicating which monomers share noncovalent bonds, kB is Boltzmann’s constant, T is temperature, the t(g) functions return the number of instances that a particular bonding motif appears in graph g, and the ϕ values are the signed coefficients that scale a given bonding motif’s contribution to the enthalpy of the aggregation state. A detailed description of this Hamiltonian, including details regarding the various noncovalent bonding motifs indicated by the subscripts of t(g) and ϕ, is given in the Methods section and Figure S1 in the Supporting Information. Given an ERGM form for the distribution of aggregation graphs, equilibrium draws can be simulated by MCMC. MCMC simulations were performed for several different choices of ϕ tuned to generate fibril structures. The Metropolis−Hastings TNT (Tie No Tie) algorithm in the ergm software package for R32 was used, with a fully monomeric solution (an empty graph, where no ties exist) employed as the initial state. Simulated systems contained 1000 protein monomers, and 200 independent draws were taken in each condition (one full chain per draw). For each realization, residues were classified as being part of a locally fibrillar configuration if their local topological position matched that of a monomer in the repeating unit of the corresponding fibril form from Figure 3. Simulation results from parameter choices recapitulating all known fibril types are shown in Figure 4. Stability was further verified by ensuring that fibril structures persisted in replicate simulations initialized with completed fibrils (see Supporting Information, Table S2). 5458

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B

emergent properties of our two-ribbon fibrillization model, and could in principle be modeled using multistep kinetic models. In our kinetic sampling algorithm (see the Methods), the reaction rates of all possible bond formation and breaking events given the current state are calculated as a function of the difference in free energy corresponding to the change, with state changes (bond formation/breaking) occurring as events in continuous time following the specified reaction rates. The equilibrium behavior of this dynamic process converges to the ERGM equilibrium described above. An intuitive way to think about our kinetic sampling algorithm is that at every step of the simulation, all possible single-edge changes in the graph (i.e., Hamming steps) are competing reactions governed by Arrhenius-like kinetics, each with its respective activation energy. Hamming moves with the lowest activation energies, that is, the fastest kinetics, are most likely to take place before the slower moves with higher barriers, hence our weighting scheme places the highest probabilities on the moves with the lowest activation energies. Thus, at each step of our stochastic kinetic sampling algorithm, a Hamming move is first selected with probabilities weighted according to each activation energy barrier, and the amount of time elapsed for that move is drawn from the distribution characterized by the competing reaction rates. Our kinetic model was employed to simulate an ensemble of 300 independent trajectories of a system of 200 protein monomers, where the parameters ϕ were set to the same values that consistently produce 2-ribbons in MCMC simulations. All trajectories were initialized with empty graphs (i.e., solutions of noninteracting monomers) at 310 K. Trajectories were postprocessed using the ergm.graphlets software package48 to identify vertices occupying locally fibrillar configurations. In order to map the results from simulation time to a physical time scale, the normalized mean fibrillar fraction function (i.e., fraction of monomers in fibrillar configuration, relative to the maximum observed) was fit to an experimental measurement of this same observable by least squares (see Figure 7c of ref 49). Examination of the simulated trajectories yields unique insights into the complexity of the 2-ribbon fibrillization process, including early phases so far poorly characterized by the experiment. To simplify analysis, we delineate the time evolution of fibrillization by dividing the timeline into epochs, with each epoch marked by the appearance of a particular defining topological feature (Figure 5). Although an excellent qualitative understanding of what transpires during each epoch can be gained from Figure 5 (as well as from the video to which a link is given in the Supporting Information), a detailed description of how the epochs are quantitatively defined can be found in the Supporting Information. As Figure 5 shows, the process of network self-assembly that gives rise to the 2-ribbon is complex, with many intermediate stages between the initial monomeric state and the mature fibril. Early stages of selfassembly qualitatively resemble trajectories seen in the evolution of random graphs,50 with the locally fibrillar structure emerging only in the relatively high-density environment provided by the unstructured aggregate. Subsequent assembly of the fibril itself is consistent with a nucleationelongation process51 in which the growth faces of the emerging fibril act as catalytic surfaces for fibril growth. Structured oligomers emerge late in the formation process, largely as remnants of the unstructured aggregate that achieve a locally stable configuration and resist being absorbed into the larger

Panels A−E in Figure 4 show examples of typical realizations of aggregate graphs recapitulating 1-ribbon, 2-ribbon, 1,2 2ribbon, double 1,2 2-ribbon, and 3-prism fibrils (respectively). All five models reliably produce a mixture of large fibrillar components and small oligomers; these oligomers tend to be highly structured, and are often annular in form, closely resembling certain species of toxic oligomers observed in experimental studies.33 Panel F shows box plots indicating fibril yield for each model, defined as the fraction of monomers found to be in locally fibrillar configurations, across simulated draws. All models shown here consistently produce fibrils of their respective types. We note that other parameter values can lead to models that remain monomeric, produce only small oligomers or structured aggregates, or in some cases, predict novel fibril forms not yet observed in PDB structures (see Figure S3 in the Supporting Information section). As seen for the five models shown above, our fibrilgenerating aggregation simulations frequently result in large fibrillar components coexisting with small oligomeric components at equilibrium; this suggests that the equilibrium state for fibrillizing protein monomers may be a mixture of fibrils and oligomers, a proposition that is experimentally testable. Models of this type have the potential to guide experimental studies on the relationship between fibrils and oligomers during formation and at equilibrium, which is highly relevant to disease states, as some researchers argue that structured oligomers are the most toxic species in many diseases;34 and the fibril type may also be a predictor of toxicity,35 making the ability of this framework to recapitulate a range of fibril structures a potentially useful asset. Fibrilization Kinetics Simulations. Understanding the kinetics of amyloid fibril formation and the mechanisms driving their self-assembly have been established as important goals along the path toward discovering the link between amyloid fibrils and the diseases in which they are implicated.3,36,37 To that end, we also propose a kinetic model of fibril formation that can recapitulate not only the equilibrium behavior discussed above but also the formation process itself, utilizing a single Hamiltonian. A number of previous approaches to modeling of fibrillization kinetics have been based on a “dock-lock” model,38−41 where protein monomers first bind to the growing fibril and then settle into a low-energy state once bound. This view is consistent with the experimental work by Cannon et al.,42 Cohen et al.,43 Bernstein et al.,44 and Cohen et al.,45 which has suggested that amyloid fibril growth is governed by at least two distinct kinetic processes. Simulation strategies for fibrillization often grapple with the problem of capturing this complex process by employing separate models for the early stages of aggregation, where monomers interact to form oligomers, and the later elongation phases. A common approach is to treat the early and late stages of aggregation with different levels of theory, using all-atom simulation for the monomers and small oligomers that dominate at the beginning of the process and coarse-grained strategies for the elongation phase.46,47 By contrast, we capture the entire fibrillization process, from fully dissociated monomers to fully assembled fibrils, in a single model. In our framework, the multistep fibrillization process described above arises as an emergent outcome of the behavior of the evolving aggregation graph (with the many-body energy terms within the network Hamiltonian generating a “locking” mechanism that comes into play once initial contacts are formed among monomers). For instance, Figure S4 in the Supporting Information displays five distinct epochs that are 5459

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B

fibrillization event involving thousands of monomers. Likewise, by employing a dynamic extension of the ERGM framework, we can probe the kinetics of fibril self-assembly for an initial monomeric sample whose behavior is governed by the same Hamiltonian. These simulations not only reproduce all known fibril forms currently found in the PDB but also recapitulate the complete self-assembly of fibrils from a purely monomeric state. Under appropriate conditions, the model also produces fibril forms not yet observed in experimental studies, leading to predictions for other potential amyloid structures yet to be discovered. Although it is in principle possible to simulate fibrillization using detailed interaction potentials for all of the atomic degrees of freedom of each monomer, the fact that each monomer is composed of hundreds if not thousands of atoms, and that fibrillization takes place on the time scale of minutes, days, or even years, means that such a treatment is computationally intractable for systems of more than a few monomers. In response to this inherent computational limitation, some researchers favor a bottom-up approach to building fibrillization models where the physics believed to be driving the dynamics of the individual monomers is coarsely approximated, and the fibrils are an emergent structure from the parameters of their model of monomer dynamics.57 In contrast, we use a top-down strategy; after directly simulating the formation of fibril topology using ERGMs, we extract a statistical mechanical description of the interprotein interactions once our model has recapitulated the observed structure. Our approach is complementary to bottom-up methods in that it can produce high-level models of interaction behavior that can be targeted by detailed, atomistic models (freeing them from the computational burden of simulating the entire fibrillization process at scale). Finally, we have leveraged the isomorphism between constructing normalizing constants in ERGMs and partition functions in statistical mechanics such that, in parameterizing them, we directly obtain a statistical mechanical description of the system, rather than having to develop a statistical mechanical description of the system post hoc, as is typically necessary in bottom-up approaches. In contrast to highly detailed but computationally expensive models based on molecular mechanics, our parsimonious models, which are capable of reproducing realistic fibril topologies with just a few parameters, are orders of magnitude faster to simulate. These simulations also offer a novel tool for the study of fibrillization kinetics. In contrast to the frequently cited hypothesis that fibrillization proceeds via an orderly progression from oligomers to fibrils, kinetic simulations of 2ribbon fibrils show small dendritic aggregates making a fleeting appearance before annealing into a large disordered aggregate, with annular oligomers only manifesting after the formation of true fibrillar segments begins. While experimental evidence suggests that the simpler progression can occur,58 our model demonstrates that there are other potential pathways for at least some fibril forming systems, consistent with the idea that a variety of chemical species are formed during the lag phase of fibrillization kinetics.59 Such details as the ordering of distinct aggregation states, protofibril and oligomer size distributions, and absolute fibril yield are valuable experimental targets that can be obtained from our simulation methodology and which may prove useful not only for experimental calibration of model parameters but also for discriminating among competing fibril models (important, given the difficulty of

Figure 5. Kinetic models of fibril formation reveal information about aggregation pathways as well as final structures. Fibrillar fraction for the 2-ribbon model is shown as a function of time for 300 trajectories generated using the kinetic model; the central line shows mean fibrillar fraction, while the shaded area shows 95% simulation intervals. Inset graphs show typical configurations from six time points during the fibrillization process. Background shading indicates distinct epochs in the process, moving from states characterized by small dendritic oligomers to mixtures of complete fibrils and highly structured oligomers.

fibril structure. The different kinds of intermediates observed during the fibrillization process are consistent with available experimental data, although many details are system-specific. Pioneering atomic force microscopy work by Lansbury and coworkers showed branched oligomers coexisting with protofibrils and mature fibrils in aggregated Aβ samples,52 a process that was recently recapitulated using coarse-grained simulations.53 The fibrillization kinetics of the immunoglobulin light chain are also consistent with an off-pathway intermediate early in the process, although its morphology was not directly observed.54 Branched oligomers are observed by EM studies of the intermediates present in the fibrillization of glucagon,55 and in this system, the presence of branched oligomers speeds up the fibrillization kinetics.56 The distinct features that form during the simulated fibrillization process are potentially useful targets for experimental validation and may themselves suggest directions for interventions that inhibit fibrillization: for instance, measures that stabilize the small, dendritic oligomers seen early in the assembly process may prevent the emergence of the larger aggregates in which fibrillar structures first emerge, and those that destabilize these first locally fibrillar configurations may prevent these structures from surviving long enough to seed fibrillar growth.



CONCLUSIONS We have introduced a general and systematic nomenclature for describing the topology of amyloid fibrils, as well as a quantitative method for mapping atomic structures to topological representations, and used them to classify all known fibril structures in the Protein Data Bank. Our analysis reveals that all structures published to date fall into five distinct types, with two classes (the 1-ribbons and 1,2 2-ribbons) being substantially more common than the rest. We have also introduced a network-based coarse-grained Hamiltonian form for calculating the interaction energy between aggregating protein monomers and implemented it in both thermodynamic and kinetic stochastic simulations. Using MCMC techniques,32 we demonstrated that it is possible to capture the behavior of a 5460

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B

in amyloid fibrils by solid-state NMR spectroscopy. Angew. Chem. 2009, 121, 2152−2155. (9) Wiltzius, J. J. W.; Sievers, S. A.; Sawaya, M. R.; Cascio, D.; Popov, D.; Riekel, C.; Eisenberg, D. Atomic structure of the cross-β spine of islet amyloid polypeptide (amylin). Protein Sci. 2008, 17, 1467−1474. (10) Apostol, M. I.; Wiltzius, J. J. W.; Sawaya, M. R.; Cascio, D.; Eisenberg, D. Atomic structures suggest determinants of transmission barriers in mammalian prion disease. Biochemistry 2011, 50, 2456− 2463. (11) Lu, J.-X.; Qiang, W.; Yau, W.-M.; Schwieters, C. D.; Meredith, S. C.; Tycko, R. Molecular structure of β-amyloid fibrils in Alzheimer’s disease brain tissue. Cell 2013, 154, 1257−1268. (12) Bousset, L.; Pieri, L.; Ruiz-Arlandis, G.; Gath, J.; Jensen, P. H.; Habenstein, B.; Madiona, K.; Olieric, V.; Böckmann, A.; Meier, B. H.; et al. Structural and functional characterization of two alpha-synuclein strains. Nat. Commun. 2013, 4, 2575. (13) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235−242. (14) Lusher, D.; Koskinen, J.; Robins, G. Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications; Cambridge University Press: Cambridge, 2012. (15) Serpell, L. C.; Sunde, M.; Blake, C. C. F. The molecular basis of amyloidosis. Cell. Mol. Life Sci. 1997, 53, 871−887. (16) Schrödinger, LLC. The PyMOL Molecular Graphics System, version 1.8, 2015. (17) Case, D. A.; Betz, R. M.; Cerutti, D. S.; Cheatham, III, T. E.; Darden, T. A.; Duke, R. E.; Giese, T. J.; Gohlke, H.; Goetz, A. W.; Homeyer, N.; et al. Amber 2016; University of California: San Francisco, 2016. (18) Tsui, V.; Case, D. A. Theory and applications of the generalized Born solvation model in macromolecular simulations. Biopolymers 2000, 56, 275−291. (19) Onufriev, A.; Bashford, D.; Case, D. A. Modification of the generalized Born model suitable for macromolecules. J. Phys. Chem. B 2000, 104, 3712−3720. (20) Hales, T. A proof of the Kepler conjecture. Ann. Math. 2005, 162, 1065−1185. (21) Snijders, T. A. B.; Pattison, P. E.; Robins, G. L.; Handcock, M. S. New specifications for exponential random graph models. Socio. Methodol. 2006, 36, 99−153. (22) Hunter, D. R.; Handcock, M. S. Inference in Curved Exponential Family Models for Networks. J. Comput. Graph. Stat. 2006, 15, 565−583. (23) Tozzini, V. Coarse-grained models for proteins. Curr. Opin. Struct. Biol. 2005, 15, 144−150. (24) Pak, A. J.; Voth, G. A. Advances in coarse-grained modeling of macromolecular complexes. Curr. Opin. Struct. Biol. 2018, 52, 119− 126. (25) Rojas, A.; Liwo, A.; Browne, D.; Scheraga, H. A. Mechanism of Fiber Assembly: Treatment of Aβ Peptide Aggregation with a CoarseGrained United-Residue Force Field. J. Mol. Biol. 2010, 404, 537− 552. (26) Zheng, W.; Tsai, M.-Y.; Wolynes, P. G. Comparing the aggregation free energy landscapes of amyloid beta(1-42) and amyloid beta(1-40). J. Am. Chem. Soc. 2017, 139, 16666−16676. (27) Roychaudhuri, R.; Yang, M.; Hoshi, M. M.; Teplow, D. B. Amyloid β-protein assembly and Alzheimer’s disease. J. Biol. Chem. 2009, 284, 4749−4753. (28) Schmidt, M.; Rohou, A.; Lasker, K.; Yadav, J. K.; SchieneFischer, C.; Fändrich, M.; Grigorieff, N. Peptide dimer structure in an Aβ(1-42) fibril visualized with cryo-EM. Proc. Natl. Acad. Sci. U.S.A. 2015, 112, 11858−11863. (29) Wälti, M. A.; Ravotti, F.; Arai, H.; Glabe, C. G.; Wall, J. S.; Böckmann, A.; Güntert, P.; Meier, B. H.; Riek, R. Atomic-resolution structure of a disease-relevant Aβ(1-42) amyloid fibril. Proc. Natl. Acad. Sci. U.S.A. 2016, 113, E4976−E4984.

amyloid structure determination). The aggregation graph representation introduced here, and the broader modeling strategy, may also have applications to nonamyloid forms of protein aggregation such as those observed in cataract.60,61



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jpcb.9b03494. Supplementary video; fibril nomenclature rules for chords; aggregation model reference measure; aggregation model behavior at extreme temperatures; extended stability testing of fibrils; definition of fibril epochs; sample code for 2-ribbon simulation; illustration of important sufficient statistics; parameters for all fibrillization network models; intermolecular binding energy cutoff illustration; other aggregation states our models can produce; epochs of fibril formation kinetics exhibited by our model; fibrillar structure produced by our sample code; and amyloid fibril topologies observed in the PDB (PDF) Fibril topology movie (ZIP)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Gianmarc Grazioli: 0000-0003-2559-5103 Rachel W. Martin: 0000-0001-9996-7411 Notes

The authors declare no competing financial interest.

■ ■

ACKNOWLEDGMENTS This work was supported by NSF award DMS-1361425 to C.T.B. and R.W.M. REFERENCES

(1) Pulawski, W.; Ghoshdastider, U.; Andrisano, V.; Filipek, S. Ubiquitous amyloids. Appl. Biochem. Biotechnol. 2012, 166, 1626− 1643. (2) Chiti, F.; Dobson, C. M. Protein misfolding, amyloid formation, and human disease: A summary of progress over the last decade. Annu. Rev. Biochem. 2017, 86, 27−68. (3) Chiti, F.; Webster, P.; Taddei, N.; Clark, A.; Stefani, M.; Ramponi, G.; Dobson, C. M. Designing conditions for in vitro formation of amyloid protofilaments and fibrils. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 3590−3594. (4) Sunde, M.; Serpell, L. C.; Bartlam, M.; Fraser, P. E.; Pepys, M. B.; Blake, C. C. F. Common core structure of amyloid fibrils by synchrotron x-ray diffraction. J. Mol. Biol. 1997, 273, 729−739. (5) Colvin, M. T.; Silvers, R.; Ni, Q. Z.; Can, T. V.; Sergeyev, I.; Rosay, M.; Donovan, K. J.; Michael, B.; Wall, J.; Linse, S.; et al. Atomic Resolution Structure of Monomorphic Aβ42 Amyloid Fibrils. J. Am. Chem. Soc. 2016, 138, 9663−9674. (6) Luhrs, T.; Ritter, C.; Adrian, M.; Riek-Loher, D.; Bohrmann, B.; Dobeli, H.; Schubert, D.; Riek, R. 3D structure of Alzheimer’s amyloid- (1-42) fibrils. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 17342− 17347. (7) Qiang, W.; Yau, W.-M.; Luo, Y.; Mattson, M. P.; Tycko, R. Antiparallel β-sheet architecture in Iowa-mutant β-amyloid fibrils. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 4443−4448. (8) Nielsen, J. T.; Bjerring, M.; Jeppesen, M. D.; Pedersen, R. O.; Pedersen, J. M.; Hein, K. L.; Vosegaard, T.; Skrydstrup, T.; Otzen, D. E.; Nielsen, N. C. Unique identification of supramolecular structures 5461

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462

Article

The Journal of Physical Chemistry B (30) Eisenberg, D. S.; Sawaya, M. R. Implications for Alzheimer’s disease of an atomic resolution structure of amyloid-β(1-42) fibrils. Proc. Natl. Acad. Sci. U.S.A. 2016, 113, 9398−9400. (31) Holland, P. W.; Leinhardt, S. An exponential family of probability distributions for directed graphs (with discussion). J. Am. Stat. Assoc. 1981, 76, 33−50. (32) Hunter, D. R.; Handcock, M. S.; Butts, C. T.; Goodreau, S. M.; Morris, M. ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. J. Stat. Softw. 2008, 24, nihpa54860. (33) Kayed, R.; Pensalfini, A.; Margol, L.; Sokolov, Y.; Sarsoza, F.; Head, E.; Hall, J.; Glabe, C. Annular protofibrils are a structurally and functionally distinct type of amyloid oligomer. J. Biol. Chem. 2009, 284, 4230−4237. (34) Kayed, R.; Head, E.; Thompson, J. L.; McIntire, T. M.; Milton, S. C.; Cotman, C. W.; Glabe, C. G. Common structure of soluble amyloid oligomers implies common mechanism of pathogenesis. Science 2003, 300, 486−489. (35) Liu, C.; Zhao, M.; Jiang, L.; Cheng, P.-N.; Park, J.; Sawaya, M. R.; Pensalfini, A.; Gou, D.; Berk, A. J.; Glabe, C. G.; et al. Out-ofregister β-sheets suggest a pathway to toxic amyloid aggregates. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 20913−20918. (36) Gillam, J. E.; MacPhee, C. E. Modelling amyloid fibril formation kinetics: mechanisms of nucleation and growth. J. Phys.: Condens. Matter 2013, 25, 373101. (37) Schreck, J. S.; Yuan, J.-M. A kinetic study of amyloid formation: fibril growth and length distributions. J. Phys. Chem. B 2013, 117, 6574−6583. (38) Esler, W. P.; Stimson, E. R.; Jennings, J. M.; Vinters, H. V.; Ghilardi, J. R.; Lee, J. P.; Mantyh, P. W.; Maggio, J. E. Alzheimer’s Disease Amyloid Propagation by a Template-Dependent Dock-Lock Mechanism. Biochemistry 2000, 39, 6288−6295. (39) Bacci, M.; Vymětal, J.; Mihajlovic, M.; Caflisch, A.; Vitalis, A. Amyloid β Fibril Elongation by Monomers Involves Disorder at the Tip. J. Chem. Theory Comput. 2017, 13, 5117−5130. (40) Röder, K.; Wales, D. J. Energy Landscapes for the Aggregation of Aβ17-42. J. Am. Chem. Soc. 2018, 140, 4018−4027. (41) Schwierz, N.; Frost, C. V.; Geissler, P. L.; Zacharias, M. From Aβ Filament to Fibril: Molecular Mechanism of Surface-Activated Secondary Nucleation from All-Atom MD Simulations. J. Phys. Chem. B 2017, 121, 671−682. (42) Cannon, M. J.; Williams, A. D.; Wetzel, R.; Myszka, D. G. Kinetic analysis of beta-amyloid fibril elongation. Anal. Biochem. 2004, 328, 67−75. (43) Cohen, S. I. A.; Vendruscolo, M.; Dobson, C. M.; Knowles, T. P. J. From macroscopic measurements to microscopic mechanisms of protein aggregation. J. Mol. Biol. 2012, 421, 160−171. (44) Bernstein, S. L.; Dupuis, N. F.; Lazo, N. D.; Wyttenbach, T.; Condron, M. M.; Bitan, G.; Teplow, D. B.; Shea, J.-E.; Ruotolo, B. T.; Robinson, C. V.; et al. Amyloid-β protein oligomerization and the importance of tetramers and dodecamers in the aetiology of Alzheimer’s disease. Nat. Chem. 2009, 1, 326−331. (45) Cohen, S. I. A.; Linse, S.; Luheshi, L. M.; Hellstrand, E.; White, D. A.; Rajah, L.; Otzen, D. E.; Vendruscolo, M.; Dobson, C. M.; Knowles, T. P. J. Proliferation of amyloid-β42 aggregates occurs through a secondary nucleation mechanism. Proc. Natl. Acad. Sci. U.S.A. 2013, 110, 9758−9763. (46) Nasica-Labouze, J.; Nguyen, P. H.; Sterpone, F.; Berthoumieu, O.; Buchete, N.-V.; Coté, S.; De Simone, A.; Doig, A. J.; Faller, P.; Garcia, A.; et al. Amyloid β Protein and Alzheimer’s Disease: When Computer Simulations Complement Experimental Studies. Chem. Rev. 2015, 115, 3518−3563. (47) Nagel-Steger, L.; Owen, M. C.; Strodel, B. An account of amyloid oligomers: Facts and figures obtained from experiments and simulations. ChemBioChem 2016, 17, 657−676. (48) Yaveroğlu, Ö . N.; Fitzhugh, S. M.; Kurant, M.; Markopoulou, A.; Butts, C. T.; Pržulj, N. ergm.graphlets: A package for ERG modeling based on graphlet statistics. J. Stat. Softw. 2015, 65, 1−29.

(49) Xue, C.; Lin, T. Y.; Chang, D.; Guo, Z. Thioflavin T as an amyloid dye: fibril quantification, optimal concentration and effect on aggregation. R. Soc. Open Sci. 2017, 4, 160696. (50) Bollobás, B. The evolution of random graphs. Trans. Am. Math. Soc. 1984, 286, 257−274. (51) Frieden, C. Protein aggregation processes: in search of the mechanism. Protein Sci. 2007, 16, 2334−2344. (52) Harper, J. D.; Lieber, C. M.; Lansbury, P. T., Jr. Atomic force microscopic imaging of seeded fibril formation and fibril branching by the Alzheimer's disease amyloid-β protein. Chem. Biol. 1997, 4, 951− 959. (53) Chiricotto, M.; Melchionna, S.; Derreumaux, P.; Sterpone, F. Multiscale Aggregation of the Amyloid Aβ16‑22 Peptide: From Disordered Coagulation and Lateral Branching to Amorphous Prefibrils. J. Phys. Chem. Lett. 2019, 10, 1594−1599. (54) Souillac, P. O.; Uversky, V. N.; Millett, I. S.; Khurana, R.; Doniach, S.; Fink, A. L. Elucidation of the molecular mechanism during the early events in immunoglobulin light chain amyloid fibrillation: Evidence for an off-pathway oligomer at acidic pH. J. Biol. Chem. 2002, 277, 12666−12679. (55) Andersen, C. B.; Otzen, D.; Christiansen, G.; Rischel, C. Glucagon Amyloid-like Fibril Morphology Is Selected via Morphology-Dependent Growth Inhibition. Biochemistry 2007, 46, 7314− 7324. (56) Pedersen, J. S.; Andersen, C. B.; Otzen, D. E. Amyloid structureone but not the same: the many levels of fibrillar polymorphism. FEBS J. 2010, 277, 4591−4601. (57) Wu, C.; Shea, J.-E. Coarse-grained models for protein aggregation. Curr. Opin. Struct. Biol. 2011, 21, 209−220. (58) Legleiter, J.; Mitchell, E.; Lotz, G. P.; Sapp, E.; Ng, C.; DiFiglia, M.; Thompson, L. M.; Muchowski, P. J. Mutant Huntingtin Fragments Form Oligomers in a Polyglutamine Length-dependent Mannerin Vitroandin Vivo. J. Biol. Chem. 2010, 285, 14777−14790. (59) Arosio, P.; Knowles, T. P. J.; Linse, S. On the lag phase in amyloid fibril formation. Phys. Chem. Chem. Phys. 2015, 17, 7606− 7618. (60) Roskamp, K. W.; Montelongo, D. M.; Anorma, C. D.; Bandak, D. N.; Chua, J. A.; Malecha, K. T.; Martin, R. W. Multiple Aggregation Pathways in Human γS-Crystallin and Its AggregationProne G18V Variant. Invest. Ophthalmol. Visual Sci. 2017, 58, 2397− 2405. (61) Boatz, J. C.; Whitley, M. J.; Li, M.; Gronenborn, A. M.; van der Wel, P. C. A. Cataract-associated P23T γD-crystallin retains a nativelike fold in amorphous-looking aggregates formed at physiological pH. Nat. Commun. 2017, 8, 15137.

5462

DOI: 10.1021/acs.jpcb.9b03494 J. Phys. Chem. B 2019, 123, 5452−5462