MD Simulations of Viruslike Particles with Supra CG Solvation

Sep 6, 2017 - MD Simulations of Viruslike Particles with Supra CG Solvation Affordable to Desktop Computers. Matı́as R. ..... deviations (RMSD), and...
2 downloads 12 Views 4MB Size
Subscriber access provided by University of Colorado Boulder

Article

MD Simulations of Virus-Like Particles with Supra CG solvation affordable to desktop computers Matias R. Machado, Humberto C. González, and Sergio Pantano J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00659 • Publication Date (Web): 06 Sep 2017 Downloaded from http://pubs.acs.org on September 6, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

MD Simulations of Virus-Like Particles with Supra CG solvation affordable to desktop computers Matias R. Machado1, Humberto C. González1, Sergio Pantano1* 1

Biomolecular Simulations Group, Institut Pasteur de Montevideo, Montevideo, Uruguay. Mataojo

2020, CP 11400 * To whom correspondence should be addressed: Sergio Pantano, Tel/Fax:+598-2522 0910, e-mail: [email protected]

Running Title Water, WatFour, WatElse?

Keywords Multiscale simulations, SIRAH, GPU-acceletation

Abstract Viruses are tremendously efficient molecular devices that optimize the packing of genetic material using a minimalistic number of proteins to form a capsid or envelope that protects them from external threats, being also part of cell recognition, fusion and budding machineries. Progresses in experimental techniques have provided a large number of high-resolution structures of viruses and virus-like particles (VLP), while molecular dynamics simulations may furnish lively and complementary insights on the fundamental forces ruling viral assembly, stability and dynamics. However, the large size and complexity of these macromolecules poses significant computational challenges. Alternatively, CoarseGrained (CG) methods, which resign atomistic resolution privileging computational efficiency, can be used to characterize the dynamics of VLPs. Still, the massive amount of solvent present in empty capsids or envelopes suggests that hybrid schemes keeping a higher resolution on regions of interest (i.e. the viral proteins and their surroundings) and a progressively coarser description on the bulk may further improve efficiency. Here we introduce a mesoscale explicit water model to be used in double or triple-scale simulations in combination with popular atomistic parameters and the CG water used by the SIRAH force field. Simulations performed on VLPs of different sizes, along with a comprehensive analysis of the PDB, indicate that most of the VLPs so far reported are amenable to be handled on a GPU-accelerated desktop computer using this simulation scheme. 1-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 35

Graphical Abstract

2-27 ACS Paragon Plus Environment

Page 3 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Introduction Although viruses are primarily associated to disease in all forms of living organisms, the growing knowledge about their basic biology and structural features has allowed their manipulation as bionanomachines, which have provided astonishing solutions in medicine 1. These applications are partially possible thanks to the detailed understanding of the 3D structures of viruses and virus-like particles (VLP). After almost 40 years from the solution of the first atomic-resolution X-ray structure of a viral capsid, more than 200 structures can be retrieved from the Protein Data Bank (PDB 2, http://pdb101.rcsb.org/learn/posters-flyers-and-calendars/200-icosahedral-viruses). In parallel with that, molecular dynamics (MD) simulations have been used to study the conformational dynamics of viral capsids and envelopes 3–8. Perhaps the most remarkable contribution to this field came from the group of Klaus Schulten, who published the first simulation on a complete VLP in 2006, exploring the stability of full and empty capsids of the Satellite Tobacco Mosaic virus at a fully atomistic level 9. In 2013, the same group performed the biggest (at the time) MD simulation of a VLP, studying an allatoms model of the HIV-1 capsid, shedding light on the critical role of the presence of pentameric vs hexameric arrangements of individual proteins to shape the form of the whole capsid 10. Because of the huge computational cost associated with these molecular systems, coarse-grained (CG) approaches are becoming a frequent choice

7,11

. Using this kind of methods, Reddy et al. built up a

model of the A influenza viruses as a part of an integrative approach which highlighted the relevance of the lipid species to shape protein-protein interactions at the virion's membrane

12

. Similarly, using an

elegant combination of experimental methods and multiscale modeling, Ayton and Voth studied the structure of the immature HIV-1 virion. Backmapping from CG models to all-atoms representations provided accurate predictions on protein-protein interactions ruling the assembly of the Gag protein lattice

13

. A comprehensive review on MD simulations of different viral systems has recently been

published by Reddy and Sansom 8. Unfortunately, the huge size of VLPs makes their simulation prohibitive for research groups with limited or no access to supercomputer facilities. Hence, cost-effective simulation schemes are still needed to extend MD simulations to longer timescales and bigger systems. One possible approach is to implicitly represent solvent subsystems using continuum solvation models, such as the widely used Poisson-Boltzmann or Generalized Born integral equations

16

14,15

, as well as the 3D-RISM theory of solvation based on

, which has been shown to achieve astonishing speed ups for protein folding

simulations on standard desktop computers

17,18

. Another attractive strategy is the development of 3-27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

hybrid resolution methods for explicit solvation

19–21

Page 4 of 35

. These simulation schemes combine the best of

two worlds by keeping a higher level of detail in regions of interest and lower resolution in less relevant parts of the molecular system or the solvent 22. Previously, we have shown that atomistic water can be combined with the CG solvent used in the SIRAH force field (South American Initiative for a Rapid and Accurate Hamiltonian) developed by us 23,24. This hybrid solvation scheme was shown to be transferable to the most commonly used atomistic, or fine-grained (FG), water models

25

and further

extended to include dual resolution (FG/CG) solutes in large protein-DNA complexes

26

. These

methods, along with the dazing increases in computational performance achieved by GPU-accelerated MD simulations hold a great potential to enrich our capacity to characterize complex macromolecular systems. Expanding this line of development, here we present a supra CG (SG) water model to be used in multiscale solvation approach applied to systems that requires a massive amount of explicit bulk solvent, such as empty viral capsids or envelopes. This SG model is capable of describing two kinds of schemes: i) Triple scale simulations of FG solute/solvent surrounded by a first shell of CG solvent and a second shell of SG water (FG/CG/SG); ii) CG solute/solvent surrounded by SG solvent (CG/SG). In all these cases we found a good reproduction of structural properties and, when applicable, the multi resolution scheme provides fully atomistic-compatible simulations. Finally, we estimated the performance of our simulation scheme in relation with the available VLP structures currently reported on the PDB. It turns out that the most cost-effective version of our solvation scheme (i.e., CG/SG) allows for the simulation of about 98% of them using only GPUaccelerated desktop computers.

4-27 ACS Paragon Plus Environment

Page 5 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Methods Outline of the SIRAH force field The SG water model presented here was developed to work in combination with the atomistic and the CG SIRAH force field. Although it has been already presented in the literature, we briefly outline it here for the sake of completeness. SIRAH is a physical-based CG force field 27, which uses a classical six-terms Hamiltonian common to most all-atoms potentials to describe the particle-particle interactions. Briefly, bonded interactions involving bonds, angles and improper torsions are described by harmonic terms (e.g.: kb * (r – rb)2, with reference force constants kb and equilibrium bond length rb), while Fourier expansions are used for dihedrals. Non-bonded contributions are accounted by Coulomb and 12-6 Lennard-Jones terms. The Lorentz-Berthelot combination rule is used for all atom-type pairs except otherwise stated, in which case specific Lennard-Jones parameters are set. The AMBER scaling factor is used for the 1-4 non-bonded interactions. SIRAH force field has parameters and topologies for describing DNA 28, proteins 29, lipids 30, water and ions 23. The FG-to-CG mapping scheme is based on physicochemical intuition and uses the position of real atoms to place CG beads. For the CG mapping of proteins, the peptide bonds are treated with a relatively low granularity, keeping the positions of the nitrogen, α carbon, and oxygen, while side chains are modeled at lower degree of detail (Fig. 1). This backbone representation allows for the unbiased conformational description without imposing any secondary or tertiary structure, leading to a straightforward physical interpretation of the conformational space explored by peptides/proteins. Moreover, the use of partial charges on backbone, polar and charged side chain beads and solvent permits accounting for short and long-range electrostatics effects. Additionally, the use of uniform mass values of 50 a.u. for all protein beads in combination with the derived bond-stretching parameters allows for time steps of 20 fs.

5-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 35

Figure 1. SIRAH proteins at a glimpse. CG beads are mapped on atomic positions named in the figure according to PDB 3.0.1 nomenclature. A one-letter code is used for residue names. Backbone mapping is only indicated for glycine. For shortness, residues sharing a similar CG topology are grouped together and the green letter indicates the residue shown. CG beads are drawn at scale according to their van der Waals radii and colored by charge from negative (red) to positive (blue) through neutral (white). Full details about topologies and interactions are described in ref 29.

Supra CG solvation model Water molecules feature a tetrahedral shape with vertices at Hydrogen atoms and electronic lone pairs. In the bulk, this results in a central water surrounded by four neighbors, each placing an oxygen at the vertices of a tetrahedron

23

. This transient structuration inspired the WatFour CG model (WT4, for

shortness) in which four interconnected beads replicate the structure of a water cluster (Fig. 2A). In short, WT4 can be simply considered as a water molecule with a higher granularity. This simple observation suggested the possibility to up scale the granularity to represent a SG solvent that can coexist with other levels of representation without perturbing the essential properties of the liquid at each scale. This corresponds to a mesoscale water model named WatElse (WLS) obtained as a “homothetic” up scaling of the WT4 model (Fig. 2A). WLS was made of four interconnected beads representing 5 WT4 molecules or 55 FG water molecules, respectively (see Fig. 2A for topology and parameters). As stated before, the SIRAH force field uses the same potential energy function as standard FG models. Hence, all interactions at any scale were always calculated within the same classical Hamiltonian. The analogy between water/WT4 and WT4/WLS was complete but for the lack of electrolytes in the SG representation. Being WLS bigger

6-27 ACS Paragon Plus Environment

Page 7 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

than the Debye length at physiological concentrations, ions surrounded by WLS are expected to already be screened within the first supra molecular solvation shell.

Figure 2. Triple scale solvation model. (A) Comparison between the three models for water at different resolution. From left to right: WLS, WT4 and TIP3P molecules. The size of the spheres corresponds to the actual van der Waals radii. The interaction parameters of WT4 and WLS are reported in the table. (B) Snapshot of a triple-resolution solvent system after 10 ns MD simulation. Water molecules are colored red (oxygen) and white (hydrogen). The two CG regions are represented by colored surfaces. The different regions (top) match to the density profile across the box (bottom). The dashed line corresponds to the sum of the three densities considering that WT4 and WLS beads correspond to 2.8 and 14 water molecules, respectively. (C) Coordination numbers of WLS beads around WT4 and water. (D) Number density distribution of the different species around water oxygen. The thick gray trace was calculated on a system containing pure TIP3P water 25 and is used here as a standard. The black dashed line corresponds to the mass weighted sum of all species (as in panel B).

The simplest implementation of the triple layer solvation is a box with 3 apposed slabs of TIP3P

31

,

WT4 and WLS (Fig. 2B). As already shown for dual resolution systems, the triple layer scheme resulted in a smooth interface in which the different water models experienced only limited mixing 25. This was evident from the density of the different components along the computational box (Fig. 2B). Although the density of each component varied across the box, summing all three contributions added up to the experimental value with deviations only at the interfaces owing to the interpenetration of

7-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 35

smaller granularity molecules in the interstices of bigger ones (Fig. 2B). Mixing times between WLS and the other species were in the order of 100 ps (Fig. 2C), comparable with reported values for Water/WT4 mixes

24

. This multiscale solvation scheme maintained the structural organization of the

solvent, which was nicely recovered when considering the three components (Fig. 2D).

Setting up VLP simulations Building up of computational boxes containing VLP poses some specific challenges, some of which have have been discussed by Reddy and Sansom 8. In connection with our approach, we recall that usual solvation algorithms leave small voids owing to imperfect packing of solvent around solutes. Normally, this heterogeneity is completely harmless and rapidly solved during the course of stabilization dynamics in the NPT ensemble. However, most often VLPs introduce a physical separation of the inner and outer compartments, which may not interchange molecular components. Hence, solvent voids in the inner core that cannot be filled result in a spurious lower density and the formation of “vacuum bubbles” inside the VLP. This problem may be more pronounced when setting up higher granularity solvent models. Aiming to overcome these setbacks, we developed the following protocol: 1) solvate the VLP using FG, FG/CG/SG or CG/SG schemes using PACKMOL

32

; 2)

perform an energy minimization of the system and run a 1 ns NVT equilibration MD with positional restrains of 1000 kJ mol-1 nm-2 on protein atoms, 3) keep the first solvation shell (at highest resolution) around the protein and resolvate the system using the positions generated in step 1; 4) repeat step 2; 5) run a 1 ns NPT unrestrained MD; 6) check for density issues (vacuum bubbles); 7) if density issues were found, then resolvate the system using pre-equilibrated solvation boxes of the corresponding molecule and repeat step 2. Steps 2-4 are meant to fix solvent issues at the protein interface, while steps 5-7 solve global density issues due to VLP size changes. In our experience, this process sufficed to produce homogeneous simulation boxes. However, iterations of steps 5-7 may be needed in somewhat “pathological” systems before running NPT production simulations.

Simulation details For the FG/CG/SG solvation scheme we used a spherical shell of atomistic water extending up to 1 nm away from the farthest atom at external and internal faces of the VLP, followed by a 2 nm shell of WT4 at either side of the FG solvent. The remaining volume was filled with WLS. FG and CG electrolytes were randomly placed substituting water or WT4 molecules, respectively, to yield the desired concentration. We used 150 mM of NaCl in each simulation. The only difference in the CG/SG 8-27 ACS Paragon Plus Environment

Page 9 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

solvation scheme was using a 2.5 nm shell of WT4 from the VLP. In all the cases we considered an 8 nm distance between the solute and the octahedral box. Three VLPs were considered in this study, namely, a capsid derived from the Sesbania Tobacco Mosaic virus (SeMV), the capsid of the Triatoma virus (TrV) and the envelope of the Zika virus (ZIKV). The atomic coordinates of SeMV, TrV and ZIKV were taken from PDB structures 1X36 3NAP

34

and 5IRE

35

33

,

respectively. Protonation states were set at pH 7 in all cases. In SeMV,

crystallographic calcium ions were kept. In ZIKV, missing side chains of the asymmetric unit were built with AMBERTOOLS15 36, and suitable rotamers set to residues W20 and R292 of E proteins to prevent steric clashes. After reconstruction, 5000 steps of minimization were performed using the AMBER force field

37

. The entire envelope was then generated by applying the corresponding

symmetry transformations (BIOMT). Only residues 1 to 405 comprising the ectodomain of envelope E proteins were used for the simulation of ZIKV. SeMV and TrV at FG level were described by the ff99SB force field of AMBER

38

as implemented in GROMACS through ffamber ports

39

. CG

representations of TrV and ZIKV were generated by mapping the atomic structures to the SIRAH protein model 29 using SIRAH Tools 40. All

simulations

were

(http://www.gromacs.org)

performed 41

with

the

GPU

implementation

of

GROMACS

4.6.7

. A reference temperature of 300 K was set by coupling solute and each

solvent separately to the V-rescale thermostat 42 with coupling times of 2 ps. The pressure was kept at 1 bar by means of a Parrinello-Rahman barostat 43,44 with a coupling time of 8 ps. A minimum cutoff for nonbonded interactions of 1.2 nm was set. Long-range electrostatics were evaluated using Particle Mesh Ewald 45,46 each 10 integration steps, the same time at which neighbor searching was performed. Newton's equations of motion were solved using a leap-frog integrator algorithm. A time step of 2 fs was used in FG and FG/CG/SG systems and all bonds involving hydrogen atoms were restrained using the LINCS algorithm

47,48

. On the other hand, in CG/SG systems a time step of 2 fs was used for

equilibration while 20 fs was set for production. Snapshots were recorded every 5 ps in FG and FG/CG/SG systems and every 100 ps in CG/SG systems for analysis.

Calculated Properties Solvent and ion radial distribution functions (RDF) were measured from VLP's center of geometry, while spatial distribution functions (SDF) were calculated after fitting to the experimental structure with the Volmap plugin of VMD 49 using a grid spacing of 0.05 nm. Radius of gyration (RGYR), root 9-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 35

mean square deviations (RMSD) and fluctuations (RMSF) were calculated on Cα atoms taking as reference the experimental structures of the entire capsid, monomer or protein domain, according to the case. Same sampling and time windows were considered when comparing RMSF of systems at different resolution. The B-factor of the i-th residue in each monomer was estimated as Bi = 8 * π2 * RMSFi2 / 3 and normalized by the mean and standard deviation along the monomer: Bi' = (Bi - ) / σ(B). Protein contacts were defined using a 0.8 nm cut-off distance between Cα atoms. Calcium sites at SeMV were assumed occupied if the ion was closer than 0.5 nm to any two residues forming the binding site. All molecular representations were generated by VMD 49.

Results and Discussion In the following paragraphs, we present a series of simulations on VLPs of different characteristics and using different solvation schemes. These examples were only meant as test cases to show the capabilities of the method. Providing biological insights on these systems goes beyond the scope of this manuscript.

i) Comparing FG vs. FG/CG/SG simulation For the sake of computational convenience, we first focused on the structure of the small icosahedral VLP corresponding to SeMV. This is a plant virus consisting of a 4.1kb positive, single stranded RNA encoding three open reading frames 50. The last open reading frame codes for the capsid protein, which is composed by 268 amino acids (aa) divided in a disordered (aa 1 to 71) and an ordered domain (aa 72 to 268). The first domain contains an arginine rich motif and coordinates RNA encapsidation

51

.

Deletion of the first 65 aa resulted in empty icosahedral VPLs with an external diameter varying between 17 and 22 nm as determined by transmission electron microscopy 52.

10-27 ACS Paragon Plus Environment

Page 11 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 3. Simulated VLP systems and global MD descriptors. Each panel shows from top to bottom: Molecular representations of the simulated systems at identical size scales. The VLP is drawn as a yellow surface while solvent particles are shown as spheres of corresponding sizes; Solvent density profiles along the simulation box from the center of the VLP in pure FG (yellow) or multiscale solvation (coloring TIP3P in red, WT4 in blue and WLS in gray). The experimental RGYR of the VLP is indicated as a vertical dashed line; Time evolution of RGYR and RMSD.

Similarly, a deletion of the first 31 aa produced crystals diffracting at a resolution of 0.27 nm containing 60 copies of the capsid protein in an icosahedral arrangement

33

. This VLP constituted a

stringent test for our solvation method as the stability of the capsid has been shown to be strongly dependent of the solvation structure

33

. Additionally, 60 calcium ions were present in the structure

coordinated between pairs of monomers. Comparative simulations were set up as indicated in Fig. 3A. It is worth mentioning that a decision on the amount of solvent substitution and its nature (CG, SG, or both) has to be made weighting the speed up required and the size of the VLP of interest. Owing to the relatively reduced dimensions of SeMV, we decided to implement a triple scale solvation including WLS only in the exterior of the simulation box (Fig. 3A). Although an innermost shell of SG could be included, it would contain only a handful of 11-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 35

WLS molecules. This would result in minimal speed up, but requiring additional solvation steps as indicated in the Methods section. The solvent distribution in both solvation schemes stabilized within the first 10 ns. After this period, the density profile of the solvent on the radial coordinate remained practically invariable. This profile for the FG/CG/SG showed an alternation of regions of low and high density according to the solvent species (Fig. 3A) with the CG solvent starting at 1 g/cm3 (i.e., the experimental value) to decrease after near 5 nm, concomitant with an increase of FG solvent. As observed previously,

24,25

water and WT4

experience limited mixing presenting a smooth interface. The density of water decreased because of the presence of the protein (indicated by a dashed line). After that point the FG water density reached a new maximum in the external solvation shell, after which it faded away being substituted by CG and SG solvent. Only after about 12 nm the density of the SG solvent, WLS, reached appreciable values, where it showed a smooth mixing interface with the CG solvent. The average radial density of WLS remained slightly below 1 g/cm3 because of the use of truncated octahedron geometry for periodic boundary conditions and because of the presence of a few WT4 molecules in the SG face. The comparable performance of the FG and multiscale solvation was underlined by the coincidence of the density profiles of FG water molecules. In particular, is could be appreciated that the FG water density did not reach zero at the protein region in none of the systems. This was in agreement with the presence of crystallographic water at the protein-protein interface and within deep protein crevices

33

. Beyond

this good global agreement, we decided to explore at a higher level of detail the comparability of water distribution in both solvation schemes. To this aim, we performed an RMSD fitting of both trajectories onto the X-ray structure and calculated the average occupation of FG water in both systems over the last 10 ns of each simulation (Fig. 4). Although we could not expect a perfect fit of occupational clouds from the two independent trajectories, we did find a good correspondence in particular regions surrounding the protein. One of these was the 5-fold symmetry axis. The pentameric star-like water distribution at the convergence of the five protein replicas is evident (Fig. 4B). Additionally, both solvation schemes matched with good precision the position of several crystallographic buried waters (Fig. 4C). It may be relevant to recall that both systems were set up by removing the crystallographic water, entirely resolvating the systems and randomly positioning water and ions (see Methods). Thus, these observations suggest that the initial water/electrolytes configuration is negligible in both solvation schemes. This particular structure of SeMV was chosen, among others, because of the presence of 60 calcium 12-27 ACS Paragon Plus Environment

Page 13 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

ions coordinated between two neighbor capsid proteins (Fig. 4A, D). These ions were maintained in the set up as we wanted to check for possible artifacts of the multiscale solvation in the coordination of divalent cations. As evident from Fig. 4D, calcium coordination is equally maintained in both simulations for a variety of functional groups, i.e., the acidic side chains of D146, D149, the carbonyl oxygen of Y207, the carboxamide oxygen of N267 and the C-terminal moiety of N268. However, it must be mentioned that not all the calcium ions remain bound to the protein along the simulations. As shown in Fig. 4E, the occupation of calcium sites decreased slightly along the simulations (Fig. 4F), indicating that some ions left their binding site. After 150 ns, nearly 10% and 4% of the calcium have left their binding sites in the FG and FG/CG/SG, respectively. It is unclear if this effect took place because of the differences in crystallographic vs. solution conditions, small defects in the force field parameterizations, or it reflected just a normal exchange promoted by solvent competition. Whatever the case, FG and FG/CG/SG solvation schemes produced a similar behavior. In light of the good agreement in describing the solvation structure, we turned our attention to the comparison of the protein conformation. As a global descriptor of the VLP, we first evaluated the radius of gyration (RGYR). As shown in Fig. 3A, the RGYR in both simulations experienced a comparably small increment of about 0.4 nm, as compared to the initial radius of 7.5 nm. This change was well within the expected variation for this VLP in solution, considering that much larger variations have been reported for SeMV from cryo-electron microscopy (Cryo-EM) images 52. The kinetics of the stabilization, however, were quite faster in the multiscale case, likely due to the intrinsically faster configurational sampling achieved by CG solvent. While the FG simulation takes about 100 ns to reach a plateau, the multiscale system stabilized already after the stabilization phase (Fig. 3A). Identical conclusions could be drawn from different and more stringent structural descriptors. For example, the RMSD in both simulations converged to the same values within statistical precision at the level of the entire capsid or single monomers (Fig. 3A). Likewise, calculation of the number of interprotein contacts converged to the same values at the end of the simulations (Fig. 4F). The small decrease in the number of contacts observed along the time was consistent with the rise of RGYR (Fig. 3A).

13-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 35

Figure 4. FG vs. FG/CG/SG simulations of SeMV. (A) Surface representation of the VLP showing a central monomer (purple ribbons) and its contacts at 3-fold (blue ribbons) and 5-fold symmetry axis. Calcium ions are shown as yellow spheres. View points of panels B-D are indicated. (B,C) Representative water occupancies at 5-fold symmetry axis and inside the monomer fold. SDFs are drown using isosurface values of 0.4 and colored according to FG (green) and FG/CG/SG (red). Crystallographic waters are represented as blue spheres. (D) Coordination of calcium ions at the binding site. Comparison among experimental structure (gray) and representative snapshots from FG (green) and FG/CG/SG (red). (E) Number of occupied Calcium binding sites. (F) Average amount of inter-monomer contacts relative to the experimental structure. (G) Average RMSF values for both solvation schemes and individual traces for FG/CG/SG (orange). The experimental secondary structure assignment is also indicated. Analysis of panels B-D and G were done at the last 10 ns of simulation.

Finally, we sought to investigate the local flexibility of the individual amino acids along the polypeptides to rule out a spurious over-stabilization of the protein owing to the presence of the bulkier and heavier CG molecules. In contrast with this speculation, the RMSF calculated along both trajectories resulted in nearly identical profiles with a general correspondence between the minima and elements of secondary structure (Fig. 4G). Interestingly the only exception to this pattern was the helical segment between residues 210 and 230, which constituted an exposed helix. Taken altogether, the comparison between the FG and FG/CG/SG solvation schemes provided quantitatively comparable results. An important difference, however, regarded the shorter stabilization times achieved by the FG/CG/SG approach in all the descriptors analyzed. This suggested that the faster configurational sampling featured by CG models granted an additional computational advantage

14-27 ACS Paragon Plus Environment

Page 15 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

besides the significant reduction in the number of particles.

ii) Comparing FG/CG/SG vs. CG/SG simulation In sight of the encouraging results obtained in the previous paragraph, we sought to simulate a more complex and bigger system. As a test case we used the VLP of TrV

34

, which doubles the RGYR of

SeMV (Fig. 3B), requiring a sensible increase in the number of particles. Alike SeMV, this structure is constituted by the empty and naked capsid of the virus. However, TrV is composed by 60 copies of a trimer of viral proteins (VP1, VP2 and VP3) entangled in a pentamer, which is replicated 12 times to form the complete capsid (Fig. 5A). TrV belongs to the Dicistroviridae family and infects bloodsucking insects (usually denominated “kissing bugs”). In particular, it is pathogenic to Triatoma infestans, the vector of Chagas disease, which constitutes a severe health issue in most Latin American countries 53. This virus has been proposed as an alternative for biological control on insect populations that develop resistance to pesticides

54

. Furthermore, its pH reversible destabilization suggested also

55

biotechnological applications , which could be fine-tuned by molecular simulations. Having assessed the capabilities of the FG/CG/SG scheme in the previous paragraph, we took profit of this larger system to establish a parallelism with CG/SG. As shown in Fig. 3B, the solvent density distribution of the FG/CG/SG simulation presented a similar profile to SeMV (Fig. 3A). However, in this case the water density reached zero in correspondence with the localization of the protein, which was consistent with the tightly shielded protein-protein interfaces of TrV. Also in this case the gyration radius of the VLP stabilized at a slightly higher value (about 5%) in relation to the crystallographic structure, after nearly 100 ns.

15-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 35

Figure 5. FG/CG/SG vs CG/SG simulations of TrV. (A) Molecular representation of the solvent accessible surface of TrV. Colored segments correspond to proteins VP1-3 composing the capsid. The black shaded area delimits the pentamer of trimers. (B) Average intermolecular contacts of VP1, VP2 or VP3 with their respective neighboring chains. (C) Residue level RMSF from 100-150 ns of simulation. Thick and thin traces represent average and individual values, respectively for atomistic (black and grey) and CG (red and orange) proteins.

A similar trend was followed by the RMSD of the entire VLP, although the time evolution of the polypeptide chains considered separately reached lower values. This suggested that, not surprisingly, the intrinsic dynamics of the capsid as a whole is higher than that of the individual components. 16-27 ACS Paragon Plus Environment

Page 17 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Comparison with the CG/SG simulation indicated that both simulations converged to practically the same RGYR and within a comparable timescale (Fig. 3B). Differences arose, however, when comparing the RMSD values. Although we found a qualitative agreement regarding the RMSD of the entire VLP vs. individual components, the CG simulation resulted in higher deviations from the initial position and slower convergence. This discrepancy is likely associated to the intrinsic loss of accuracy of the CG representation, and is in line with the RMSD values found for other proteins

29

. It is also

important to keep in mind, however, that the deviations displayed by the CG/SG simulation are in qualitative agreement with the large plasticity shown by TrV VLPs from Cryo-EM images 56. Aimed to further explore these discrepancies, we sought to evaluate other structural descriptors such as the averaged number of intermolecular contacts for each protein of the capsid (Fig. 5B). Regardless of the identity of the protein (VP1-3) and the simulation scheme, we obtained conservation values around 90%, indicating a good retention of the compactness of the viral coat in both simulations. Curiously, VP2 maintained a slightly higher percentage in the CG simulation, perhaps related to an augmented stabilization resulting from the higher entanglement of this chain with its neighbors. A comparative analysis of the local fluctuations was addressed by superimposing the RMSF at the level of each single aa (Fig. 5C). Overall, both simulations presented an excellent quasi-quantitative agreement. All together, we observed that both simulation schemes provide comparable structural and dynamic views of the TrV's VLP.

iii) CG/SG for larger VLPs: The case of Zika virus ZIKV belongs to the family Flaviviridae and was first isolated in 1947 in the Zika forest, Uganda. It is transmitted to different species by mosquitoes of the genus Aedes 57 and is raising significant concerns for public health since accumulating evidence suggests a direct link between ZIKV infection, GuillianBarre' syndrome and teratogenic effects rapidly spread in the Americas

60

58,59

. Since its introduction to Brazil in 2013-2014, ZIKV has

and an increasing number of cases is being reported in the U.S.

(http://www.cdc.gov/zika/geo/index.html). Currently, there are no approved therapeutic tools available for this infection. Alike other flaviviruses

61

, ZIKV features a single positive sense RNA genome,

which is translated as a single polypeptide chain post-translationally cleaved into 10 different proteins, among which the ~500 aa long envelope protein (E) is the only exposed to the exterior in the mature virion, making it a major target for immunological studies. The ZIKV's exterior shell is constituted by 180 copies of the protein E, which is subdivided in three domains (DI, DII and DIII) according to 17-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 35

structural and functional roles. E proteins associate as primarily dimers, then and as trimers of dimers (called rafts) to finally form the icosahedral envelope (Fig. 6A). The recently reported Cryo-EM structure of the ZIKV virion envelope showed a high similarity to other flavivirus 35. Indeed, ZIKV's E protein shares a ~50% identity with other flaviviruses as Dengue 62

. This high identity may give rise to the so-called Antibody Dependent Effect, by which antibodies

developed by subjects previously infected with a different virus partially recognize a flavivirus envelope upon a subsequent infection. Since the recognition is not perfect, antibodies fail to neutralize the virion enhancing the infection by contributing to its cellular internalization by different routes

63

.

Although X-ray crystallography has achieved important goals in helping raise highly specific antibodies for therapeutic applications flaviviruses are highly dynamic

66,67

64,65

, Cryo-EM data showed that particles of different

. In this context, detailed dynamic characterization of VLP

dynamics would become a great asset to pinpoint unique epitopes. As an example of application, and owing to the large size of the virus, we undertook a CG/SG simulation of the ectodomain of ZIKV (Fig. 3C). In contrast with the previous test cases, the ZIKV envelope is not a tightly shielded protein crust. As a result, water (even at the CG level) can partially overlap with the same radial coordinate of the protein envelope. Because of this the radial distribution function of WT4 displayed a non-zero minimum in correspondence with the protein shell (Fig. 3C). The CG/SG MD simulation of the virion shows a stable behavior with the RGYR experiencing small variations (~ 2% from the initial value) and converging within the first 100 ns (Fig. 3C). Evaluation of the RMSD revealed different kinetics, reaching a plateau only after 800 ns. This behavior, however, was not homogeneous among the different structural levels as the RMSD measured on the VLP as a whole, on each monomer, and on each protein domain were well-separated and reaching convergences at different times (Fig. 3C). Similar observations were made for the previous examples but the larger size of the VLP seemed to exert a magnifying effect (compare Fig. 3A, B and C). Despite this, we obtained a good reproduction of secondary structure elements, which were conserved up to over 90% of the initial value.

18-27 ACS Paragon Plus Environment

Page 19 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 6. CG/SG simulation of ZIKV. (A) On the left, ectodomain of E protein dimer with one monomer colored by functional domains and the other in gray. On the right, molecular representation of the VLP showing the E proteins' distribution along the icosahedral symmetry axes. Black dots point the geometric center of DII domains in the dimers from which the raft angle (black line) is measured (see panel C). (B) Average amount of inter and intra dimer contacts relative to the experimental structure. (C) Distribution of conformations along the simulation for the 30 E protein rafts in the VLP. Angle defined in panel A. Isocontour regions are colored according to the number of rafts having a given angle value (scale on the right). (D) Comparison between experimental and average normalized B-factors from the last 50 ns of the simulation. The experimental secondary structure assignment and the functional domains are also indicated.

In agreement with the small reduction of the RGYR, we observed a rise in the number of proteinprotein contacts, which showed slightly different values if computed intra-dimer or inter-dimer wise 19-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 35

(Fig. 6B). Moreover, visual inspection of the trajectory highlighted motions of the envelope rafts, which could be characterized as the angle defined by the geometric centers of three contiguous dimers. This quantity showed a high flexibility with sensible angular variations deviating from the regular icosahedral form (Fig. 6C). The time dependence of all the possible raft angles indicated a continuous sampling of values from nearly 120 to 160 degrees. The population of the angular values changed along the dynamics showing the existence of shallow peaks. We noticed that a certain stabilization was reached in rough coincidence with the plateau of the RMDS curve (Fig. 3C). However, this coincidence is not complete and we observed that the landscape continued to vary until the end of the simulation. All these descriptors suggested a rather complex dynamics of the ectodomain, which has also been reported in other CG simulations of Dengue 68. Moreover, the intrinsic mobility of the envelope rafts is in accordance with a study on normal mode analysis recently reported by Hsieh et al. 69, which revealed a significant anticorrelated motion between peripheral proteins in the raft. Furthermore, comparison of calculated B-factor with experimental data showed a very good global agreement between high flexibility peaks and regions without secondary structure (Fig. 6D). Moreover, the coincidence of high flexibility regions could also be verified locally in biologically relevant regions as the fusion and variable glycosylation loops (FL and VGL, respectively in Fig. 6D). In summary, the simulation of ZIKV VLP at the CG/SG level not only resulted qualitatively consistent with previously reported CG simulations on closely related flaviviruses and completely unrelated theoretical approaches as normal mode analysis, but also with experimental data, indicating the suitability of the method for the description of large VLPs.

iv) Accuracy, speed-up and affordability The approach introduced here may grant a significant speed-up in molecular systems with vast amounts of bulk water as in the case of VLP. Clearly, the higher the amount of FG water substituted by CG or SG, the higher the performance (Fig. 3A). However, the suitability of each approach FG, FG/CG or FG/CG/SG depends on the system and the properties under study. Obviously, choices have to be made in order to obtain a good balance between accuracy and speed up. In the first case (SeMV), aimed to address fine details of water-mediated interactions in a relatively small VLP, we applied a FG/CG/SG scheme with the SG region only on the external shell (Fig. 3A). Because of the reduced dimensions of the VLP, including WLS in the inner shell would have only implied a reduction in a handful of beads without an evident impact on performance. In this case, using the FG/CG/SG approach granted a 3-fold 20-27 ACS Paragon Plus Environment

Page 21 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

speed up (Fig. 7A) with no significant loss in accuracy (Figs. 3, 4). A VLP with twice the radius, as that of the TrV, implies (under a spherical approximation) an 8 times higher volume, justifying the choice of FG/CG/SG on inner and outer shells. It is worth noticing that according to the statistics of icosahedral structures reported on the PDB, VLPs with radii similar to TrV are the most abundant (Fig. 7B). This suggests that FG/CG/SG would be the scheme of choice for most applications requiring fully atomistic detail in the solute and its surrounding aqueous environment. However, for exploring larger simulation times or bigger systems at amino acid resolution, CG/SG provided a solid alternative with nearly 40-fold speed up against FG/CG/SG simulations.

Figure 7. Computational cost of VLP simulations at different solvation schemes. (A) Estimations of solvent particles needed to solvate 1 nm thick spherical VLPs according to settings detailed on the section Methods. Dots point the actual simulated systems. Performances were obtained on an Intel Core i7 3.7 GHz CPU with a GPU Tesla K40c. Stars denote the VLP size limit for running on a single GPU. The GPU memory limit was taken from benchmarks at http://ambermd.org/gpus on explicit solvent simulations using a Tesla K40c graphic card. The size of fully atomistic HIV10 1 system simulated at Blue Waters supercomputer is indicated as a reference. (B) Size distribution of available icosahedral viruses at PDB (http://pdb101.rcsb.org/learn/posters-flyers-and-calendars/200-icosahedral-viruses) and affordable simulation range on one GPU according to memory limitations and system size predictions from panel A. Bigger VLPs as Faustovirus (PDB id: 5J7V) were considered in the analysis but scape the graph scale.

It is worth noticing that these speed-up values have been estimated on simulations performed in an Intel Core i7 3.7 GHz CPU equipped with a GPU Tesla K40c running on standard GROMACS 4.6.7. In this regard, we anticipate that the introduction of multiple timestep algorithms in popularly used MD engines allowing for differential update of coordinates in different phases of the system (i.e., FG, CG 21-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

and/or SG)

70,71

Page 22 of 35

will grant a further acceleration in multiple resolution approaches in the future.

The possibility to perform highly demanding MD simulations on desktop computers is particularly relevant because of the scarce accessibility to supercomputer facilities world-wide and, specially, in regions frequently affected by viral bursts like Asia, Africa and South America. Taking into account that the GPU memory limits the maximum number of atoms in the computational box and the size distribution of the VLP reported on the PDB, we conclude that fully atomistic simulations are doable, with desktops computers, for only 20% of the viruses (Fig. 7B). However, application of the FG/CG/SG scheme allows covering up to 80% of the VLP and this figure can be further extended up to 98% implementing the CG/SG approach.

Conclusions We presented a simple SG solvation model named WatElse (WLS for shortness) for multiscale molecular dynamics simulations. The model is part of the SIRAH force field, which is fully ported to GROMACS. Interaction parameters are freely available from the web site http://www.sirahff.com. Being the interactions completely Hamiltonian and using exactly the same functional form than most popular atomistic force fields, the porting of the force field to other MD engines is straightforward. The method has been validated here by comparisons between FG and FG/CG/SG simulations, in case of SeMV, and between FG/CG/SG and CG/SG for TrV. In each example we achieved a good reproduction of structural descriptors obtained from simulations and experimental data. Finally, we showed what, to the best of our knowledge, is the first simulation of a ZIKV's VLP and one of the very few on flaviviruses

68,72

. Simple analysis of the global dynamics consents to capture interactions

associated to the quaternary structure that would be missed in studies of single or dimeric proteins and agree with published data, highlighting the good performance of the method. It seems relevant at this point to underline the fact that the ZIKV system simulated here would contain ~20 millions atoms at full detail representation. This becomes untreatable for desktop computers in terms of memory requirements for running, data processing, storage and the very nature of the file formats supported by common simulation and visualization software. Hence, we hope these methods may constitute an affordable alternative for computational biologists in developing countries, where a next-door collaboration with experimentalists and clinicians could boost the characterization of novel isolates anticipating structural changes requiring therapeutic countermeasures.

22-27 ACS Paragon Plus Environment

Page 23 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Acknowledgments This work partially funded by FOCEM (MERCOSUR Structural Convergence Fund), COF 03/11. M.R.M. and S.P. belong to the SNI program of ANII. Some of the Tesla K40c used for this research were donated by the NVIDIA Corporation. We are grateful to Prof. Leandro Martinez for his help on exploiting the capabilities of PACKMOL 32. Simulations on the Zika Virus were partially conducted at Santos Dumont under the grant VICBF1.

References (1) (2) (3) (4) (5) (6)

(7)

(8) (9)

(10)

(11)

(12)

(13) (14) (15)

Yildiz, I.; Shukla, S.; Steinmetz, N. F. Applications of Viral Nanoparticles in Medicine. Curr. Opin. Biotechnol. 2011, 22 (6), 901–908. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1), 235–242. May, E. R.; Feng, J.; Brooks, C. L. Exploring the Symmetry and Mechanism of Virus Capsid Maturation Via an Ensemble of Pathways. Biophys. J. 2012, 102 (3), 606–612. Hagan, M. F. Modeling Viral Capsid Assembly. Adv. Chem. Phys. 2014, 155, 1–68. May, E. R. Recent Developments in Molecular Simulation Approaches to Study Spherical Virus Capsids. Mol. Simul. 2014, 40 (10–11), 878–888. Abi Mansour, A.; Sereda, Y. V.; Yang, J.; Ortoleva, P. J. Prospective on Multiscale Simulation of Virus-like Particles: Application to Computer-Aided Vaccine Design. Vaccine 2015, 33 (44), 5890–5896. Perilla, J. R.; Goh, B. C.; Cassidy, C. K.; Liu, B.; Bernardi, R. C.; Rudack, T.; Yu, H.; Wu, Z.; Schulten, K. Molecular Dynamics Simulations of Large Macromolecular Complexes. Curr. Opin. Struct. Biol. 2015, 31, 64–74. Reddy, T.; Sansom, M. S. P. Computational Virology: From the inside Out. Biochim. Biophys. Acta 2016, 1858 (7 Pt B), 1610–1618. Freddolino, P. L.; Arkhipov, A. S.; Larson, S. B.; McPherson, A.; Schulten, K. Molecular Dynamics Simulations of the Complete Satellite Tobacco Mosaic Virus. Structure 2006, 14 (3), 437–449. Zhao, G.; Perilla, J. R.; Yufenyuy, E. L.; Meng, X.; Chen, B.; Ning, J.; Ahn, J.; Gronenborn, A. M.; Schulten, K.; Aiken, C.; Zhang, P. Mature HIV-1 Capsid Structure by Cryo-Electron Microscopy and All-Atom Molecular Dynamics. Nature 2013, 497 (7451), 643–646. Ingólfsson, H. I.; Lopez, C. A.; Uusitalo, J. J.; de Jong, D. H.; Gopal, S. M.; Periole, X.; Marrink, S. J. The Power of Coarse Graining in Biomolecular Simulations. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2014, 4 (3), 225–248. Reddy, T.; Shorthouse, D.; Parton, D. L.; Jefferys, E.; Fowler, P. W.; Chavent, M.; Baaden, M.; Sansom, M. S. P. Nothing to Sneeze at: A Dynamic and Integrative Computational Model of an Influenza A Virion. Struct. Lond. Engl. 1993 2015, 23 (3), 584–597. Ayton, G. S.; Voth, G. A. Multiscale Computer Simulation of the Immature HIV-1 Virion. Biophys. J. 2010, 99 (9), 2757–2765. Antosiewicz, J. M.; Shugar, D. Poisson–Boltzmann Continuum-Solvation Models: Applications to PH-Dependent Properties of Biomolecules. Mol. Biosyst. 2011, 7 (11), 2923–2949. Onufriev, A. Continuum Electrostatics Solvent Modeling with the Generalized Born Model. In Modeling Solvent Environments; Feig, M., Ed.; Wiley-VCH Verlag GmbH & Co. KGaA, 2010; 23-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(16) (17)

(18)

(19) (20)

(21)

(22) (23) (24)

(25) (26) (27) (28)

(29)

(30) (31) (32)

(33)

(34)

Page 24 of 35

pp 127–165. Kovalenko, A. Three-Dimensional RISM Theory for Molecular Liquids and Solid—Liquid Interfaces. ChemInform 2005, 36 (21), no-no. Omelyan, I.; Kovalenko, A. Multiple Time Step Molecular Dynamics in the Optimized Isokinetic Ensemble Steered with the Molecular Theory of Solvation: Accelerating with Advanced Extrapolation of Effective Solvation Forces. J. Chem. Phys. 2013, 139 (24), 244106. Omelyan, I.; Kovalenko, A. MTS-MD of Biomolecules Steered with 3D-RISM-KH Mean Solvation Forces Accelerated with Generalized Solvation Force Extrapolation. J. Chem. Theory Comput. 2015, 11 (4), 1875–1895. Riniker, S.; van Gunsteren, W. F. Mixing Coarse-Grained and Fine-Grained Water in Molecular Dynamics Simulations of a Single System. J. Chem. Phys. 2012, 137 (4), 044120. Fogarty, A. C.; Potestio, R.; Kremer, K. Adaptive Resolution Simulation of a Biomolecule and Its Hydration Shell: Structural and Dynamical Properties. J. Chem. Phys. 2015, 142 (19), 195101. Zavadlav, J.; Marrink, S. J.; Praprotnik, M. Adaptive Resolution Simulation of Supramolecular Water: The Concurrent Making, Breaking, and Remaking of Water Bundles. J. Chem. Theory Comput. 2016, 12 (8), 4138–4145. Darré, L.; Machado, M. R.; Pantano, S. Coarse-Grained Models of Water. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2 (6), 921–930. Darré, L.; Machado, M. R.; Dans, P. D.; Herrera, F. E.; Pantano, S. Another Coarse Grain Model for Aqueous Solvation: WAT FOUR? J. Chem. Theory Comput. 2010, 6 (12), 3793–3807. Darré, L.; Tek, A.; Baaden, M.; Pantano, S. Mixing Atomistic and Coarse Grain Solvation Models for MD Simulations: Let WT4 Handle the Bulk. J. Chem. Theory Comput. 2012, 8 (10), 3880–3894. Gonzalez, H. C.; Darré, L.; Pantano, S. Transferable Mixing of Atomistic and Coarse-Grained Water Models. J. Phys. Chem. B 2013, 117 (46), 14438–14448. Machado, M. R.; Pantano, S. Exploring LacI–DNA Dynamics by Multiscale Simulations Using the SIRAH Force Field. J. Chem. Theory Comput. 2015, 11 (10), 5012–5023. Kmiecik, S.; Gront, D.; Kolinski, M.; Wieteska, L.; Dawid, A. E.; Kolinski, A. Coarse-Grained Protein Models and Their Applications. Chem. Rev. 2016, 116 (14), 7898–7936. Dans, P. D.; Zeida, A.; Machado, M. R.; Pantano, S. A Coarse Grained Model for AtomicDetailed DNA Simulations with Explicit Electrostatics. J. Chem. Theory Comput. 2010, 6 (5), 1711–1725. Darré, L.; Machado, M. R.; Brandner, A. F.; González, H. C.; Ferreira, S.; Pantano, S. SIRAH: A Structurally Unbiased Coarse-Grained Force Field for Proteins with Aqueous Solvation and Long-Range Electrostatics. J. Chem. Theory Comput. 2015, 11 (2), 723–739. Barrera, E. E.; Frigini, E. N.; Porasso, R. D.; Pantano, S. Modeling DMPC Lipid Membranes with SIRAH Force-Field. J. Mol. Model. 2017, 23 (9), 259. Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79 (2), 926–935. Martínez, L.; Andrade, R.; Birgin, E. G.; Martínez, J. M. PACKMOL: A Package for Building Initial Configurations for Molecular Dynamics Simulations. J. Comput. Chem. 2009, 30 (13), 2157–2164. Sangita, V.; Satheshkumar, P. S.; Savithri, H. S.; Murthy, M. R. N. Structure of a Mutant T=1 Capsid of Sesbania Mosaic Virus: Role of Water Molecules in Capsid Architecture and Integrity. Acta Crystallogr. D Biol. Crystallogr. 2005, 61 (Pt 10), 1406–1412. Squires, G.; Pous, J.; Agirre, J.; Rozas-Dennis, G. S.; Costabel, M. D.; Marti, G. A.; Navaza, J.; 24-27 ACS Paragon Plus Environment

Page 25 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(35) (36) (37)

(38)

(39) (40) (41)

(42) (43) (44) (45) (46) (47) (48) (49) (50)

(51)

(52)

(53) (54)

Journal of Chemical Theory and Computation

Bressanelli, S.; Guérin, D. M. A.; Rey, F. A. Structure of the Triatoma Virus Capsid. Acta Crystallogr. D Biol. Crystallogr. 2013, 69 (Pt 6), 1026–1037. Sirohi, D.; Chen, Z.; Sun, L.; Klose, T.; Pierson, T. C.; Rossmann, M. G.; Kuhn, R. J. The 3.8 Å Resolution Cryo-EM Structure of Zika Virus. Science 2016, 352 (6284), 467–470. Salomon-Ferrer, R.; Case, D. A.; Walker, R. C. An Overview of the Amber Biomolecular Simulation Package. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2013, 3 (2), 198–210. Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. Ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from Ff99SB. J. Chem. Theory Comput. 2015, 11 (8), 3696–3713. Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins 2006, 65 (3), 712–725. Sorin, E. J.; Pande, V. S. Exploring the Helix-Coil Transition via All-Atom Equilibrium Ensemble Simulations. Biophys. J. 2005, 88 (4), 2472–2493. Machado, M. R.; Pantano, S. SIRAH Tools: Mapping, Backmapping and Visualization of Coarse-Grained Models. Bioinformatics 2016, btw020. Páll, S.; Abraham, M. J.; Kutzner, C.; Hess, B.; Lindahl, E. Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS. In Solving Software Challenges for Exascale; Lecture Notes in Computer Science; Springer, Cham, 2014; pp 3–27. Bussi, G.; Donadio, D.; Parrinello, M. Canonical Sampling through Velocity Rescaling. J. Chem. Phys. 2007, 126 (1), 014101. Nosé, S.; Klein, M. L. Constant Pressure Molecular Dynamics for Molecular Systems. Mol. Phys. 1983, 50 (5), 1055–1076. Parrinello, M.; Rahman, A. Polymorphic Transitions in Single Crystals: A New Molecular Dynamics Method. J. Appl. Phys. 1981, 52 (12), 7182–7190. Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: An N⋅log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98 (12), 10089–10092. Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103 (19), 8577–8593. Hess, B. P-LINCS:  A Parallel Linear Constraint Solver for Molecular Simulation. J. Chem. Theory Comput. 2008, 4 (1), 116–122. Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. LINCS: A Linear Constraint Solver for Molecular Simulations. J. Comput. Chem. 1997, 18 (12), 1463–1472. Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graph. 1996, 14 (1), 33–38. Lokesh, G. L.; Gopinath, K.; Satheshkumar, P. S.; Savithri, H. S. Complete Nucleotide Sequence of Sesbania Mosaic Virus: A New Virus Species of the Genus Sobemovirus. Arch. Virol. 2001, 146 (2), 209–223. Satheshkumar, P. S.; Lokesh, G. L.; Murthy, M. R. N.; Savithri, H. S. The Role of Arginine-Rich Motif and Beta-Annulus in the Assembly and Stability of Sesbania Mosaic Virus Capsids. J. Mol. Biol. 2005, 353 (2), 447–458. Gulati, A.; Murthy, A.; Abraham, A.; Mohan, K.; Natraj, U.; Savithri, H. S.; Murthy, M. R. N. Structural Studies on Chimeric Sesbania Mosaic Virus Coat Protein: Revisiting SeMV Assembly. Virology 2016, 489, 34–43. Rassi, A.; Rassi, A.; Marin-Neto, J. A. Chagas Disease. Lancet Lond. Engl. 2010, 375 (9723), 1388–1402. Lardeux, F.; Depickère, S.; Duchon, S.; Chavez, T. Insecticide Resistance of Triatoma Infestans 25-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(55)

(56)

(57) (58)

(59) (60) (61) (62)

(63)

(64)

(65)

(66) (67)

(68)

(69) (70)

Page 26 of 35

(Hemiptera, Reduviidae) Vector of Chagas Disease in Bolivia. Trop. Med. Int. Health TM IH 2010, 15 (9), 1037–1048. Snijder, J.; Uetrecht, C.; Rose, R. J.; Sanchez-Eugenia, R.; Marti, G. A.; Agirre, J.; Guérin, D. M. A.; Wuite, G. J. L.; Heck, A. J. R.; Roos, W. H. Probing the Biophysical Interplay between a Viral Genome and Its Capsid. Nat. Chem. 2013, 5 (6), 502–509. Agirre, J.; Goret, G.; LeGoff, M.; Sánchez-Eugenia, R.; Marti, G. A.; Navaza, J.; Guérin, D. M. A.; Neumann, E. Cryo-Electron Microscopy Reconstructions of Triatoma Virus Particles: A Clue to Unravel Genome Delivery and Capsid Disassembly. J. Gen. Virol. 2013, 94 (Pt 5), 1058–1068. Dick, G. W. A.; Kitchen, S. F.; Haddow, A. J. Zika Virus. I. Isolations and Serological Specificity. Trans. R. Soc. Trop. Med. Hyg. 1952, 46 (5), 509–520. Malone, R. W.; Homan, J.; Callahan, M. V.; Glasspool-Malone, J.; Damodaran, L.; Schneider, A. D. B.; Zimler, R.; Talton, J.; Cobb, R. R.; Ruzic, I.; Smith-Gagen, J.; Janies, D.; Wilson, J.; Zika Response Working Group. Zika Virus: Medical Countermeasure Development Challenges. PLoS Negl. Trop. Dis. 2016, 10 (3), e0004530. Russo, F. B.; Beltrão-Braga, P. C. B. The Impact of Zika Virus in the Brain. Biochem. Biophys. Res. Commun. 2017. Sampathkumar, P.; Sanchez, J. L. Zika Virus in the Americas: A Review for Clinicians. Mayo Clin. Proc. 2016, 91 (4), 514–521. Vaney, M.-C.; Rey, F. A. Class II Enveloped Viruses. Cell. Microbiol. 2011, 13 (10), 1451– 1459. Kostyuchenko, V. A.; Lim, E. X. Y.; Zhang, S.; Fibriansah, G.; Ng, T.-S.; Ooi, J. S. G.; Shi, J.; Lok, S.-M. Structure of the Thermally Stable Zika Virus. Nature 2016, advance online publication. Dejnirattisai, W.; Supasa, P.; Wongwiwat, W.; Rouvinski, A.; Barba-Spaeth, G.; Duangchinda, T.; Sakuntabhai, A.; Cao-Lormeau, V.-M.; Malasit, P.; Rey, F. A.; Mongkolsapaya, J.; Screaton, G. R. Dengue Virus Sero-Cross-Reactivity Drives Antibody-Dependent Enhancement of Infection with Zika Virus. Nat. Immunol. 2016, 17 (9), 1102–1108. Rouvinski, A.; Guardado-Calvo, P.; Barba-Spaeth, G.; Duquerroy, S.; Vaney, M.-C.; Kikuti, C. M.; Navarro Sanchez, M. E.; Dejnirattisai, W.; Wongwiwat, W.; Haouz, A.; Girard-Blanc, C.; Petres, S.; Shepard, W. E.; Desprès, P.; Arenzana-Seisdedos, F.; Dussart, P.; Mongkolsapaya, J.; Screaton, G. R.; Rey, F. A. Recognition Determinants of Broadly Neutralizing Human Antibodies against Dengue Viruses. Nature 2015, 520 (7545), 109–113. Barba-Spaeth, G.; Dejnirattisai, W.; Rouvinski, A.; Vaney, M.-C.; Medits, I.; Sharma, A.; Simon-Lorière, E.; Sakuntabhai, A.; Cao-Lormeau, V.-M.; Haouz, A.; England, P.; Stiasny, K.; Mongkolsapaya, J.; Heinz, F. X.; Screaton, G. R.; Rey, F. A. Structural Basis of Potent Zika– dengue Virus Antibody Cross-Neutralization. Nature 2016, 536 (7614), 48–53. Rey, F. A. Dengue Virus: Two Hosts, Two Structures. Nature 2013, 497 (7450), 443–444. Zhang, S.; Kostyuchenko, V. A.; Ng, T.-S.; Lim, X.-N.; Ooi, J. S. G.; Lambert, S.; Tan, T. Y.; Widman, D. G.; Shi, J.; Baric, R. S.; Lok, S.-M. Neutralization Mechanism of a Highly Potent Antibody against Zika Virus. Nat. Commun. 2016, 7, 13679. Marzinek, J. K.; Holdbrook, D. A.; Huber, R. G.; Verma, C.; Bond, P. J. Pushing the Envelope: Dengue Viral Membrane Coaxed into Shape by Molecular Simulations. Struct. Lond. Engl. 1993 2016, 24 (8), 1410–1420. Hsieh, Y.-C.; Poitevin, F.; Delarue, M.; Koehl, P. Comparative Normal Mode Analysis of the Dynamics of DENV and ZIKV Capsids. Front. Mol. Biosci. 2016, 3, 85. Ferrarotti, M. J.; Bottaro, S.; Pérez-Villa, A.; Bussi, G. Accurate Multiple Time Step in Biased 26-27 ACS Paragon Plus Environment

Page 27 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Molecular Simulations. J. Chem. Theory Comput. 2014, 11 (1), 139–146. (71) Di Pasquale, N.; Gowers, R. J.; Carbone, P. A Multiple Time Step Scheme for Multiresolved Models of Macromolecules. J. Comput. Chem. 2014, 35 (16), 1199–1207. (72) Reddy, T.; Sansom, M. S. P. The Role of the Membrane in the Structure and Biophysical Robustness of the Dengue Virion Envelope. Struct. England1993 2016, 24 (3), 375–382.

27-27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1. SIRAH proteins at a glimpse. CG beads are mapped on atomic positions named in the figure according to PDB 3.0.1 nomenclature. A one-letter code is used for residue names. Backbone mapping is only indicated for glycine. For shortness, residues sharing a similar CG topology are grouped together and the green letter indicates the residue shown. CG beads are drawn at scale according to their van der Waals radii and colored by charge from negative (red) to positive (blue) through neutral (white). Full details about topologies and interactions are described in ref 29.

ACS Paragon Plus Environment

Page 28 of 35

Page 29 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 2. Triple scale solvation model. (A) Comparison between the three models for water at different resolution. From left to right: WLS, WT4 and TIP3P molecules. The size of the spheres corresponds to the actual van der Waals radii. The interaction parameters of WT4 and WLS are reported in the table. (B) Snapshot of a triple-resolution solvent system after 10 ns MD simulation. Water molecules are colored red (oxygen) and white (hydrogen). The two CG regions are represented by colored surfaces. The different regions (top) match to the density profile across the box (bottom). The dashed line corresponds to the sum of the three densities considering that WT4 and WLS beads correspond to 2.8 and 14 water molecules, respectively. (C) Coordination numbers of WLS beads around WT4 and water. (D) Number density distribution of the different species around water oxygen. The thick gray trace was calculated on a system containing pure TIP3P water 25 and is used here as a standard. The black dashed line corresponds to the mass weighted sum of all species (as in panel B).

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 3. Simulated VLP systems and global MD descriptors. Each panel shows from top to bottom: Molecular representations of the simulated systems at identical size scales. The VLP is drawn as a yellow surface while solvent particles are shown as spheres of corresponding sizes; Solvent density profiles along the simulation box from the center of the VLP in pure FG (yellow) or multiscale solvation (coloring TIP3P in red, WT4 in blue and WLS in gray). The experimental RGYR of the VLP is indicated as a vertical dashed line; Time evolution of RGYR and RMSD.

ACS Paragon Plus Environment

Page 30 of 35

Page 31 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 4. FG vs. FG/CG/SG simulations of SeMV. (A) Surface representation of the VLP showing a central monomer (purple ribbons) and its contacts at 3-fold (blue ribbons) and 5-fold symmetry axis. Calcium ions are shown as yellow spheres. View points of panels B-D are indicated. (B,C) Representative water occupancies at 5-fold symmetry axis and inside the monomer fold. SDFs are drown using isosurface values of 0.4 and colored according to FG (green) and FG/CG/SG (red). Crystallographic waters are represented as blue spheres. (D) Coordination of calcium ions at the binding site. Comparison among experimental structure (gray) and representative snapshots from FG (green) and FG/CG/SG (red). (E) Number of occupied Calcium binding sites. (F) Average amount of inter-monomer contacts relative to the experimental structure. (G) Average RMSF values for both solvation schemes and individual traces for FG/CG/SG (orange). The experimental secondary structure assignment is also indicated. Analysis of panels B-D and G were done at the last 10 ns of simulation.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5. FG/CG/SG vs CG/SG simulations of TrV. (A) Molecular representation of the solvent accessible surface of TrV. Colored segments correspond to proteins VP1-3 composing the capsid. The black shaded area delimits the pentamer of trimers. (B) Average intermolecular contacts of VP1, VP2 or VP3 with their respective neighboring chains. (C) Residue level RMSF from 100-150 ns of simulation. Thick and thin traces represent average and individual values, respectively for atomistic (black and grey) and CG (red and orange) proteins.

ACS Paragon Plus Environment

Page 32 of 35

Page 33 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 6. CG/SG simulation of ZIKV. (A) On the left, ectodomain of E protein dimer with one monomer colored by functional domains and the other in gray. On the right, molecular representation of the VLP showing the E proteins' distribution along the icosahedral symmetry axes. Black dots point the geometric center of DII domains in the dimers from which the raft angle (black line) is measured (see panel C). (B) Average amount of inter and intra dimer contacts relative to the experimental structure. (C) Distribution of conformations along the simulation for the 30 E protein rafts in the VLP. Angle defined in panel A. Isocontour regions are colored according to the number of rafts having a given angle value (scale on the right). (D) Comparison between experimental and average normalized B-factors from the last 50 ns of the simulation. The experimental secondary structure assignment and the functional domains are also indicated.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 7. Computational cost of VLP simulations at different solvation schemes. (A) Estimations of solvent particles needed to solvate 1 nm thick spherical VLPs according to settings detailed on the section Methods. Dots point the actual simulated systems. Performances were obtained on an Intel Core i7 3.7 GHz CPU with a GPU Tesla K40c. Stars denote the VLP size limit for running on a single GPU. The GPU memory limit was taken from benchmarks at http://ambermd.org/gpus on explicit solvent simulations using a Tesla K40c graphic card. The size of fully atomistic HIV-1 system simulated at Blue Waters supercomputer 10 is indicated as a reference. (B) Size distribution of available icosahedral viruses at PDB (http://pdb101.rcsb.org/learn/posters-flyers-and-calendars/200-icosahedral-viruses) and affordable simulation range on one GPU according to memory limitations and system size predictions from panel A. Bigger VLPs as Faustovirus (PDB id: 5J7V) were considered in the analysis but scape the graph scale.

ACS Paragon Plus Environment

Page 34 of 35

Page 35 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Graphical Abstract

ACS Paragon Plus Environment