
Molecular dynamics-based virtual screening: Accelerating drug discovery process by high performance computing

Authors: Hu Ge¹, Yu Wang¹, Chanjuan Li¹, Nanhao Chen¹, Yufang Xie¹, Mengyan Xu¹, Yingyan He¹, Xinchun Gu¹, Ruibo Wu¹, Qiong Gu¹, Liang Zeng², Jun Xu¹,*

Affiliations:
¹ School of Pharmaceutical Sciences & Institute of Human Virology, Sun Yat-Sen University, 132 East Circle Road at University City, Guangzhou, 510006, China.
² College of Computer Sciences, National University of Defense Technology, Changsha, 410073, China.
* Correspondence to: [email protected]

J. Chem. Inf. Model., Just Accepted Manuscript. DOI: 10.1021/ci400391s. Publication Date (Web): September 3, 2013.

Abstract
High-performance computing (HPC) has become a state strategic technology in a number of countries. One hypothesis is that HPC can accelerate bio-pharmaceutical innovation. Our experimental data demonstrate that HPC can significantly accelerate bio-pharmaceutical innovation when molecular dynamics-based virtual screening (MDVS) is employed. Without HPC, MDVS of a 10K-compound library with tens of nanoseconds of MD simulation requires years of computer time. In contrast, a state-of-the-art HPC system can be 600 times faster than an 8-core PC server in screening a typical drug target (about 40K atoms when solvated). Careful design of the GPU/CPU architecture can also reduce HPC costs. However, the communication cost of parallel computing is a bottleneck that remains the main limit on further virtual screening improvements for drug innovation.


Introduction

High-performance computing (HPC) has developed rapidly worldwide. The race to build faster supercomputers has become a source of national pride; these machines are valued for their ability to solve problems critical to national interests, in areas such as defense, energy, finance, and science. Supercomputing has become a fundamental tool in the global competition for economic growth and scientific and technological development. For the pharmaceutical industry, the question is whether, and how, HPC can accelerate bio-pharmaceutical innovation.

The difficult and time-consuming process of screening compounds against drug targets is one bottleneck that limits bio-pharmaceutical innovation. Virtual screening (VS), in which molecular dynamics (MD) simulation can play a key role1, is an attractive solution to this problem. Applying MD principles within a VS context for both pre- and post-data analysis (i.e., MD-based virtual screening, MDVS) is an effective way of reducing false positives (FP) and false negatives (FN) in a VS campaign. In 2008, MD simulation of a single protein for 10 µs required more than 3 months of supercomputing time.2 Simulating the interactions of thousands of ligands complexed with a protein (i.e., a drug target) requires even more HPC time.3

Two MDVS protocols that use MD simulations to improve the quality of hits, by calculating the binding stabilities of the hits with their target, are depicted in Fig. 1. The protocols reported in this paper are readily transferable to other targets with known 3D structures and can be used to screen small-molecule libraries. They are mainly for benchmarking purposes and have not been extensively optimized; the user can further optimize a protocol by modifying MD/docking parameters or adding filters, such as Lipinski's rules, for a specific virtual screening campaign. The protocols are implemented as Linux shell scripts with HPC-specific command lines, and can be shared upon request.
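To make the shape of such a protocol concrete, the following is a minimal Python sketch of an MDVS driver in the spirit of Fig. 1: select the best-scored docked poses, then fan the per-pose MD/rescoring work out to a batch queue. Every program name, file pattern, and helper (run_md_and_rescore, pose_*.pdb, the chunk size) is an illustrative assumption, not the authors' actual scripts.

```python
import subprocess
from pathlib import Path

# Hypothetical MDVS driver: all program names, paths, and job layout
# below are illustrative assumptions, not the paper's shared scripts.

def top_scored_poses(dock_dir: str, n_top: int = 10_000) -> list:
    """Collect docked pose files, assumed to be named so that lexical
    sorting returns the best-scored poses first."""
    return sorted(Path(dock_dir).glob("pose_*.pdb"))[:n_top]

def submit_md_subtasks(poses, poses_per_job: int = 100) -> None:
    """Split the poses into sub-tasks and submit one batch job per chunk."""
    for i in range(0, len(poses), poses_per_job):
        chunk = poses[i:i + poses_per_job]
        script = Path(f"mdvs_{i // poses_per_job:04d}.sh")
        # Each job script runs minimization, heating, and production MD
        # for its chunk, then an MM/PBSA rescoring step (placeholder CLI).
        lines = ["#!/bin/sh"] + [f"run_md_and_rescore {p}" for p in chunk]
        script.write_text("\n".join(lines) + "\n")
        subprocess.run(["qsub", str(script)], check=True)  # site-specific scheduler

if __name__ == "__main__":
    submit_md_subtasks(top_scored_poses("docked"))
```

The key design point is that each sub-task is independent, which is what makes the campaign embarrassingly parallel at the pose level.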


Fig. 1. Two typical MDVS protocols

A target-ligand complex can consist of tens of thousands of atoms. In such a situation, the MD simulation has to span nanoseconds before researchers can confidently conclude that the ligand is a hit worth further pharmaceutical study. Moreover, the time scale of atomic interactions sometimes requires an MD integration time step as short as one femtosecond (fs, 10⁻¹⁵ s)4. These requirements create additional demand for HPC power, and this demand has become one of the driving forces for building faster HPC systems5.

Recently, graphics processing units (GPUs), originally developed for computing 3D graphics functions, have provided unprecedented computational power for scientific applications. Examples include Quantum Monte Carlo6, spectroscopic7, and gravitational8 simulations in computational physics, and sequence analysis in computational biology9,10. The current peak performance of state-of-the-art GPUs is approximately ten times that of comparable CPUs. Therefore, more and more parallel MD simulation programs offer GPU versions to take advantage of GPU computing power.

To validate the impact of HPC on virtual screening within the drug innovation process, three GPU-enabled parallel MD programs, AMBER11,12, NAMD4, and GROMACS13,14, were employed for the experiments in this report. The MDVS experiments were conducted on both CPU and GPU computer systems. The goal is to determine whether and how HPC can accelerate bio-pharmaceutical innovation, and to identify the limits of such acceleration.


Materials and Methods

Data sets for molecular dynamics. One obvious determinant of performance is the size of the simulated system. To find the size distribution of drug-target systems, a comprehensive database containing 1,004 three-dimensional structures of known and potential drug targets was created, using data from the literature and from online databases such as DrugBank15, the PDB (Protein Data Bank), SciFinder, the TTD (Therapeutic Target Database)16, the PDTD (Potential Drug Target Database)17, and TRP (Thomson Reuters Pharma) (Supplementary Table S1, "Table S1_DrugTarget.xlsx"). To calculate the atom count of each solvated drug-target system, each structure was solvated in a rectangular box of TIP3P water molecules with a 10 Å solvent layer between the box edges and the solute surface, using the tLeap module of AMBER. To test MD performance with different programs, thirty diverse targets were selected from this database as the test data set (Supplementary Table S2). For each target, the topology and parameter files were prepared for AMBER, NAMD, and GROMACS, respectively. All three programs applied the ff99SB force field because it is widely used and is a common force field distributed with AMBER, GROMACS, and NAMD.
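For readers unfamiliar with this preparation step, the sketch below drives tLeap from Python to solvate one complex and write AMBER topology/coordinate files. The tLeap commands follow standard AMBER syntax for the ff99SB/TIP3P setup described above, but the file names and the neutralizing-ion call are illustrative assumptions.

```python
import subprocess

# Minimal sketch of the solvation step: load a complex, apply ff99SB,
# add a 10 Å TIP3P water box plus counter ions, and save topology and
# coordinates. File names are placeholders.
LEAP_SCRIPT = """\
source leaprc.ff99SB
mol = loadpdb target_ligand_complex.pdb
solvatebox mol TIP3PBOX 10.0
addions mol Na+ 0
saveamberparm mol complex.prmtop complex.inpcrd
quit
"""

with open("solvate.leap", "w") as fh:
    fh.write(LEAP_SCRIPT)

# tLeap reports the final atom count of the solvated system, which is
# what the target-size survey in this section tabulates.
subprocess.run(["tleap", "-f", "solvate.leap"], check=True)
```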


Molecular dynamics simulations. For purposes of comparison, each target-ligand complex in the test data set was simulated repeatedly with AMBER, GROMACS, and NAMD. Prior to the production step, the data were processed as follows: (1) the solutes were fixed, and the solvent molecules with counter ions were allowed to move during a 2,000-step minimization; (2) all atoms were then relaxed by a 2,000-step full minimization; (3) after minimization, each complex was gradually heated from 0 to 300 K over 600 ps, with a weak restraint on the solutes; (4) in the production step, periodic-boundary dynamics simulations of 20 ns were carried out in an NPT (constant composition, pressure, and temperature) ensemble at 1 atm and 300 K, using the CPU or GPU-accelerated versions of AMBER, GROMACS, and NAMD. The particle-mesh Ewald18,19 (PME) method was used to treat long-range electrostatic interactions. A residue-based cutoff of 10 Å was used for non-covalent interactions. The algorithms constraining temperature, pressure, and hydrogen bonds varied according to the package. Binding free energies were calculated by the MM/PBSA (molecular mechanics Poisson−Boltzmann surface area)20 method, using MMPBSA.py21. For each protein-ligand complex, 20 snapshots were extracted from the last nanosecond of trajectory at an interval of 10 ps; these snapshots were used to calculate ∆G. A normal-mode calculation was used to evaluate the entropy contribution to ∆G. The versions of the MD packages were AMBER 12, GROMACS 4.6, and NAMD 2.9; all were compiled with the Intel C/C++/Fortran compiler 11.1.059. The Nvidia CUDA GPU driver version was 4.0, with ECC on.

Molecular docking. For the enrichment tests, three data sets containing actives and decoys were built, and their structures were subjected to energy minimization with the MMFF9422 force field to obtain the lowest-energy conformations prior to docking. The PDB entries for the three targets were 1HXB for HIVPR, 2HU4 for NA, and 3VI8 for PPARα. The receptors were protonated at pH 7.4, and their original ligands were removed from the binding pockets. The native ligand binding site was selected as the docking site. Each compound in the data set was docked into the three drug targets with GLIDE, MOE, CDOCKER, and SURFLEX. For GLIDE, the SP (standard precision) mode was used. For MOE, the AMBER12/EHT force field and GBVI/WSA ∆G refinement were used. For CDOCKER, the number of random conformations was increased to 50. Other parameters were kept at their default values. The docked poses were ranked according to each program's built-in scoring function, and the top-ranked binding pose for each compound was selected for further MD simulation.

Performance monitoring and data analysis. For each simulation, the average speed of the MD run was recorded. The backbone RMSD values of the trajectories were calculated to monitor simulation quality. To study the scalability of different systems/programs, the parallel acceleration and efficiency were calculated according to the following formula:

Pe = Ts / (n × Tp)

where Ts is the time required to finish an MDVS task with a single CPU or GPU, Tp is the time to finish the MDVS task with multiple CPUs or GPUs, and n is the number of CPUs or GPUs used in the parallel computation. In this study, a unit of CPU/GPU is defined by a thread: one CPU means one CPU core, and one GPU means one Tesla card.


To evaluate the memory requirements for running MD simulations of drug targets, the real-time memory usage of each run was recorded regularly: the memory occupation of every thread was observed and recorded every ten seconds, and the per-node memory usage was calculated by summing over the threads. For a GPU program this summation was unnecessary, because there was only one thread per node.

Data sets for enrichment tests. To measure the performance of MDVS, three enrichment tests were carried out. Three drug targets, PPARα, neuraminidase, and HIV protease, were selected. Three data sets of their inhibitors and decoys were collected from different sources (Supplementary Table S3), and the compounds were docked onto the native ligand binding sites of the corresponding targets for virtual screening with GLIDE23,24, MOE (Chemical Computing Group), CDOCKER (Accelrys), and SURFLEX (Tripos). The data sets were created as follows:

1. PPARα. 343 PPARα inhibitors with IC50 values of < 1 µM were downloaded from BindingDB25. Using MACCS fingerprint similarity algorithms, a diverse subset of 100 ligands was derived (see the sketch after this list). Then, 1,000 decoy compounds were selected by RADER (the Rapid Decoy Retriever, an in-house program).

2. Neuraminidase. 132 NA inhibitors were collected from the academic literature, and 1,000 random decoy compounds were selected from the Schrödinger Drug-Like Ligand Decoys Set23,24, giving a data set of 1,132 entries.

3. HIV protease. 62 HIV protease inhibitors were extracted from the DUD26 (Directory of Useful Decoys). Using MACCS fingerprint similarity algorithms, 620 decoy compounds were derived from the DUD HIV protease decoys set.

Results

Determinants of the MDVS computing power requirement. The number of atoms in a target-ligand complex, the number of compounds in a virtual library, the time scale of the MD simulation, the biochemical type of the target, and the MD simulation program are the main determinants of the HPC power requirement. A survey of 1,004 known and potential drug targets (Supplementary Table S1, "Table S1_DrugTarget.xlsx") indicates that the most common drug target (Fig. 2) has 30K to 40K atoms in a solvated target-ligand complex. The biochemical types covered in this report are listed in Table S2.


Fig. 2. Scales of solvated drug targets and their distributions


MD performance and target size. MDVS was applied to the thirty drug targets using AMBER, GROMACS, and NAMD, with 20 ns MD simulations. The computations were executed on parallelized CPU and GPU systems. For each target-ligand complex, the performance (in ns/day) was calculated at each parallelization level, and the peak performance across all levels was recorded (Fig. 3). Our study reveals that as the target size increases, the peak performance drops; no significant difference was observed among the three programs. For the most common target size of about 30K to 40K atoms, a peak performance of about 50 ns/day was achieved.


Fig. 3. The peak performance for MDVS of thirty targets using AMBER, NAMD, and GROMACS with CPU and GPU computers. The curves are fitted to indicate the trends.

The three packages differ in MD simulation parameter settings (e.g., the time-step integrator and the temperature and pressure controls), which makes head-to-head comparisons difficult; a direct comparison of raw performance among the three MD programs would therefore be unfair. The parallel efficiency, however, does reflect whether the software takes full advantage of HPC's power.


MD performance and parallel efficiency. The contour maps in Fig. 4 show the relations between target size (Y-axis, number of atoms), parallelization (X-axis, number of CPUs or GPUs), and the parallel efficiency (Pe), calculated via the equation given above:

Pe = Ts / (n × Tp)

where Ts is the time to finish an MDVS task with a single CPU or GPU, Tp is the time to finish the task with multiple CPUs or GPUs, and n is the number of CPUs or GPUs used in the parallel computation. Note that in Fig. 4, blue stands for 0% Pe and red stands for 100% Pe.

Fig. 4 reveals the following trends: (1) Pe declines as the number of CPUs increases; (2) Pe declines more rapidly as the number of GPUs increases; (3) increasing the number of CPUs or GPUs for a larger target does not proportionally increase Pe, presumably because of the growing communication cost as more CPUs or GPUs join the parallel computation; (4) NAMD shows good parallel efficiency for larger targets on both GPUs and CPUs; and (5) AMBER and GROMACS have better parallel efficiency on CPUs than on GPUs.


Fig. 4. The parallel efficiencies of MDVS for thirty targets, obtained with the AMBER, NAMD, and GROMACS programs, each running on both CPUs and GPUs.


Memory concern. All three programs have modest minimum RAM (random access memory) requirements for running MDVS on a target (Fig. S1).

Fig. S1 The minimal memory requirements of the AMBER, GROMACS, and NAMD programs.


CPU-based programs demand more memory than GPU-based programs do: for an average-sized drug target, CPU-based programs require about 2 GB of memory while GPU-based programs require about 500 MB. This is because the GPU has its own large on-board memory: a Tesla M2050 GPU, for example, has 3 GB of memory, which meets the needs of MDVS. On the TH-1A supercomputer, each parallel node has 24 GB of memory, which meets the memory needs of almost any drug target.

The consistency of CPU-based and GPU-based MD programs. To determine whether CPU-based and GPU-based programs produce consistent results, we ran the CPU-based and GPU-based AMBER program (PMEMD) on the thirty drug targets (Table S2), simulating their molecular dynamics for 20 ns. We then calculated the RMSDs between the backbone atoms in the trajectories (from both the CPU-based and GPU-based runs) and the corresponding crystal structures of the targets. The results, depicted in Fig. S2, indicate that most of the MD trajectories from the GPU-based program are consistent with the targets' crystal structures (RMSD < 3 Å), and that most trajectories from the GPU-based and CPU-based programs are consistent with one another, except for 1XY1 and 2GBP. For those two exceptions, the CPU-based program produced results that were inconsistent with the corresponding crystal structures.
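A consistency check of this kind can be scripted with a trajectory-analysis library; the paper does not state which program computed its RMSDs, so the sketch below uses MDAnalysis as one possible tool, with placeholder file names.

```python
import MDAnalysis as mda
from MDAnalysis.analysis import rms

# Compare a production trajectory's backbone against the crystal
# structure; a mean RMSD below ~3 Å is the consistency criterion above.
ref = mda.Universe("crystal.pdb")                 # crystal structure
trj = mda.Universe("complex.prmtop", "prod.nc")   # topology + trajectory

analysis = rms.RMSD(trj, ref, select="backbone")
analysis.run()

rmsd_values = analysis.results.rmsd[:, 2]         # column 2 holds RMSD in Å
print(f"mean backbone RMSD: {rmsd_values.mean():.2f} Å")
```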


Fig. S2. The RMSDs of the backbone atoms in the trajectories of both CPU-based and GPU-based AMBER programs, and the corresponding crystal structures for the thirty drug targets.

MDVS performance and parallelization. An MDVS campaign was performed to screen a compound library derived from the NCI Open Database (released May 4, 2012) against the MJ0796 ATP-binding cassette target (PDB code: 1F3O). The library consists of 265,242 compounds, which were docked to the receptor binding site of 1F3O with GLIDE. The 10,000 best-scored poses were selected, and the corresponding compound-receptor complexes were created for subsequent 1 ns MD simulations with AMBER, NAMD, and GROMACS. The entire MDVS task was divided into sub-tasks that were distributed to a number of CPUs and GPUs under different parallelization strategies. The relationship between the time needed to complete the MDVS campaign (for the 10K compounds) and the HPC parallelization strategies of the different programs is depicted in Fig. 5.

Fig. 5. The relationship between the time needed to complete an MDVS campaign and the HPC parallelization strategy (of different programs). The MDVS campaign involves screening 10K compounds. PCPU: number of parallelized CPUs called by each MDVS sub-task. PGPU: number of parallelized GPUs called by each MDVS sub-task. A: the MDVS campaign run by the three programs on the 512-node parallelized CPUs of TH-1/GZ. B: the campaign run on the 512-node parallelized GPUs of TH-1/GZ. C: the campaign run on the 7,168-node parallelized CPUs of TH-1A (theoretical projections). D: the campaign run on the 7,168-node parallelized GPUs of TH-1A (theoretical projections).

Figs. 5A and 5B show that increasing the number of CPUs or GPUs assigned to each MDVS sub-task prolongs the time needed to complete that sub-task, and consequently the whole MDVS campaign. This is because higher PCPU and PGPU values increase the communication costs among computing units, which cause delays and waste computing resources. As shown in Figs. 5C and 5D, when an MDVS campaign can employ a greater number of HPC nodes (7,168 on TH-1A), the virtual screening performance improves significantly as PCPU and PGPU increase, up to about 8 parallelized processing units per sub-task; beyond that point, the communication costs among computing units slowly reduce the campaign performance, an effect that is particularly significant for GPUs. The initial improvement arises because a serial architecture (or low parallelization per sub-task) cannot take full advantage of the higher HPC power. The relationship between the time needed to complete an MDVS task and its parallelization strategy can be described with the following formula:

T = ⌈(p × Nposes) / (C × Nnodes)⌉ × (t / s)

where T is the time required to complete the entire MDVS task, p is the number of threads that run on parallelized CPUs or GPUs per sub-task, Nposes is the number of ligand binding poses, Nnodes is the number of HPC hardware nodes, C is the maximal number of CPU or GPU threads per HPC hardware node, t is the MD simulation time (in nanoseconds, ns), and s is the MD simulation throughput (ns/day).
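This formula translates directly into code. The sketch below is a plain transcription of it, with assumed illustrative numbers (thread counts and throughput are placeholders, not the paper's measured values).

```python
import math

def mdvs_wall_time(p, n_poses, c, n_nodes, t_ns, s_ns_per_day):
    """T = ceil(p * N_poses / (C * N_nodes)) * t / s, in days."""
    waves = math.ceil(p * n_poses / (c * n_nodes))  # sequential batches of sub-tasks
    return waves * t_ns / s_ns_per_day

# Assumed illustration: 10,000 poses at 1 ns each and 50 ns/day, one
# thread per pose (p=1), 12 threads per node, 512 nodes.
days = mdvs_wall_time(p=1, n_poses=10_000, c=12, n_nodes=512,
                      t_ns=1.0, s_ns_per_day=50.0)
print(f"estimated campaign time: {days:.2f} days")  # ceil(10000/6144)=2 waves
```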


The minimal time costs for the same MDVS campaign on a PC server and on three HPC systems are compared in Table 1. Field tests were carried out on 200 nodes of TH-1A and on 512 nodes of TH-1/GZ (a pilot system of TH-2A in Guangzhou, China). The times needed for a PC to complete the MDVS campaign were extrapolated from a mainstream PC server with eight CPU cores; it is impractical to use a PC server for an MDVS campaign requiring thousands of days. Using 512 nodes of TH-1/GZ, the MDVS campaign can be accelerated more than 600-fold. If the entire computing power of TH-1A were employed, the MDVS campaign could be completed in less than 30 minutes with GROMACS_GPU, which is about 3,324 times faster than a PC with a GPU, or 35,430 times faster than a PC without a GPU. With the power of HPC, incorporating MD into drug screening is no longer a question of yes or no, but of when and where.

Table 1. Total times (days) for the same MDVS campaign on a PC server and three HPC systems

Program        PC^a     TH-1A^b        TH-1/GZ^b      TH-1A^a
                        (200 nodes)    (512 nodes)    (7,168 nodes)
AMBER_CPU      2,314    8.91           3.70           0.279
GROMACS_CPU    771      2.84           1.17           0.089
NAMD_CPU       3,069    11.06          4.55           0.337
AMBER_GPU      72       0.29           0.12           0.022
GROMACS_GPU    53       0.22           0.09           0.017
NAMD_GPU       643      2.57           1.03           0.097

^a Extrapolated measures. ^b Experimental measures.

CPU/GPU ratio and MD simulation cost. GPUs are co-processors and have to work together with CPUs. The capacity-to-price index (CPI) for HPC can be estimated as follows:

CPI = [NGPU × PGPU + (NCPU − NGPU) × PCPU] / (NGPU × UGPU + NCPU × UCPU)

CPI is measured in ps/day/dollar. NGPU and NCPU are the numbers of GPUs and CPUs built into an HPC node. PGPU and PCPU are the maximal performance values contributed by a single GPU or CPU thread, measured in ns/day. UGPU and UCPU are the unit prices of a GPU card and a CPU core. At the time of writing, the market price for an NVIDIA Tesla M2050 (Fermi GPU card, 3 GB GDDR5) is $1,090, and the price for an Intel Xeon X5670 processor (2.93 GHz, 12 MB cache, Socket LGA1366) is $1,120, i.e., about $187 per CPU core. An average drug target with 30-40K atoms, such as 1F3O, requires no more than 24 CPU cores and 4 GPU cards. The relationships between the CPI, the number of CPUs, and the number of GPUs are therefore depicted in Fig. 6. Since each GPU card needs at least one CPU core to function, an architecture of one CPU with multiple GPUs is not valid. Fig. 6A indicates that (1) the CPI drops when the CPU/GPU ratio increases, and (2) the optimal CPU/GPU ratio is 4:4 for parallelized AMBER. GROMACS shows a similar trend to AMBER (Fig. 6C). NAMD shows a different trend: its CPI increases with increasing numbers of CPUs and GPUs (Fig. 6B). In short, the CPU/GPU ratio should be greater than 1, and the higher the ratio, the better.
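A direct transcription of this estimate follows, using the unit prices quoted above; the per-thread throughputs and the swept node configurations are assumed for illustration, since the measured values vary by program and target size.

```python
def cpi(n_gpu, n_cpu, p_gpu, p_cpu, u_gpu=1090.0, u_cpu=187.0):
    """Capacity-to-price index (performance per dollar) for one node
    configuration. Each GPU needs one CPU core to drive it, so only the
    remaining (n_cpu - n_gpu) cores contribute CPU throughput."""
    if n_gpu > n_cpu:
        raise ValueError("each GPU card needs at least one CPU core")
    capacity = n_gpu * p_gpu + (n_cpu - n_gpu) * p_cpu   # ns/day
    price = n_gpu * u_gpu + n_cpu * u_cpu                # dollars
    return capacity / price

# Illustrative per-thread throughputs (ns/day), not measured values.
P_GPU, P_CPU = 20.0, 1.0
for n_cpu in (4, 8, 16, 24):
    for n_gpu in (1, 2, 4):
        if n_gpu <= n_cpu:
            print(f"{n_cpu:2d} CPU / {n_gpu} GPU -> "
                  f"CPI {cpi(n_gpu, n_cpu, P_GPU, P_CPU):.5f}")
```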


Fig. 6. The relationship between the CPI (capacity-to-price index) and the CPU/GPU ratio. A: AMBER, B: NAMD, C: GROMACS.

The hit enrichment of MDVS. To measure the performance of MDVS, three drug targets, PPARα, neuraminidase, and HIV protease, were selected. Three data sets of their inhibitors and decoys were collected from different sources (Table S3), and the compounds were docked onto the native ligand binding sites of the corresponding targets with GLIDE23,24, MOE (Chemical Computing Group), CDOCKER (Accelrys), and SURFLEX (Tripos). The docked poses were ranked according to each program's built-in scoring function. The performances of the various virtual screening methods, including MDVS, are demonstrated with receiver operating characteristic (ROC) curves27, specifically through the corresponding areas under the curves (AUCs), in Fig. 7.


Fig. 7. Receiver operating characteristic curves (ROC) for virtual screening campaigns. FPR: false positive rate (% decoys), TPR: true positive rate (% actives). The ROC for MDVS is in magenta. A: PPARα, B: Neuraminidase, C: HIV protease.

On the TH-1/GZ HPC system, 1 ns of MD simulation was applied to each top-ranked receptor-ligand complex, and its binding free energy was evaluated. The enrichments based on the binding free energies were compared in a separate series of ROC plots (Fig. 8). Although the programs show different virtual screening performances for the three targets, the ROC curves of MDVS are always the best. To further measure performance28,29, AUCs, enrichment factors (EF), robust initial enhancements (RIE), areas under the accumulation curve (AUAC), and the Boltzmann-enhanced discrimination of receiver operating characteristics (BEDROC)29 were calculated (Supplementary Table S4). An enrichment test for HIV protease has also been reported in the literature30, where AutoDock 4 and Vina gave AUCs of 0.40 and 0.66, respectively; the AUC of MDVS is 0.78, which is 18.2% greater than the better of the reported values. All these metrics indicate that the MDVS approach is superior to conventional virtual screening approaches. These results are consistent with the observations of our colleagues31.
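The headline metrics here follow standard definitions, which a dependency-free sketch makes concrete; the toy ranking at the bottom is an illustrative input, not data from the paper.

```python
def roc_auc(ranked_labels):
    """AUC via rank statistics: the probability that a randomly chosen
    active is ranked above a randomly chosen decoy."""
    n_act = sum(ranked_labels)
    n_dec = len(ranked_labels) - n_act
    wins, decoys_seen = 0, 0
    for label in ranked_labels:          # best-scored compound first
        if label:                        # each active "beats" all decoys below it
            wins += n_dec - decoys_seen
        else:
            decoys_seen += 1
    return wins / (n_act * n_dec)

def enrichment_factor(ranked_labels, fraction):
    """EF = actives found in the top fraction / actives expected at random."""
    n_top = max(1, int(len(ranked_labels) * fraction))
    found = sum(ranked_labels[:n_top])
    expected = sum(ranked_labels) * fraction
    return found / expected

# Toy ranking: 1 = active, 0 = decoy.
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(roc_auc(labels), enrichment_factor(labels, 0.2))  # 0.79..., 2.5
```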


Discussion

One of the bottlenecks in bio-pharmaceutical innovation is biological screening, and virtual screening has been an attractive solution. MDVS can significantly improve the performance of a virtual screening campaign, so reducing HPC costs and increasing computation speed will significantly accelerate biomedical innovation. There are still no disease-modifying drugs for, for example, Alzheimer's disease, osteoarthritis, metabolic syndromes, and important cancers; the MDVS campaigns described in this paper may shorten the pre-clinical studies for such disease-modifying drugs, as well as for anti-pandemic drugs.

MDVS involves validating the interactions of thousands or millions of individual compounds against a drug target. The task is parallelizable: a great number of compound-protein complexes can be divided into smaller computational packages, which are distributed to GPU-CPU units for calculation. MDVS does not require massive memory; it mainly demands greater HPC power. Many supercomputers adopt GPUs as co-processors to reduce manufacturing costs and increase calculation speed. However, the architecture combining CPUs and GPUs is one of the key factors in computing performance. Another key factor is the application program: today, more and more MD simulation programs offer parallel or GPU versions, but they exploit HPC power in different ways, and consequently their performances differ considerably. HPC hardware may continue to obey Moore's law, but the updating of application programs can be a bottleneck. Therefore, it is time to invest more in developing HPC-based application programs even as we build supercomputers with higher Linpack benchmark scores.


Acknowledgments This work was funded in part by the National High-Tech R&D Program of China (863 Program) (2012AA020307), the Introduction of Innovative R&D Team Program of Guangdong Province (No. 2009010058), and the National Natural Science Foundation of China (No. 81001372, 81173470). The National Supercomputing Centers in Guangzhou (2012Y2-00048, 201200000037), Tianjin, and Shenzhen have provided the computing resources necessary for this report.


Supporting Information

Figures S1-S2: see the text above.

Table S1. The sizes (atom counts) of the solvated drug targets. Each target was solvated in the TIP3P water model. The data were collected from the academic literature and from DrugBank15, the PDB (Protein Data Bank), SciFinder, the TTD (Therapeutic Target Database)16, the PDTD (Potential Drug Target Database)17, and TRP (Thomson Reuters Pharma). Excel file: "TableS1_DrugTarget.xlsx"

Table S2. Thirty drug targets MD-simulated for 20 ns

PDB ID   Biochemical classification       Size (atoms)
1XY1     Factor Regulator and Hormones    6,478
1ICN     Binding Protein                  16,746
1I92     Signaling Protein                18,634
1LIC     Lipid Binding Protein            19,944
5GAL     Factor Regulator and Hormones    20,826
1EPB     Nuclear Receptor                 22,894
1QCY     Structural Proteins              25,013
1F3O     Structural Proteins              32,031
2GBP     Binding Protein                  35,978
1ZXQ     Monoclonal Antibodies            38,421
2AHE     Ion Channels                     38,901
1S50     Receptor                         40,067
1R12     Enzyme                           41,077
3ERT     Nuclear Receptor                 42,411
1JVM     Ion Channels                     52,302
1EWK     Signaling Protein                52,976
1ZHY     Lipid Binding Protein            57,061
2AYO     Enzyme                           63,380
1IJE     Factor Regulator and Hormones    66,066
1V7M     Monoclonal Antibodies            71,635
1YTZ     Receptor                         79,886
2BX8     Transport Proteins               81,815
8CAT     Enzyme                           86,560
1K5D     Signaling Protein                91,274
2R07     Enzyme                           96,886
1QAB     Transport Proteins               99,738
1O7D     Enzyme                           106,562
3HVT     Enzyme                           121,747
1ASE     Enzyme                           179,163
1I6V     Enzyme                           205,149

Table S3. Enrichment tests for MDVS

Target          PDB    Actives   Decoys   Source
PPARα           3VI8   100       1000     BindingDB
Neuraminidase   2HU4   132       1000     In-house collection from the literature
HIV protease    1HXB   62        620      DUD

Table S4. Statistical data for the MDVS enrichment tests

PPARα
Metric              GLIDE   MOE     CDOCKER  SURFLEX  MD
AUC                 0.81    0.68    0.74     0.79     0.83
RIE                 4.40    1.90    2.60     3.76     4.82
AUAC                0.79    0.67    0.71     0.76     0.80
BEDROC(α=160.9)     0.76    0.15    0.25     0.47     0.69
BEDROC(α=20.0)      0.48    0.21    0.28     0.41     0.52
BEDROC(α=8.0)       0.52    0.33    0.39     0.51     0.60
% Actives 1%        8.00    1.00    4.00     5.00     8.00
% Actives 2%        14.00   3.00    7.00     8.00     13.00
% Actives 5%        27.00   10.00   13.00    20.00    28.00
% Actives 10%       38.00   21.00   25.00    37.00    47.00
% Actives 20%       56.00   42.00   49.00    62.00    68.00
EF 1%               8       1       4        5        8
EF 2%               7       1.5     3.5      4        6.5
EF 5%               5.4     2       2.6      4        5.6
EF 10%              3.8     2.1     2.5      3.7      4.7
EF 20%              2.8     2.1     2.4      3.1      3.4

NA
Metric              GLIDE   MOE     CDOCKER  SURFLEX  MD
AUC                 0.74    0.55    0.67     0.61     0.80
RIE                 4.91    2.24    3.24     2.79     5.09
AUAC                0.72    0.54    0.65     0.59     0.77
BEDROC(α=160.9)     0.99    0.36    0.49     0.65     0.98
BEDROC(α=20.0)      0.63    0.29    0.42     0.36     0.66
BEDROC(α=8.0)       0.58    0.31    0.44     0.38     0.62
% Actives 1%        8.33    3.03    4.55     5.30     8.33
% Actives 2%        16.67   7.58    9.85     8.33     15.91
% Actives 5%        31.06   12.88   18.18    14.39    31.82
% Actives 10%       40.15   19.70   31.82    25.00    42.42
% Actives 20%       52.27   30.30   43.18    39.39    57.58
EF 1%               8.6     3.1     4.7      5.5      8.6
EF 2%               8.2     3.7     4.8      4.5      8.2
EF 5%               6.2     2.6     3.6      2.9      6.3
EF 10%              4       2       3.2      2.5      4.2
EF 20%              2.6     1.5     2.2      2        2.9

HIVPR
Metric              GLIDE   MOE     CDOCKER  SURFLEX  MD
AUC                 0.50    0.57    0.78     0.68     0.78
RIE                 1.46    3.66    4.34     4.04     4.47
AUAC                0.50    0.56    0.75     0.66     0.76
BEDROC(α=160.9)     0.20    0.90    0.54     0.73     0.66
BEDROC(α=20.0)      0.16    0.40    0.47     0.44     0.49
BEDROC(α=8.0)       0.19    0.37    0.49     0.44     0.51
% Actives 1%        1.61    9.68    6.45     6.45     6.45
% Actives 2%        3.23    16.13   11.29    14.52    16.13
% Actives 5%        8.06    22.58   27.42    25.81    27.42
% Actives 10%       12.90   24.19   41.94    32.26    38.71
% Actives 20%       19.35   33.87   50.00    43.55    50.00
EF 1%               1.6     11      6.3      7.9      7.9
EF 2%               1.6     7.9     5.5      7.1      7.9
EF 5%               1.6     4.5     5.5      5.2      5.5
EF 10%              1.3     2.4     4.2      3.2      3.9
EF 20%              0.97    1.7     2.5      2.2      2.5

This information is available free of charge via the Internet at http://pubs.acs.org.


References and Notes
1. Huang, D. N.; Gu, Q.; Hu, G.; Ye, J. M.; Salam, N. K.; Hagler, A.; Chen, H. Z.; Xu, J., On the Value of Homology Models for Virtual Screening: Discovering hCXCR3 Antagonists by Pharmacophore-Based and Structure-Based Approaches. J. Chem. Inf. Model. 2012, 52, 1356-1366.
2. Freddolino, P. L.; Liu, F.; Gruebele, M.; Schulten, K., Ten-microsecond molecular dynamics simulation of a fast-folding WW domain. Biophys. J. 2008, 94, L75-L77.
3. Schneider, G., Virtual screening: an endless staircase? Nat. Rev. Drug Discovery 2010, 9, 273-276.
4. Kale, L.; Skeel, R.; Bhandarkar, M.; Brunner, R.; Gursoy, A.; Krawetz, N.; Phillips, J.; Shinozaki, A.; Varadarajan, K.; Schulten, K., NAMD2: Greater scalability for parallel molecular dynamics. J. Comput. Phys. 1999, 151, 283-312.
5. Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y.; Wriggers, W., Atomic-level characterization of the structural dynamics of proteins. Science 2010, 330, 341-346.
6. Anderson, A. G.; Goddard, W. A.; Schroder, P., Quantum Monte Carlo on graphical processing units. Comput. Phys. Commun. 2007, 177, 298-306.
7. Collange, S.; Daumas, M.; Defour, D., Line-by-line spectroscopic simulations on graphics processing units. Comput. Phys. Commun. 2008, 178, 135-143.
8. Belleman, R. G.; Bedorf, J.; Zwart, S. F. P., High performance direct gravitational N-body simulations on graphics processing units II: An implementation in CUDA. New Astronomy 2008, 13, 103-112.
9. Liu, W.; Schmidt, B.; Voss, G.; Muller-Wittig, W., Streaming algorithms for biological sequence alignment on GPUs. IEEE Trans. Parallel Distrib. Syst. 2007, 18, 1270-1281.
10. Schatz, M. C.; Trapnell, C.; Delcher, A. L.; Varshney, A., High-throughput sequence alignment using Graphics Processing Units. BMC Bioinformatics 2007, 8, 474.
11. Case, D. A.; Cheatham, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J., The Amber biomolecular simulation programs. J. Comput. Chem. 2005, 26, 1668-1688.
12. Weiner, P. K.; Kollman, P. A., AMBER: Assisted Model Building with Energy Refinement. A general program for modeling molecules and their interactions. J. Comput. Chem. 1981, 2, 287-303.
13. Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E., GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. 2008, 4, 435-447.
14. Pronk, S.; Pall, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E., GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 2013, 29, 845-854.
15. Knox, C.; Law, V.; Jewison, T.; Liu, P.; Ly, S.; Frolkis, A.; Pon, A.; Banco, K.; Mak, C.; Neveu, V.; Djoumbou, Y.; Eisner, R.; Guo, A. C.; Wishart, D. S., DrugBank 3.0: a comprehensive resource for 'Omics' research on drugs. Nucleic Acids Res. 2011, 39, D1035-D1041.


16. Zhu, F.; Shi, Z.; Qin, C.; Tao, L.; Liu, X.; Xu, F.; Zhang, L.; Song, Y.; Liu, X. H.; Zhang, J. X.; Han, B. C.; Zhang, P.; Chen, Y. Z., Therapeutic target database update 2012: a resource for facilitating target-oriented drug discovery. Nucleic Acids Res. 2012, 40, D1128-D1136.
17. Gao, Z. T.; Li, H. L.; Zhang, H. L.; Liu, X. F.; Kang, L.; Luo, X. M.; Zhu, W. L.; Chen, K. X.; Wang, X. C.; Jiang, H. L., PDTD: a web-accessible protein database for drug target identification. BMC Bioinformatics 2008, 9.
18. Darden, T.; York, D.; Pedersen, L., Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089-10092.
19. Kholmurodov, K.; Smith, W.; Yasuoka, K.; Darden, T.; Ebisuzaki, T., A smooth-particle mesh Ewald method for DL_POLY molecular dynamics simulation package on the Fujitsu VPP700. J. Comput. Chem. 2000, 21, 1187-1191.
20. Hou, T. J.; Wang, J. M.; Li, Y. Y.; Wang, W., Assessing the Performance of the MM/PBSA and MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on Molecular Dynamics Simulations. J. Chem. Inf. Model. 2011, 51, 69-82.
21. Miller, B. R.; McGee, T. D.; Swails, J. M.; Homeyer, N.; Gohlke, H.; Roitberg, A. E., MMPBSA.py: An Efficient Program for End-State Free Energy Calculations. J. Chem. Theory Comput. 2012, 8, 3314-3321.
22. Halgren, T. A., Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 1996, 17, 490-519.
23. Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.; Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S., Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739-1749.
24. Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.; Pollard, W. T.; Banks, J. L., Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 2004, 47, 1750-1759.
25. Liu, T.; Lin, Y.; Wen, X.; Jorissen, R. N.; Gilson, M. K., BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007, 35, D198-D201.
26. Huang, N.; Shoichet, B. K.; Irwin, J. J., Benchmarking sets for molecular docking. J. Med. Chem. 2006, 49, 6789-6801.
27. Linden, A., Measuring diagnostic and predictive accuracy in disease management: an introduction to receiver operating characteristic (ROC) analysis. J. Eval. Clin. Pract. 2006, 12, 132-139.
28. Nicholls, A., What do we know and when do we know it? J. Comput. Aided Mol. Des. 2008, 22, 239-255.
29. Truchon, J. F.; Bayly, C. I., Evaluating virtual screening methods: Good and bad metrics for the "early recognition" problem. J. Chem. Inf. Model. 2007, 47, 488-508.
30. Chang, M. W.; Ayeni, C.; Breuer, S.; Torbett, B. E., Virtual screening for HIV protease inhibitors: a comparison of AutoDock 4 and Vina. PLoS One 2010, 5, e11955.
31. Li, Z.; Cai, Y. H.; Cheng, Y. K.; Lu, X.; Shao, Y. X.; Li, X.; Liu, M.; Liu, P.; Luo, H. B., Identification of novel phosphodiesterase-4D inhibitors prescreened by molecular dynamics-augmented modeling and validated by bioassay. J. Chem. Inf. Model. 2013, 53, 972-981.


For Table of Contents Use Only

Molecular dynamics-based virtual screening: Accelerating drug discovery process by high performance computing

Authors: Hu Ge¹, Yu Wang¹, Chanjuan Li¹, Nanhao Chen¹, Yufang Xie¹, Mengyan Xu¹, Yingyan He¹, Xinchun Gu¹, Ruibo Wu¹, Qiong Gu¹, Liang Zeng², Jun Xu¹,*
