Holy Grails for Computational Organic Chemistry and Biochemistry

Commentary pubs.acs.org/accounts

Published as part of the Accounts of Chemical Research special issue “Holy Grails in Chemistry”.

K. N. Houk* and Fang Liu

Department of Chemistry and Biochemistry and Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095-1569, United States

ABSTRACT: Computational chemistry and biochemistry began with Isaac Newton’s classical mechanics in the 17th century and the establishment of quantum mechanics in the 1920s. Enabled by extraordinary advances in computers, in the last half century, this field has become a robust partner with experiment. The challenges facing computational chemists and biochemists, the Holy Grails of the field, are described. These include the development of a highly accurate density functional, ideally one that has universal chemical accuracy, and accurate polarizable force fields, as well as methods to handle efficiently the massive number of computations that must be performed for molecular dynamics and for the computation of flexible systems such as proteins. We estimate when the breakthroughs that will make computation a powerful engine for chemical discovery and design will be achieved. The Holy Grails of this field involve methods to enable the accurate and efficient prediction of structures and properties of complex biological systems and materials. The principal Holy Grail is a routine computational method for the prediction and design of multicomponent, often heterogeneous, functional systems and devices.

Since 1960, computational chemistry has progressed from a rarity to become a full partner with experiment in the investigation of organic and biochemical structures and reactions. Computations have become essential to elucidate structures and properties of molecules, and mechanisms and selectivities of reactions. This progress was possible due both to rapid developments in computers and to new mathematics and programs for accurate quantum mechanics and molecular dynamics methods. What are the Holy Grails for chemists specializing in computational organic chemistry and biochemistry?

BRIEF HISTORY OF COMPUTATIONAL CHEMISTRY

To put this question into context, we begin with a summary of how computational chemistry has developed in the last century. Figure 1 is a timeline that documents the history of significant advances in theoretical methods made during that time and before, particularly those that have had an impact on computational organic chemistry and biochemistry. These advances involve the development of theories and their implementation into computer programs. The discoveries of classical and quantum mechanics, the parametrization of equations of classical mechanics to reproduce chemical structures, the development of combined QM/MM methods, and the gradual perfection of the functionals of density functional theory over the past five decades have brought us to our current computational chemistry capabilities in quantum mechanics and molecular mechanics methods in the year 2017. These developments are marked in red (QM) and yellow (MM and QM/MM) on the timeline. The developments marked in blue are for developments in molecular dynamics (MD), which is the computational treatment of the motions of molecules or collections of molecules with time. Such calculations require accurate calculations of forces on atoms and recalculations millions of times since the forces change every time the atoms move at all.

For all computational methods, the discovery of mathematical theories and the developments of algorithms and robust computer programs, and the testing and demonstration of accuracy and utility, are all prerequisites for the incorporation of the methodology into general use by the computational community. These developments are so important that the developers were often awarded Nobel Prizes, as noted in Figure 1. Further, the billion-fold increase in speed of computers since the mid-1960s has enabled these brilliant theoretical tools to be applied to chemical problems such as understanding and predicting organic and biochemical reactivity. This change in computer power is represented on the graph in Figure 2, with the red dot representing the computer power of the present (2017). The increase in computer power was a critical factor in the development of computational organic chemistry, not only chip size and speed but sheer space requirements: computers have been reduced in size from a roomful of machines to current handheld devices, and one day, there will be a supercomputer on a pinhead. The right half of Figure 2 represents our guess as to how computer speeds will increase during the 21st century. Deviations from Moore’s Law¹ are expected, but there will surely be unanticipated surges in computer technology, due to new discoveries by engineers and computer scientists. Many current problems facing computational chemists will be solved by these inevitable increases in computer power.

© 2017 American Chemical Society
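As a rough check on the billion-fold speedup quoted in this section, the implied average doubling time can be computed directly. The sketch below assumes a start year of 1965 (for "mid-1960s") and an end year of 2017; both endpoints and the function name are illustrative assumptions, not values read from Figure 2.

```python
import math


def implied_doubling_time(speedup: float, years: float) -> float:
    """Average doubling time (in years) implied by a total speedup over a period."""
    doublings = math.log2(speedup)
    return years / doublings


# Assumed endpoints: 1965 stands in for "mid-1960s", 2017 is the paper's present.
START_YEAR, END_YEAR = 1965, 2017
SPEEDUP = 1e9  # the "billion-fold increase" quoted in the text

doubling_time = implied_doubling_time(SPEEDUP, END_YEAR - START_YEAR)
print(f"{math.log2(SPEEDUP):.1f} doublings, one every {doubling_time:.2f} years")
```

Under these assumptions the speedup corresponds to about 30 doublings, or one roughly every 1.7 years, which is in line with the commonly quoted 18 to 24 month form of Moore's Law.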

Received: October 21, 2016 Published: March 21, 2017

DOI: 10.1021/acs.accounts.6b00532 Acc. Chem. Res. 2017, 50, 539−543


Figure 1. Milestones in computational organic and biological chemistry.



THE BREAKTHROUGHS NEEDED TO FIND THE TECHNICAL HOLY GRAILS

Before describing the Holy Grails of computational chemistry and biochemistry, we wish to describe the goals in methods development that would need to be achieved first. Figure 3 shows these methods development milestones, with an estimated timeline.

A Chemically Accurate Universal Density Functional

Hohenberg, Kohn, and Sham proved in the 1960s that there is a functional (function of a function) by which energies and properties can be obtained from the electron density function of the system of interest.² The search for this functional has given many extremely useful density functionals, but no single functional that is chemically accurate and robust for all types of atoms and bonds. “Chemical accuracy” has often been taken as ±1 kcal/mol, because quantities that are difficult to measure (bond dissociation energies, heats of reaction, activation barriers) are often not known to this accuracy. However, relative rates and equilibrium constants are often known to much greater accuracy, and an order of magnitude change in rates or equilibrium constants is caused by only a 1.4 kcal/mol change in free energy at room temperature. Chemists are often concerned with a few tenths of a kcal/mol, so ±1 kJ/mol (0.25 kcal/mol) is the real target. While some very expensive multicomponent methods (e.g., Wn) can be used for cases of tens of atoms in the gas phase, breakthroughs in DFT, or some still unknown mathematics, are needed to achieve chemically accurate methods for millions of atoms of all types.³ Perhaps combinations of quantum mechanics with machine learning and neural networks will be the key to achieving this universal chemically accurate functional.⁴
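The 1.4 kcal/mol figure quoted in this section follows from ΔG = RT ln(ratio). The following is a minimal sketch of that arithmetic, assuming "room temperature" means 298.15 K; the function names are illustrative and not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
KJ_PER_KCAL = 4.184   # thermochemical calorie


def free_energy_for_ratio(ratio: float, temperature: float = 298.15) -> float:
    """Free energy change (kcal/mol) that shifts a rate or equilibrium constant by `ratio`."""
    return R_KCAL * temperature * math.log(ratio)


def ratio_for_free_energy(dg_kcal: float, temperature: float = 298.15) -> float:
    """Factor by which a rate or equilibrium constant changes for a given free energy (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * temperature))


one_kj_in_kcal = 1.0 / KJ_PER_KCAL
print(f"tenfold change: {free_energy_for_ratio(10):.2f} kcal/mol")
print(f"1 kJ/mol = {one_kj_in_kcal:.3f} kcal/mol, factor {ratio_for_free_energy(one_kj_in_kcal):.2f}")
```

At 298.15 K a tenfold change corresponds to about 1.36 kcal/mol, and an error of 1 kJ/mol (about 0.24 kcal/mol) still changes a predicted rate or equilibrium constant by a factor of roughly 1.5, which is why the tighter target matters for selectivity predictions.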

Accurate Polarizable Force Fields

While many force fields are available, they often sacrifice accuracy for generality. This deficiency could be overcome with more elaborate polarizable force fields or a set of force fields for different purposes. We assume that even high-accuracy DFT will always be too slow for the billion-atom problems that are at the heart of experimental chemistry. For example, even an invisible drop, such as a micromole (∼10⁻⁵ g) of water, contains ∼10¹⁸ atoms! Classical force fields will be with us for a long time!
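The scale of the water example can be checked with a few lines of arithmetic. The sketch below assumes a molar mass of 18.015 g/mol for water and three atoms per molecule; the function name is illustrative.

```python
AVOGADRO = 6.02214076e23   # particles per mole (exact SI value)
WATER_MOLAR_MASS = 18.015  # g/mol, assumed value for H2O
ATOMS_PER_WATER = 3        # two hydrogens and one oxygen


def water_sample(moles: float) -> tuple[float, float]:
    """Return (mass in grams, number of atoms) for a given amount of water."""
    mass_g = moles * WATER_MOLAR_MASS
    atoms = moles * AVOGADRO * ATOMS_PER_WATER
    return mass_g, atoms


mass_g, atoms = water_sample(1e-6)  # one micromole
print(f"mass = {mass_g:.2e} g, atoms = {atoms:.2e}")
```

One micromole works out to roughly 1.8 × 10⁻⁵ g and about 1.8 × 10¹⁸ atoms, many orders of magnitude beyond the system sizes that quantum mechanical methods can treat atom by atom.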

Figure 2. Relative computer power (CP) until 2017 (actual) and after 2017 (imagined). Dashed line is expectation based on Moore’s Law.

Figure 3. Timeline for Holy Grails in methods development.


A New Molecular Dynamics Sampling Method (“ω-Dynamics”)

Amazing progress in molecular dynamics of large systems has been achieved, as represented by methods for sampling and convergence such as replica exchange, λ-dynamics, and metadynamics, and by various types of accelerated and steered molecular dynamics. Current MD on proteins is often performed on millisecond time scales, perhaps the shortest time scale to achieve chemically meaningful results, but these calculations require special Markov state model methods, or special architecture computers such as D. E. Shaw’s Anton or very large GPU clusters. New MD techniques to speed MD sampling are Holy Grails of this field. We call this unknown MD method “ω-dynamics”.

Conquering the Combinatorial Conundrum

There are many problems that are currently impossible to solve because of the sheer number of calculations that need to be performed due to the enormous combinatorial complexity of the problem. For example, large molecules have many rotatable bonds and enormous numbers of conformations. While the global minimum is often sought, the Boltzmann distribution of conformers is really more important to describe the true state of a conformationally complex system. Protein folding is such a problem, and it is further complicated by ions and solvation; millions of arrangements of a typical protein need to be considered. Computations of entropies of association, binding, and second-order reaction rates in solution require computations involving millions or even more configurations, each with millions of atoms. Current methods can be extended to larger problems when sufficient computer speed and time are available, but new methods are needed to conquer this conundrum. New mathematics and machine learning algorithms may be able to overcome the combinatorial nature of statistical mechanical problems.

Methods To Predict Chemically Accurate Solvation Energies, Especially for Aqueous Solutions

While various polarizable continuum models such as CPCM (conductor-like polarizable continuum model) and SMD (solvation model based on density) are quite useful, errors of many kcal/mol are common.⁵ To explore reactions in solution, as experimentalists usually do, high accuracy solvation energies are needed. These may have to be built on explicit solvent models and molecular dynamics (see above).

The discovery of new methods, and further advances of current methods, as summarized in Figure 3, will open up the potential for achieving the true Holy Grails of applied computational organic and biological chemistry.

THE HOLY GRAILS OF COMPUTATIONAL ORGANIC CHEMISTRY AND BIOCHEMISTRY

The Holy Grails of computational organic chemistry and biochemistry are shown in a conceptual image in Figure 4. Each of these targets can be achieved now for a few simple systems, but the real Holy Grails will be achieved only when these computational predictions and designs can be done routinely, efficiently, and with reliable chemical accuracy.

Predicting Crystal Structures, Polymorphs, and Periodic Systems

There have been many attempts and some progress toward the prediction of crystal structures.⁶ The frequent existence of polymorphs, the necessity to compute periodic properties, the need for highly accurate intermolecular potentials, and the large number of ways that two molecules of one substance can interact with each other, all make this a very hard problem.

Predicting Morphologies of Amorphous (Nonperiodic) Materials

Many commercial materials are based on noncrystalline materials, often inhomogeneous. This raises an even more difficult problem than the prediction of crystal structures, since regions of order and disorder, defects, and inhomogeneities all must be predicted.

Accurate Predictions of Ligand−Protein Binding Energies

Host−guest association energies, drug−protein, protein−protein, and other interactions, are the source of organization that makes biology possible. While progress is continually being made on the protein-folding problem,⁷ reliably and routinely predicting ligand−protein association energies (±0.25 kcal/mol) is a goal that still has not been achieved.⁸

Reaction Design

Being able to conceive a useful reaction, and then to use computations to predict reagents (or catalysts, see below) to make that reaction happen, is perhaps the major goal of computational organic chemistry. One daunting challenge is that one must be able not only to predict that the desired reaction will happen but to predict that all conceivable competing reactions will not happen! Maeda and Morokuma⁹ have developed the artificial force induced reaction (AFIR) method and other quantum mechanical methods that have been shown to rediscover some rather complex reactions and even predict new mechanisms. Martinez has used high-temperature MD,¹⁰ called a “nanoreactor”, to predict reactions. These methods are still much slower than experiment and not generally applicable to synthetic chemistry reaction conditions, much less a cellular environment. Machine learning may again provide the key to solving such a complex problem.

Catalyst Design

Computational methods can be used to design catalysts for some relatively simple reactions.¹¹ Some progress has been made in the prediction of solid-state catalysts.¹² While this cannot yet be done routinely, discoveries to be made in the next few decades of the 21st century will make this possible. Enzymes are one type of catalyst for which progress in computational design has been demonstrated in our laboratories and in others. The use of computations to design an amino acid sequence plus cofactors to catalyze any desired reaction, followed by synthesis by natural organisms, is a Holy Grail that we have sought.¹³ Finding this Holy Grail will be the solution to many of the current problems of synthetic chemistry.

Materials and Device Design

Prediction of the structures and properties of unknown materials, solids, and other condensed matter systems, including interfaces and defects and all the deviations from ideality that make chemistry interesting in unpredictable ways, is a major challenge to computational chemistry. The U.S. Materials Genome Initiative directly addresses and supports financially the accelerated computational discovery of new materials and the interactions of experimental and computational groups.¹⁴ Even more challenging will be the design, through computation, of devices, such as computer chips, photovoltaics, and the myriad of devices that will be needed for clean energy, environmental remediation, transportation and medical procedures, and solution of other societal challenges.


Figure 4. The Holy Grails of computational organic chemistry and biochemistry.

Computational organic and biological chemistry is coming into common use, and the achievement of these Holy Grails will forge an ever greater synergism between experiment and theory for the discovery of chemistry in the future.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

K. N. Houk: 0000-0002-8387-5261

Notes

The authors declare no competing financial interest.

REFERENCES

(1) Moore, G. E. Cramming more components onto integrated circuits. Electronics 1965, 38, 82−84.
(2) (a) Hohenberg, P.; Kohn, W. Inhomogeneous electron gas. Phys. Rev. 1964, 136, B864−B871. (b) Kohn, W.; Sham, L. J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 1965, 140, A1133−A1138.
(3) Car, R. Fixing Jacob’s Ladder. Nat. Chem. 2016, 8, 820−821.
(4) (a) Behler, J.; Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 2007, 98, 146401. (b) Shen, L.; Wu, J.; Yang, W. Multiscale quantum mechanics/molecular mechanics simulations with neural networks. J. Chem. Theory Comput. 2016, 12 (10), 4934−4946. (c) Wei, J. N.; Duvenaud, D.; Aspuru-Guzik, A. Neural Networks for the Prediction of Organic Chemistry Reactions. ACS Cent. Sci. 2016, 2 (10), 725−732.
(5) (a) Takano, Y.; Houk, K. N. Benchmarking the conductor-like polarizable continuum model (CPCM) for aqueous free energies of neutral and ionic organic molecules. J. Chem. Theory Comput. 2005, 1, 70−77. (b) Skyner, R. E.; McDonagh, J. L.; Groom, C. R.; van Mourik, T.; Mitchell, J. B. O. A Review of Methods for the Calculations of Solution Free Energies and the Modeling of Systems in Solution. Phys. Chem. Chem. Phys. 2015, 17, 6174−6191.
(6) (a) Oganov, A. R. Modern Methods of Crystal Structure Prediction; Wiley-VCH: Berlin, 2010. (b) Beran, G. J. O.; Hartman, J. D.; Heit, Y. N. Predicting Molecular Crystal Properties from First Principles: Finite-Temperature Thermochemistry to NMR Crystallography. Acc. Chem. Res. 2016, 49 (11), 2501−2508.
(7) The CASP Critical Assessment of Protein Structure “competition” has shown increasing success in protein structure prediction: http://predictioncenter.org/. Accessed on January 7, 2017; for a recent combination of molecular computations with information about the evolution of protein structure, all combined with machine learning to predict folds of protein sequences not yet known experimentally, see: Ovchinnikov, S.; Park, H.; Varghese, N.; Huang, P.-S.; Pavlopoulos, G. A.; Kim, D. E.; Kamisetty, H.; Kyrpides, N. C.; Baker, D. Science 2017, 356, 294−298.
(8) Stjernschantz, E.; Oostenbrink, C. Improved ligand-protein binding affinity predictions using multiple binding modes. Biophys. J. 2010, 98, 2682−2691 and references therein.
(9) Maeda, S.; Harabuchi, Y.; Takagi, M.; Taketsugu, T.; Morokuma, K. Artificial force induced reaction (AFIR) method for exploring quantum chemical potential energy surfaces. Chem. Record 2016, 16, 2232−2248.
(10) Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martinez, T. J. Discovering chemistry with an ab initio nanoreactor. Nat. Chem. 2014, 6, 1044−1048.
(11) Houk, K. N.; Cheong, P. H.-Y. Computational prediction of small-molecule catalysts. Nature 2008, 455, 309−313.
(12) Nørskov, J. K.; Bligaard, T.; Rossmeisl, J.; Christensen, C. H. Towards the computational design of solid catalysts. Nat. Chem. 2009, 1, 37−46.
(13) Kiss, G.; Celebi-Olcum, N.; Moretti, R.; Baker, D.; Houk, K. N. Computational enzyme design. Angew. Chem., Int. Ed. 2013, 52, 5700−5725.
(14) “Materials Genome Initiative for Global Competitiveness,” Office of Science and Technology Policy, Executive Office of the President, Washington, DC, June 2011.
