How The Internet Changed Chemistry
The Internet and World Wide Web have ignited modern computational chemistry
ELIZABETH K. WILSON, C&EN WEST COAST NEWS BUREAU
CEN.ACS.ORG, AUGUST 17, 2015
It was only 50 years ago, but it could have been hundreds. In the 1960s, academic computational chemists shared their computer programs via a Pony Express-type service run by scientists at Indiana University, Bloomington, called the Quantum Chemistry Program Exchange. Members learned about new software in circulated newsletters, and for a small fee, they could order the programs’ source codes, which were sent by mail on computer punch cards or magnetic tape.
Doing the actual computational science was just as ponderous, recalls Henry Rzepa, a chemistry professor at Imperial College London. As a graduate student at the University of Texas, Austin, in the 1970s, Rzepa spent days in a dedicated computation center, wrestling with punch cards. “It was a lot of tedious, repetitious work, punctuated by the occasional discovery,” he says.
GLOSSARY
◾ Distributed computing: In which numerous individual computers perform small tasks and send their results to a main computer center.
◾ Grid computing: In which infrastructures of connected clusters of computers are made available to groups, largely in academia and government.
◾ Cloud computing: In which banks of connected computer servers provide computing power, supplied by commercial organizations.
Then came the 1980s. The growing development of the Internet swiftly made such tortured, slow communication and scientific progress a distant memory. The seemingly simple act of connecting computers to one another completely transformed the computational landscape, eventually leading to today’s ability to perform molecular calculations on demand, with almost limitless computing power.
Thirty years ago, though, few laypeople had e-mail, let alone dial-up modems. But that didn’t stop academic institutions from anticipating the massive scientific paradigm shift that was about to occur. In 1985, for example, a consortium of Dutch chemists formed the Dutch National Facility for Computer Assisted Organic Synthesis & Computer Assisted Molecular Modelling. The center developed ways to link together computers at distant facilities. After attending a 1987 conference in the Netherlands titled “Chemical Structures: The International Language of Chemistry,” attendees reported the
design of a user-friendly graphics menu interface that allowed “even the novice user direct access to the module(s) of his choice.”
But it was the World Wide Web that really opened up the floodgates to progress in computational chemistry, Rzepa says. In 1994, Rzepa and his colleagues published a prescient paper in Chemical Communications, “Chemical Applications of the World-Wide-Web System” (DOI: 10.1039/c39940001907). Suddenly, chemists could turn the scads of numbers (bond angles, dipole moments, and the like) they’d been using to represent molecules into two- and three-dimensional pictures. Rzepa credits in particular the open-source molecular structure viewing program Jmol for harnessing the power of the Web, “showing how you could take computational chemistry software and a Web browser and convert it into rotating pictures.”
From Cards To Clouds
COURTESY OF NATALIE TATUM/NEWCASTLE UNIVERSITY
CLOUD PREDICTION
This simulation, produced by a cloud-based program from the Cambridge Crystallographic Data Centre, shows how an antituberculosis drug (green) might dock into a transcriptional repressor protein (gold). The measured X-ray structure of the drug (gray) is shown for comparison.
The World Wide Web also allowed scientists to harness the power of personal computers that had entered homes en masse since the 1980s. Most computers spend a majority of their time sitting idle, their processors unused. Instead, scientists realized, these computers could be performing small tasks during their downtime, sending results to a central computing center. The collected results could then be used to solve big problems. This is a strategy now known as distributed computing.
In 1999, scientists at the University of California, Berkeley, famously launched SETI@home, in which people volunteered to use their home computers to analyze radio telescope data for signs of intelligent life elsewhere in the universe.
Around that time, Vijay Pande was just starting his career as an assistant chemistry professor at Stanford University. “I wanted to do something big,” he says. “The limiting factor in computation was the paucity of computer power.”
Pande recognized the potential for distributed computing to solve complicated computational problems in chemistry. He developed methods to break up large calculations into many small ones to predict how a protein folds. His lab launched Folding@home in October 2000.
Fifteen years later, Folding@home is still going strong, with more than 140,000 participants. It has been joined by numerous other distributed computing projects such as Rosetta@home, which predicts protein
structures, and climateprediction.net, which models climate change.
Meanwhile, in the 1990s, academicians and governments began connecting large, geographically distant computer clusters, creating so-called grids. Grids could be used by many different groups and gave scientists unprecedented computing power without having to build their own supercomputing facilities.
These grids’ more commercial cousin, what is now called “the cloud,” also makes use of large systems of linked computers. Unlike systems of linked supercomputers, which require time sharing, the cloud is a tremendously flexible resource, providing as much on-demand computing power as is needed, for as long as it’s needed. Largely run by companies such as Amazon or Google, the cloud demands even less technological commitment from a scientist. Pharmaceutical companies have embraced the cloud, purchasing cloud computing time to search drug databases or to perform docking calculations on compound libraries.
Today, most scientists, even academic researchers, agree the future of computational chemistry lies largely in the cloud.
Paul Davie, who manages the Cambridge Crystallographic Data Centre’s site at Rutgers University, sees access to the cloud as a “game changer” for smaller biotech companies. In the cloud, these companies have at their hands a wealth of computing resources without having to invest in a large computer. “It’s like renting a good hotel room instead of buying a house,” he says.
Initially, Davie says, pharmaceutical and biotech companies balked at the idea of the cloud, in part because of security concerns. Then, they realized that companies such as Amazon have invested a tremendous amount in security. “Their reputation depends on security resources,” Davie says. “I think that’s been accepted.”
Of course, the cloud can’t solve every chemical problem. Some types of problems, such as lengthy molecular dynamics simulations, will always require frequent communication among the speedy processors of supercomputers.
Still, academic chemists, Pande says, are also realizing the benefits of the cloud’s instant availability for short-term projects. “Universities don’t build their own phone systems,” Pande observes, so “there’s no reason to put together their own [computer] clusters, especially when companies are doing it at extremely low cost.” ◾
Online volunteers help tackle big scientific challenges, by Alán Aspuru-Guzik
When I was an undergraduate student in Mexico in the late 1990s, I was fascinated by the power of distributed computing projects, which harness many individual computers to carry out complex calculations and analysis. These projects are, in a way, the greenest form of computing. They use otherwise unused CPU (central processing unit) cycles from volunteer donors around the world. In particular, the SETI@home project at the University of California, Berkeley, launched in 1999, was an inspiration: Over the years, this search for extraterrestrial life in radio telescope signals has been powered by the idle CPU cycles of hundreds of thousands of volunteer machines around the world.
When I became an assistant professor at Harvard University back in 2006, I was excited about the possibility of using distributed computing to run the theoretical calculations needed to discover novel materials. So in collaboration with the IBM World Community Grid, my group and I started the Harvard Clean Energy Project (CEP), an effort to find novel organic electronic materials capable of converting sunlight into energy. Having employed more than 35,000 CPU years of computer time, CEP is now the largest computational quantum chemistry project that’s been carried out to date.
CEP and subsequent projects in my group have taught us how to more efficiently design materials. With our experimental collaborators, we have discovered new types of organic molecules for flow batteries and organic light-emitting diodes.
One of the most satisfying aspects of the project over the years has been the interaction with the project participants in the online forums. Seeing the enthusiasm for scientific discovery among the citizens of the world makes me optimistic that the Internet will continue delivering revolutionary tools that can help us tackle the scientific challenges associated with the 21st century.
Alán Aspuru-Guzik is a professor of chemistry at Harvard University.