Partnership applies deep learning to very big data - C&EN Global

ATOM teammates (from left) Calad-Thomson, Brase, Paragas, and Baldoni stand among new supercomputing assets at Lawrence Livermore National Laboratory.

PHARMACEUTICALS

Partnership applies deep learning to very big data

GlaxoSmithKline teams with government labs on computational models for drug discovery

RICK MULLIN, C&EN NEW YORK CITY

CREDIT: JULIE RUSSELL/LAWRENCE LIVERMORE NATIONAL LABORATORIES

JANUARY 23, 2017 | CEN.ACS.ORG | C&EN

GlaxoSmithKline has for several years engaged in a campaign of creative destruction, replacing traditional methods of manufacturing and research with more efficient ways of doing things. Programs such as its Manufacturing Technology Roadmap, a manifesto against the standard pharma practice of manufacturing drugs in batches, have helped earn the company a reputation as a risk-taker among major drug firms.

Now GSK is turning its attention to drug discovery, where it hopes that a nascent approach of applying supercomputing to huge data sets will allow it to move from target identification to a molecule ready for the clinic in just one year. John Baldoni, senior vice president of platform technology and science at GSK, calls this drastic telescoping of a process that has traditionally taken closer to a decade the company’s “moon shot” project. It will rely heavily, he says, on replacing the iterative chemistry process of identifying and testing molecules with a networking approach that finds and tests huge numbers of molecules simultaneously, using gigantic stores of data.

The technique GSK is exploring, deep learning, sits at the cutting edge of artificial intelligence research, in which computers develop data models that evolve, or “learn,” through computational experience (see page 29). Facets of the technique have already been employed elsewhere in industrial research, such as in the development of self-driving cars. But according to Baldoni, no major drug company has explored it in a large-scale effort to manage the explosion of data in drug research that followed the decoding of the genome and the rise of cloud computing.

“Having spent a year looking at statistics around the drug discovery process,” Baldoni says, “we began asking whether it’s time to rethink how we do things.” Early-stage discovery methods developed with the rise of compound libraries and high-throughput screening in the early 2000s are time-consuming, predicated on huge investments, and currently standard in the industry, he says. Meanwhile, the process of drug discovery was bogging down in data.

“A few people in my department had been talking to folks at the Department of Energy about how we might be able to take advantage of high-performance computing to replace some of the empirical work that we do in the drug discovery process,” Baldoni says. Those discussions led to a partnership between GSK, DOE labs headed by Lawrence Livermore National Laboratory (LLNL), and the National Cancer Institute (NCI).

The partnership, called Accelerating Therapeutics for Opportunities in Medicine (ATOM), launched earlier this month. It aims to develop computing models that will guide researchers by exploiting a computer’s ability to quickly vet millions of molecules for efficacy and structural relationships; the models will adapt as they are applied to new data. ATOM will begin by putting DOE’s supercomputing powerhouse to work on data from GSK and NCI, drawing on the drug company’s expertise in chemistry and biology as a framework for pioneering applications of deep learning in drug research. The partners, having collaborated on defining ATOM’s mission over the past year, are currently deciding where to locate
their central laboratory. The group is also acting to expand its membership. Baldoni says it seeks to recruit other major drug companies willing to contribute significant quantities of research data to its computer modeling program.

“What was really driving the conversation was changes in supercomputing,” says Jason Paragas, director of innovation at LLNL, who was involved in early meetings with GSK. The advent of cloud computing, in which companies such as Amazon and Google were hosting huge volumes of data from many sources, and advances in supercomputing set the table for researchers to apply computational learning systems to the highly complex field of drug discovery, drawing from heretofore unapproachable banks of data.

Paragas says new supercomputers that will be used by ATOM have been purchased by a consortium of national research labs, including Oak Ridge and Argonne, both of which will provide systems and research backup to the partnership.

Jim Brase, associate director of computation at LLNL, says ATOM will be taking a deep dive into big data. “We have large amounts of experimental data—genomic data, transcriptome data, assay data—on how biological systems respond to chemicals and their structures. We are seeking to understand from large data sets what particular combinations of things and standards in those data sets are important to building predictive models.”

Brase says the variety of deep learning ATOM might be most interested in is unstructured or unsupervised feature learning, where the focus is on early-stage identification of data sets that go together and significant patterns without predetermined parameters or expectations. The work is comparable to what Google has accomplished with face recognition, he says, but the data sets are much larger and far more complex.

Baldoni says GSK has agreed to contribute information on 500 failed compounds, including complete toxicology and clinical testing data, in addition to 600,000 advanced compounds in screening at the company. In all, GSK will give ATOM access to more than a million compounds screened over the past 15 years, all of which have biological data associated with them.

But that isn’t enough. The partnership, Baldoni says, will need to recruit other large pharmaceutical companies willing to pony up comparable stores of data. He adds that information on failed compounds will be key to achieving breakthroughs in drug discovery.

The problem is that big drug companies are not forthcoming with the data or any information on failed discovery projects. “I am involved in an initiative to get companies to share their data, and I have to tell you it’s extremely frustrating,” he says. “There are hundreds of thousands of failed molecules or molecules no longer of interest with information on structures, analogs, toxicity, and structure-function relationships,” Baldoni says. “Why would you not want to put them into the greater good? We feel there is an obligation that we have to patients in trials to share these failed compounds so we can develop better drugs faster.”

But the partnership is hopeful that industry will come around and contribute data. Baldoni says the group is in discussion with various research entities and hopes to announce the addition of another large drug company in the coming weeks.

There has been little pushback on the changes to chemistry in research at GSK as the result of bringing in heavy computational firepower in early discovery. On the contrary, says Stacie Calad-Thomson, a GSK chemist who is coordinating the activities of ATOM laboratories and serves as key liaison with LLNL and NCI, there is a great deal of enthusiasm in the lab.

“I think it will have incredible impact on chemistry, allowing us to do research in a more rapid and agile fashion,” she says. “Everyone is very excited about that.” Calad-Thomson also notes that the company has already seen the benefit accrued in tearing up traditional procedures in manufacturing. “Now we’re doing it in the discovery space at the forefront of innovation.”

Baldoni agrees, adding that the company needs to prepare for what’s coming. “There is a recognition that the state-of-the-art computers in the national laboratories will soon be at pharma companies. We might as well start building the tools to use them. We’ll get ahead by a few years, and that is critical.” ◾
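Brase’s description of unsupervised learning, where the goal is to find which data go together without labels supplied in advance, can be illustrated with a small sketch. It is not deep learning and not ATOM’s method: it swaps in Taylor-Butina clustering, a classic unsupervised cheminformatics technique, and assumes each compound is represented by a hypothetical fingerprint stored as the set of its “on” bit indices. Compounds whose Tanimoto similarity to a cluster centroid meets an assumed threshold are grouped together.

```python
from itertools import combinations


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0  # treat two empty fingerprints as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_cluster(fps: list[frozenset[int]], threshold: float = 0.6) -> list[list[int]]:
    """Greedy Taylor-Butina clustering; returns lists of compound indices.

    No labels are used: compounds are grouped purely by fingerprint similarity.
    The threshold is an assumed, illustrative value.
    """
    n = len(fps)
    # Each compound counts as its own neighbour
    neighbors = {i: {i} for i in range(n)}
    for i, j in combinations(range(n), 2):
        if tanimoto(fps[i], fps[j]) >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)

    # Compounds with the most neighbours become cluster centroids first
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned: set[int] = set()
    clusters: list[list[int]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        members = sorted(m for m in neighbors[centroid] if m not in assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Hypothetical toy fingerprints, not real compound data
    fps = [
        frozenset({1, 2, 3, 4}),
        frozenset({1, 2, 3, 5}),
        frozenset({1, 2, 3, 4, 5}),
        frozenset({10, 11, 12}),
        frozenset({10, 11, 13}),
        frozenset({20, 21}),
    ]
    print(butina_cluster(fps, threshold=0.5))  # [[0, 1, 2], [3, 4], [5]]
```

The greedy scheme needs a similarity threshold rather than a preset number of groups. Its all-pairs loop grows quadratically with the number of compounds, so at the scale the article describes (more than a million compounds) a real system would need much faster neighbor searches.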