Anal. Chem. 2009, 81, 1736–1740
DNA sequencers: the next generation

Rajendrani Mukhopadhyay

The industry is striving to reach the frontier of affordable genomic sequencing.

Companies and researchers involved in DNA sequencing are on a mission: to boldly go where no scientists have gone before by making the sequencing of genomes fast and cheap. The industry has made such progress that some experts say it even beats out the silicon chip industry’s rate of advancement. Electronics density and computing power in the silicon chip industry have doubled every 2 years for the past 40 years. The cost-performance of the sequencing industry followed a similar trend from 1970 to 2004. But now, with the advent of next-generation sequencers, says George Church of Harvard University, the cost-performance of the sequencing industry has been doubling every 4 months for the past 4 years.

At the end of 2008, Science magazine highlighted next-generation sequencing’s significant achievements. Cited as examples of the technology’s growing power were the sequencing of the genomes of an extinct cave bear, a woolly mammoth, several people of different ethnic origins, and a cancer patient over the course of the year (Science 2008, 322, 1768). “We used to talk about sequencing genes. Now it’s almost second nature to talk about sequencing genomes. Isn’t that fantastic?” exclaims Jeffery Schloss at the National Human Genome Research Institute (NHGRI) of the National Institutes of Health.

Companies see significant dollar signs in the business. If the price of genomic information can be reduced to the point where certain enterprises, such as pharmaceuticals, agriculture, medicine, and biomedical research, find it affordable, sales of sequencing instruments are expected to shoot up.

The next-generation sequencing field isn’t without its detractors, who question whether there is any value in sequencing genomes when it really isn’t obvious how the information will benefit humanity. The criticism exasperates the experts.
They are fully aware that the current handful of genomes doesn’t make sequencing’s value obvious. But it’s for this very reason, they say, that more genomes need to be sequenced: to get to that critical point where genomics can be properly exploited. “It’s like going up to Alexander Graham Bell after he built the first telephone and
Analytical Chemistry, Vol. 81, No. 5, March 1, 2009
saying, ‘What is it good for? I can’t talk to anybody with it because nobody else has a phone!’ It’s the same thing here,” asserts Stephen Turner of Pacific Biosciences. “Not until we have a large number of fully sequenced genomes will we be able to get the full value from them. People who are critical of the endeavor and say that there is no value in the genome right now are failing to realize that in order to get the value of the genome, first you have to have the genomes.”

Table 1 lists some companies that have or anticipate having next-generation DNA sequencers on the market. The list is not exhaustive.

Table 1. Selected vendors of next-generation DNA sequencers¹
¹ Readers should contact vendors directly for more information about products and services.

10.1021/ac802712u. © 2009 American Chemical Society. Published on Web 02/04/2009.

WHAT’S IN A NAME?

Over the years, the term “next-generation” has been bandied about a lot. “I looked back at some old slides. When the [Applied Biosystems] 3730 came out, some people were labeling that the ‘next generation’,” chuckles Schloss. (The 3730 was launched in 2002.) So, says Schloss, the definition of the term “is in the eye of the beholder.”

Everybody agrees on the broadest definition of the term: next-generation sequencers fundamentally differ in terms of technology, throughput, and cost from the semiautomated CE-based instruments used for the Human Genome Project. Beyond that definition, the meaning of next-generation becomes blurry. Some people apply the term to the early commercial instruments that were launched by 454 Life Sciences, Illumina, and Applied Biosystems in 2005 and 2006. They call instruments that are about to be launched by companies like Oxford Nanopore Technologies and Pacific Biosciences “next-next-generation”. Besides being unwieldy, this definition omits companies like Helicos Biosciences and Complete Genomics that came along
after 454, Illumina, and Applied Biosystems but are more mature than Pacific Biosciences and Oxford Nanopore Technologies. Other people use next-generation to mean massively parallel sequencing on arrays and next-next-generation to mean sequencing of single DNA molecules. But again, a company like Helicos gets lost in the shuffle because its technology is designed for massively parallel sequencing of single molecules. And still a third group of people prefer to number the generations. The first generation refers to the CE instruments. (This characterization may ruffle some feathers because it
completely overlooks the years of development and iterations that the CE instruments went through before maturing.) The second generation consists of the first wave of instruments that implemented massively parallel sequencing. The third generation consists of the instruments that are on the brink of launch, like the one Pacific Biosciences plans to have by 2010.

Oxford Nanopore Technologies refuses even to categorize its forthcoming machine within any of the current definitions. Because its technology is likely to be the first label-free sequencer, Zoe McDougall, the company’s representative, says that the company likes to think of its product as “a new generation.” In early January, an announcement came out that Illumina has bought exclusive rights to market, sell, and distribute products by Oxford Nanopore Technologies (Nature 2009, DOI 10.1038/457248a).

For the purpose of this article, next-generation will encompass all technologies, current and future, that tackle sequencing without Sanger biochemistry and capillaries.

THE RAPIDLY SHRINKING COST OF SEQUENCING

The cost of the Human Genome Project is vigorously debated; a wide range of numbers are thrown around. Some quote the price to be billions of dollars. Others, like Jay Shendure at the University of Washington, say the billion-dollar price tag is unfair, simply because so much groundwork (which shouldn’t be included in the project’s final cost) had to be laid first. Shendure compares the Human Genome Project to the space program. “You had to figure out how to fly first before you could go to the Moon,” he says. As in the space program, a lot of R&D had to be invested before human genome sequencing could become viable. A fairer price quote, according to Shendure and others, would be to consider how much it would have cost to sequence a human genome when all the necessary technologies were in place. That number is >$10 million.
Even in the midst of the Human Genome Project, some researchers were starting to consider how to make sequencing more affordable. “There was a relatively small group of people who were pushing for more investment in technology to bring down the overall cost of the project,” recounts Shendure. “But after a certain point, around the time of Celera, and when the competition heated up, they just committed to a technology and scaled up.” Celera is the company, headed at the time by founder J. Craig Venter, that privately sequenced the human genome in direct competition with the U.S. government’s efforts.

Once the Human Genome Project was completed, researchers could turn their attention to bringing costs down in earnest. To achieve this goal, researchers moved away from the CE-and-Sanger-biochemistry combo and focused on completely different technologies. Many of the ideas, such as cyclic interrogation of DNA and nanopore sequencing, have been around since the 1980s, says Shendure. But the tools needed to bring those ideas to fruition weren’t available in the 1980s, and Sanger sequencing caught on and dominated all the way through the Human Genome Project. “The collective memory forgot that there may be other ways to sequence DNA,” says Shendure.

However, in 2003, a confluence of factors spurred companies to begin investing in the alternative sequencing technologies. Once the human genome was completed, it provided a reference for all
human sequencing studies, and development of sensitive CCD cameras and growth in computing speed made many of the new ideas more feasible. The next-generation technologies finally found themselves on solid ground.

In 2004, NHGRI announced it would fund grants for the development of technologies that could sequence a human genome for $100,000 (with a target date of 2009) and $1000 (aimed for completion in 2014). Several grants that were funded for the $100,000 genome have recently come to fruition: their technologies are now commercialized or will be soon. The field’s rapidly growing companies now jostle each other for a market that’s estimated to be worth $1-2 billion a year.

WHAT ARE THE DIFFERENT TECHNOLOGIES?

A quick glance at Table 1 makes it obvious that myriad technologies are being used in the quest for faster and cheaper sequencing (for a detailed review, see Nat. Biotechnol. 2008, 26, 1135-1145). In 2005, Church’s group described the Polonator technology (Science 2005, 309, 1728-1732), and 454 launched the first commercial next-generation DNA sequencer (Nature 2005, 437, 376-380). Illumina and Applied Biosystems soon followed.

Illumina, 454, Helicos, and Applied Biosystems sell products that are variations of the cyclic array sequencing concept, that is, sequencing of a dense array of DNA-based features through iterations of enzymatic reactions and imaging. Pacific Biosciences is taking advantage of zero-mode waveguides to watch DNA polymerase work in real time. Oxford Nanopore Technologies, a company spun out of the work done in Hagan Bayley’s laboratory at the University of Oxford (U.K.), is developing an array of enzyme-nanopore complexes capable of reading single strands of DNA without reagents. Complete Genomics made a splash in October 2008 when it announced that beginning in the second quarter of 2009, it will offer a genome-sequencing service for only $5000.
Its platform is based on submicrometer arrays populated with billions of DNA “nanoballs” that are prepared in solution by making hundreds of copies of circular DNA. The DNA features are read by a process called “combinatorial probe–anchor ligation”, which the company claims reduces reagent consumption and imaging time.

No one yet knows which instrument will come to dominate the field, or even if a single technology will win out. The current thinking is that the various technologies, with their different strengths and weaknesses, will fill niches and focus on targeted applications. For example, experts envision applications that go beyond plain-vanilla de novo sequencing. Genomics can be tied in to lines of research on, for example, chromatin structure, epigenomics, RNA transcription, immunology, and genomes of microbial communities. And for these applications, some experts argue, a whole genome sequence may be unnecessary; just the sequence of a targeted region will be sufficient for clinicians and researchers to act on. The new sequencers will be able to produce that type of data rapidly and cheaply.

Richard Selden of Network Biosystems, which focuses on this targeted approach, acknowledges that the sequencing of whole genomes has merit, “but we think the world has more than enough companies in this area. The approach we’ve taken, instead of trying to get hundreds of millions of base pairs in a run, is to ask the question: ‘Where in society, both in clinical medicine and
other fields, would a relatively small amount of sequence information be valuable if it were available in real time?’” The company is working on a microfluidic device that takes cells from a sample, lyses them, carries out Sanger sequencing on the DNA, and then analyzes the sequence in real time. The goal is “to take a clinical sample and, in the course of an hour or less from the introduction of the sample, generate [a] DNA sequence that would allow a clinician to decide how to treat a patient,” explains Selden.

THE FINE PRINT

To compare next-generation technology with CE-based Sanger sequencing, experts look at the accuracy, throughput, and read length of the analyses. Sanger sequencing can now achieve read lengths of ∼1000 bp, and the per-base accuracy can be as high as 99.999%. For high-throughput shotgun genomic sequencing, the Sanger method costs 50¢/kb. Except for cost, most experts agree that the next-generation sequencers on the market haven’t yet reached Sanger sequencing standards. But the technologies are relatively young, and the companies are aggressively pushing to meet the criteria. Experts anticipate that future platforms will be able to reach some of the Sanger sequencing standards.

Other parameters are harder to express in numbers but are major factors in the overall operation of the instruments. For instance, “library-making is still an art for most of them. You have to have really good people to do it,” points out Schloss. The machines have to run for about a week to sequence the DNA library. After that comes “the data analysis, which tends to drag on after the run is done,” says Harold Swerdlow at the Wellcome Trust Sanger Institute (U.K.). “People only quote the machine run time, but they don’t quote the analysis time or the sample prep time. Both of those tend to be quite complex and tricky for the average user to get their head around.” Data analysis can be a headache because these instruments churn out huge amounts of data in every run.
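As a rough illustration of what the 50¢/kb Sanger figure implies at genome scale, the sketch below multiplies it out for a human genome of about 3 billion bases (the approximate size cited in this article). The function name, the constant names, and the fold-coverage values (how many times over the genome is read) are illustrative assumptions, and the result reflects only the quoted per-kilobase cost, not instruments, personnel, or overhead.

```python
# Back-of-envelope sketch, not a costing model.
# Assumed inputs: genome size (approximate) and the 50 cents/kb Sanger figure.
GENOME_SIZE_BP = 3_000_000_000   # approximate human genome size, in bases
SANGER_COST_PER_KB = 0.50        # dollars per kilobase of shotgun sequence


def sanger_cost(coverage: float,
                genome_bp: int = GENOME_SIZE_BP,
                cost_per_kb: float = SANGER_COST_PER_KB) -> float:
    """Dollars needed to read the genome `coverage` times over."""
    kilobases = genome_bp / 1000 * coverage
    return kilobases * cost_per_kb


# Hypothetical coverage values, chosen only to show how the cost scales.
for coverage in (1, 10, 40):
    print(f"{coverage:>2}x coverage: ${sanger_cost(coverage):,.0f}")
```

Because the cost scales linearly with coverage, even a single pass over the genome at this rate comes to $1.5 million, which helps explain why cost is the one criterion on which the newer instruments already compete.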
“In a perfect world, analysis would be a day or two, but it can drag on if something goes wrong,” says Swerdlow. “It’s not uncommon for things to crash because there is so much data. You find out [about the crash] the next day, and you start it off again. ... It could take up to a week for the data analysis to come through.”

To illustrate how much more data the non-CE sequencers churn out, Timothy Burcham of Applied Biosystems compares two of the company’s products: its CE sequencer, the 3730XL, and its next-generation sequencer, the SOLiD. “The Applied Biosystems 3730XL is a 96-capillary machine, and if you ran it for maximum throughput, you would obtain about 3 megabases a day,” he says. “One SOLiD box consistently yields over 3 gigabases a day.”

But, because the cost of the next-generation analysis is lower than that of Sanger sequencing, researchers are willing to put up with the inconveniences. “One run may only cost $5000 in reagents, and the library prep is about $1000,” says Schloss. “Ultimately, your cost per base pair, and I’m even talking about per analyzed base pairs, is still much lower.”

CALCULATING A GENOME’S COST

What do people mean when they say they can sequence a genome for $250,000, $100,000, or even $5000? “The problem is [that] there
isn’t a standard definition,” says Swerdlow, “but what they don’t mean, generally, is 3 billion bases of information,” which is the approximate size of the human genome. When it comes to genome sequencing, the DNA is sequenced over and over again to ensure that the bases are all accounted for and reported correctly. Experts talk about coverage (anywhere from 10 to 40×) to describe how thoroughly the genome has been analyzed. Of course, this thoroughness plays into the cost of sequencing the genome. And then there are costs associated with the instruments, their maintenance, personnel to run the machines, reagents, and so on.

But cherry-picking of these factors does occasionally occur, and every expense may not be wrapped into the quoted cost of a sequencing project. Applied Biosystems, for example, stated in March 2008 that the cost of sequencing a genome using its SOLiD instrument was $60,000. That number includes the commercial price for all the reagents needed for a 12× coverage of the human genome but doesn’t factor in the equipment, personnel, or overhead.

When Complete Genomics announced that it would offer a sequencing service for $5000 per genome, experts sat up and took notice. Some dismiss the claim as hype, while others think that the company may be on to something. The company says that the $5000 genome will provide 120 gigabases of mapped reads, which is 40× haploid coverage. But as Church, one of the originators of the $1000-genome idea, points out, “the prices or sequence qualities haven’t been subjected to peer review. Indeed, it is, unfortunately, rare for even technology journals to encourage authors to document their costs. But the Complete Genomics claims seem consistent with experiments in my lab with the Polonator technology.”

Schloss says that from NHGRI’s point of view, “our bottom line is dollars in, bases out.” The cost of a genome sequence “has to include all of the materials, labor, equipment amortization, rent, and electricity,” he says.
“That’s what we mean by total cost.” NHGRI has specified to its technology development grantees that the sequence quality of their genomes has to be at least as good as the high-quality draft mouse genome that was published in Nature (2002, 420, 520-562).

WHO’S USING THE NEXT GENERATION?

Genome-sequencing centers, like the Broad Institute and the Wellcome Trust Sanger Institute, were early adopters of next-generation sequencing technologies. But experts see a shift in the customer base as the technologies become more user-friendly and established. Now, instead of the genome centers buying 8 or 10 instruments at a time, “there has been a definitive trend towards smaller labs buying 1 or 2 instruments,” says Swerdlow. “That’s partly because the technology has matured and it’s accessible. Before, you had to have a team of 10 trained bioinformatics guys, a bunch of molecular biologists, a lot of infrastructure, and big computers to support it all. Now you can get away with a reasonably small group of people running a machine or two.”

The companies that produce next-generation sequencers are banking on making sequencing so affordable that researchers in biomedicine and pharmaceuticals won’t have to think twice about sequencing genomes as part of their experiments. “At $100,000 a genome, there are only a few academic studies that can be done. You don’t bring in the drug companies, and you don’t bring in the clinical applications,” says Swerdlow. However, he says that
at $5000 per genome, pharmaceutical companies start getting interested in pharmacogenomics and general large-scale studies. At that price, 1000 patients and 1000 controls can be sequenced for $10 million, a number within the reach of a drug company. But the brass ring is the $1000 genome because that’s when a pharmaceutical company can get genomic data from even larger cohorts.

For individualized medicine, in which doctors would sequence everybody’s genome as a matter of course, some experts say several things have to happen. The technology has to be the pushbutton-and-go kind with minimal, easy sample prep. The cost, more importantly, has to be in the $100-per-genome ballpark. That’s when it makes sense for every individual to have his or her genome sequenced. (The ethics of individualized genome sequencing are a tangled mess that is beyond the scope of this article.)

THIS IS WHERE THE PARTY’S AT

Experts agree it’s an exciting time for genomic sequencing, especially because it’s unclear how the field will unfold. Will the $1000 genome be achieved? Can genome sequencing establish
itself and have a huge impact on the clinical market? Are there other feasible uses for genome sequencing? All these questions are open-ended, but one thing is certain in the minds of the experts: next-generation genome sequencing is not a bubble that will pop and disappear. The technologies for achieving faster and cheaper sequencing are here. Not all of them will survive. But the ones that do will play into some aspect of science, medicine, and society.

So, experts are ignoring the critics and naysayers. Genome sequencing will become part of daily life, they say, and the growth of the industry is an indication of its promise. “The people who don’t understand it can be excused because there are very few industries where you get 4 logs [of improvement] in 4 years,” says Church. “Not even the computer industry is like that; the computer industry is breathless at a factor of 2 every 2 years, but that’s poky by comparison.” But the facts speak for themselves, and Church says the naysayers “are either in denial, or they haven’t heard the news yet.”

Rajendrani Mukhopadhyay is a senior associate editor in the C&EN Journal News and Community Department.
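Several figures quoted in this article can be cross-checked with simple arithmetic. The sketch below does so for Church’s growth rates, the Complete Genomics coverage claim, and Swerdlow’s cohort example. The function and variable names are illustrative assumptions, and the 3-billion-base genome size is the approximation used in the article.

```python
import math


def fold_improvement(years: float, doubling_months: float) -> float:
    """Total improvement after `years` of doubling every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


# Growth rates: sequencing (doubling every 4 months) vs. chips (every 2 years),
# both taken over the same 4-year window.
sequencing = fold_improvement(years=4, doubling_months=4)
chips = fold_improvement(years=4, doubling_months=24)
print(f"sequencing: {sequencing:,.0f}-fold ({math.log10(sequencing):.1f} logs)")
print(f"chips:      {chips:,.0f}-fold ({math.log10(chips):.1f} logs)")

# Coverage: mapped bases divided by genome size (approximate, haploid).
genome_bp = 3_000_000_000
mapped_bp = 120_000_000_000          # 120 gigabases of mapped reads
coverage = mapped_bp / genome_bp
print(f"coverage:   {coverage:.0f}x")

# Cohort study: 1000 patients plus 1000 controls at $5000 per genome.
cohort_cost = (1000 + 1000) * 5000
print(f"cohort:     ${cohort_cost:,}")
```

Doubling every 4 months for 4 years is 12 doublings, or about 4,096-fold (roughly 3.6 logs), which is in the neighborhood of the “4 logs in 4 years” that Church cites. A factor of 2 every 2 years yields only 4-fold over the same period. The coverage and cohort figures reproduce the quoted 40× and $10 million.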