product review
Chemical informatics
Specialized informatics tools help researchers mine chemical databases for drug discovery and other applications.
James P. Smith and Vicki Hinson-Smith
Today, the word “informatics” is found liberally peppered throughout grant proposals, curricula, and conference proceedings. Informatics is concerned with gathering, organizing, storing, classifying, searching, and retrieving recorded information. Not long ago, the term was limited mainly to libraries and computer centers, but these days it refers to a widespread set of crucial research and data management tools. Perhaps most familiar is bioinformatics, which refers to the use of computer technology in biological applications other than ordinary data handling. The analogous use of computers to handle immense quantities of chemical information has evolved into a blend of chemistry and computer technology that parallels bioinformatics. This new breed has several aliases: chemoinformatics, chemiinformatics, cheminformatics, and chemical informatics. Essentially no distinction exists between the first three terms; they complement bioinformatics in the drug discovery process. However, the term “chemical informatics” is broader than the others and refers to informatics for all types of chemical applications.
Gary Wiggins, director of the Indiana University chemical informatics program, sees similarities between the emergence of chemical informatics and the development of analytical chemistry. “Thirty years ago, analytical chemistry was regarded as the handmaiden of the sciences. Chemists didn’t see it as a separate science. No one will argue that point anymore, because it is now a discipline in its own right.” Many chemists were able to use analytical techniques, Wiggins says; however, the few who substantially advanced the field and elevated it to a bona fide branch of learning had strong interdisciplinary abilities that they brought to bear on innovation and problem solving.
“Likewise, in the chemical informatics field, people need backgrounds in chemistry, mathematics, and computer science and the knowledge to bring these together, particularly in the area of large data sets.”
Table 1 lists selected companies that provide comprehensive chemical informatics management systems. This table is intended to highlight chemical informatics, so major companies that specialize in bioinformatics systems are omitted. This list is not comprehensive, but it is representative of the diverse nature of commercial chemical informatics companies.
© 2005 AMERICAN CHEMICAL SOCIETY
Bio- vs cheminformatics
Bioinformatics took off in the era of genomics. The vast amounts of data generated by the Human Genome Project drove the development of more powerful databases and new algorithms for tasks such as finding patterns in gene expression and comparing the amino acid sequences of proteins. Bioinformatics is frequently used to mine databases for drug discovery. “Bioinformatics deals with large molecules [e.g., DNA or proteins], while cheminformatics deals with smaller molecules,” says Herbert Thiele, director of bioinformatics at Bruker Daltonik GmbH.
A common drug-discovery strategy is “docking,” in which a small molecule attaches to a specific site on a target protein. The small molecule is designed to recognize the site, dock there, and modify the function of the protein. Numerous potential molecules are selected for evaluation on the basis of structure, shape, availability, synthetic routes, and predicted physical and chemical properties; this is the realm of cheminformatics. “The pharmaceutical industry generates massive amounts of data, especially in the forms of combinatorial libraries and high-throughput screening,” says Wiggins. “And the people on the chemical informatics side are coming up with the software and visualization techniques to maximize the information extracted from these data.” Informatics techniques are also applied to many areas of chemistry other than pharmaceutical discovery.
JANUARY 1, 2005 / ANALYTICAL CHEMISTRY
Table 1. Selected chemical informatics databases, application software, and integrated content.¹

Company: Accelrys, Inc., 9685 Scranton Rd., San Diego, CA 92121-3752; 858-799-5000; www.accelrys.com
Product description: The Accelrys Discovery Studio (DS) is designed as a fully integrated research platform. The DS Project Knowledge Manager is an Oracle-based groupware infrastructure that captures and stores the data generated by DS applications.
Additional information: Chemical informatics products include desktop productivity tools, chemistry databases and data content, chemistry workflow tools, and specialist applications for screening and mining informatics, modeling, and simulation solutions; all are accessible on standard PCs.

Company: Advanced Chemistry Development, Inc., 90 Adelaide St. West, Ste. 600, Toronto, Ontario M5H 3V9, Canada; 800-304-3988; www.acdlabs.com
Product description: ChemAnalytics consists of a universal analytical data management system that captures, organizes, retrieves, shares, and reports spectra, structures, chromatograms, physicochemical properties, and other experimentally relevant information.
Additional information: NMR spectral software performs processing, assignment, and database compilation and searching. The system provides web-based management of research samples and data as well as online property prediction.

Company: Bio-Rad Laboratories, Inc., 1000 Alfred Nobel Dr., Hercules, CA 94547; 510-724-7000; www.bio-rad.com
Product description: The KnowItAll Informatics System has a fully integrated environment with software and database solutions for spectroscopy (MS, NMR, IR, Raman, and spectral data management), cheminformatics, and ADME/toxicity prediction.
Additional information: Internet-based searching saves both the end user and the IT department time. Bio-Rad offers cost-effective site licensing and maintains the servers that house the databases, cutting the internal costs of hosting the data in-house.

Company: CambridgeSoft Corp., 100 Cambridge Park Dr., Cambridge, MA 02140; 617-588-9100; www.cambridgesoft.com
Product description: ChemOffice is a chemical publishing, modeling, and database workstation package consisting of ChemDraw, Chem3D, E-Notebook, ChemFinder, ChemInfo, and BioAssay.
Additional information: This is a comprehensive software system for computer-assisted drug discovery. Organic chemical structures and flexible architecture provide a foundation for molecular modeling and computational chemistry.

Company: ChemNavigator, 6166 Nancy Ridge Dr., San Diego, CA 92121; 877-477-5720; www.chemnavigator.com
Product description: The iResearch System offers fast and comprehensive chemical structure searches and comparisons. It provides a full range of cheminformatics Internet services to the life sciences community.
Additional information: Instead of using a text-based retrieval system, cheminformatics uses chemical structures that researchers provide as input to identify similar compounds that might be screened for biological activity.

Company: Daylight Chemical Information Systems, Inc., 27401 Los Altos, Ste. 360, Mission Viejo, CA 92691; 949-367-9990; www.daylight.com
Product description: Daylight Toolkit enables companies to add cheminformatics capabilities to their environments. These tools easily assemble customized systems for total control over corporate chemistry.
Additional information: DayCart, Daylight’s chemistry cartridge for Oracle, recently ushered in a new era of chemically intelligent enterprise. DayCart runs compound registration systems, combinatorial chemistry systems, and e-lab notebooks.

Company: MDL Information Systems, Inc., 14600 Catalina St., San Leandro, CA 94577; 510-895-1313; www.mdl.com
Product description: The MDL Chemscape and MDL Chime Pro solutions are industry standards for communicating chemical information on the web. CrossFire Beilstein and Gmelin databases are also provided.
Additional information: These products read interactive chemistry documents, perform chemical database searches, and reach out to the wealth of chemical and biological information available in web environments.

Company: Tripos, Inc., 1699 South Hanley Rd., St. Louis, MO 63144-2913; 800-323-2960; www.tripos.com
Product description: SYBYL/Base, the heart of Tripos’s Discovery Software, can search molecular structure and properties. UNITY is a search and analysis system that explores chemical databases and features rapid, flexible 3-D searching.
Additional information: The software combines database searching with molecular design and analysis to provide an integrated environment for new compound discovery.

ADME: absorption, distribution, metabolism, and excretion.
¹ Some companies offer multiple products. Contact the vendors for details.
Computational chemistry vs chemical informatics
Computational chemistry software is available for predictive models, visualization, computations, quantum chemistry, simple searches, molecular models, and so on, but these tools are not classified as chemical informatics. For instance, although the CRC Handbook of Chemistry and Physics contains several searchable data collections, the Handbook alone is not an example of chemical informatics. Informatics involves the maintenance and integration of huge amounts of disparate data for storage, problem solving, visualizing structure, and finding relationships. A chemical informatics system generally consists of data collections, data-handling software, laboratory information management systems (LIMS), and a database management system.
“Computational chemistry is vastly different from cheminformatics—in its market, its needs, its reliability, and its user base,” says Osman Güner, executive director of cheminformatics at Accelrys. “Cheminformatics is primarily involved with the management of data and information. Much of the software and facilities used for cheminformatics are the same [as those] used for a company’s corporate database, which contains the ‘crown jewels’ of the company.” Thus, the standard for the informatics system is zero faults—much more stringent than that required for a computational chemistry tool.
Databases and data handling
Databases, which organize data for distribution and use, are at
the heart of an informatics system. Private databases store proprietary information gleaned from research or supplied by collaborators. Many academic and governmental organizations maintain free, public databases and computational servers. Commercial database providers sell access to biological and chemical data applicable to computational biology, drug discovery, or chemical analysis. Software that links information from various databases is called “middleware”. The term also refers to an evolving layer of services that links traditional applications and allows researchers to transparently use and share distributed resources, such as computers, data, networks, and instruments. Wiggins says that the culture of information use in chemistry circles is quite a bit different from that in medical and biological circles. Biological communities tend to favor free, public databases, whereas on the chemistry side, the collection of chemical information has always been a commercial enterprise. “Chemical Abstracts [which is owned by the American Chemical Society] forms a core of the chemical literature and provides a lot of very sophisticated data-handling tools that chemists can use directly at the workbench—SciFinder and others,” he says. These tools are not in the public domain. They are protected by copyright, and few competing middleware products are available for interrogation and data mining of Chemical Abstracts.
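The linking role of middleware described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two in-memory "databases", the compound identifiers (`CMPD-001` and so on), the field names, and the `link_records` helper are all hypothetical and do not correspond to any vendor's product or API.

```python
from typing import Any, Dict

Record = Dict[str, Any]

# Hypothetical data sources, both keyed by a made-up compound identifier.
STRUCTURE_DB: Dict[str, Record] = {
    "CMPD-001": {"formula": "C2H6O"},
    "CMPD-002": {"formula": "C6H6"},
}
ASSAY_DB: Dict[str, Record] = {
    "CMPD-001": {"assay_result": "active"},
    "CMPD-003": {"assay_result": "inactive"},
}


def link_records(*sources: Dict[str, Record]) -> Dict[str, Record]:
    """Merge records that share an identifier across several sources."""
    merged: Dict[str, Record] = {}
    for source in sources:
        for compound_id, fields in source.items():
            # Create the combined record on first sight, then add fields.
            merged.setdefault(compound_id, {}).update(fields)
    return merged


linked = link_records(STRUCTURE_DB, ASSAY_DB)
for compound_id in sorted(linked):
    print(compound_id, linked[compound_id])
```

A compound that appears in only one source still gets a record, just with fewer fields; a production system would also have to reconcile identifiers that differ between sources, which this sketch assumes away.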
Representative companies
Over the years, most cheminformatics companies have developed specialized niches within the marketplace. Market analysts estimate that 45% of informatics development revenue will be related to Internet applications, so many companies have based their offerings on Internet access; others rely on software and data distributed on CDs. Some have strong graphics and molecular structure components, whereas others have purchased and integrated numerous databases into their informatics systems.
MDL Information Systems, which is now part of Elsevier MDL, has concentrated on web-based chemistry. For years, MDL had a lock on the chemical informatics side of the pharmaceutical business, providing companies with the tools to set up their own internal databases. Elsevier MDL provides the CrossFire Beilstein and Gmelin databases on the web along with a host of data-handling tools. The company also offers the MDL Patent Chemistry Database.
Accelrys, a newcomer to the field, started buying smaller information providers and integrating the databases several years ago. Today, its library of products probably rivals MDL’s. Accelrys provides a fully integrated research platform that may contain several of its databases (such as chemicals available for purchase (CAP), chemical reactions, commercially available chemicals, failed reactions, physical properties, and spectra) selected by the client. The CAP database’s search technologies allow researchers to retrieve commercially available chemicals through exact structure, substructure, and similarity searching. Accelrys sells databases on CDs through direct sales channels.
Daylight Chemical Information Systems developed systems that can easily handle chemical structure, a key advance in chemical informatics. Chemical structures can be used as search parameters. Searches can include direct pattern matching and similarity searching, which finds resemblances in the properties of other molecules or selected functional groups.
Tripos’s UNITY allows a researcher to build structural queries based on molecules, molecular fragments, pharmacophore models, or receptor sites. Conventional 3-D database searching finds only molecules whose stored conformations match the constraints of the query. But UNITY’s conformationally flexible 3-D searching finds molecules that can achieve a matching conformation regardless of the stored conformation.
ACD/Labs’ ChemAnalytics captures, organizes, retrieves, shares, and reports spectra, structures, chromatograms, physicochemical properties, and other experimentally relevant information. The ability to predict chemical properties and spectra based on an input structure is an important innovation in informatics. ChemAnalytics provides web-based management of research samples and online property prediction.
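To make the idea of similarity searching mentioned above more concrete, here is a minimal Python sketch. It assumes each molecule has already been reduced to a set of structural features and scores overlap with the Tanimoto coefficient, a measure widely used in the field; the article does not say which measure any vendor uses. The feature labels, compound names, and the `similarity_search` helper are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(a | b)


def similarity_search(
    query: FrozenSet[str],
    library: Dict[str, FrozenSet[str]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return library entries scoring at or above the threshold, best first."""
    scored = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Hypothetical library: feature sets stand in for real structural fingerprints.
LIBRARY: Dict[str, FrozenSet[str]] = {
    "compound_A": frozenset({"aromatic_ring", "hydroxyl", "carboxylic_acid"}),
    "compound_B": frozenset({"aromatic_ring", "hydroxyl"}),
    "compound_C": frozenset({"amine", "alkyl_chain"}),
}
QUERY = frozenset({"aromatic_ring", "hydroxyl", "ester"})

hits = similarity_search(QUERY, LIBRARY, threshold=0.4)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

Real systems derive the features from the chemical structure itself rather than from hand-written labels, and they typically store them as bit vectors so that large libraries can be compared quickly.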
LIMS in the future
The process of drug discovery is radically changing. For example, nowadays it is necessary to use more than one technique for protein delineation. Thiele describes the process: “First you need multiple MS-based techniques—perhaps MALDI TOF and MALDI TOF-TOF, which generate complementary information—to determine the protein’s primary structure. Next, the 3-D structure of the protein might be determined using X-ray crystallographic techniques and NMR spectroscopy. Because these two techniques yield different information about the protein, the result is a more complete picture.”
At present, most stepwise analytical determinations are preprogrammed in the instruments or by the LIMS, and every sample and subsample receives the same treatment. But more and more often, this isn’t the ideal way to work, Thiele says. Consider the case in which two sets of MS data are generated for a single sample. It makes no sense to use every peak in both spectra, because protein identifications might be possible with a subset of the data based on one pair of analyses. A second identification is not required; in fact, it just eats up time that could be devoted to other experiments. So, what is needed is LIMS software that internally decides whether the identification process has been completed. If not, additional information is obtained by more experimentation. This approach can also be useful with techniques other than MS.
This kind of “smart” software is expected to encourage the integration of protein identification, lead generation, combinatorial reactions, and high-throughput screening. In each case, the analytical strategy is uniquely defined on the basis of the system’s assessment of the progress made. In the ideal scenario, this allows quick, efficient, parallel analyses that are adjusted according to real-time results. It might even mean that reaction products are screened concurrently with the reaction process.
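The decision logic Thiele describes, in which the software checks after each analysis whether identification is already complete, might be sketched as follows. Everything here is an assumption made for illustration: the `Sample` class, the confidence scores, the 0.95 threshold, and the stand-in analysis functions are hypothetical placeholders, not real instrument output or any LIMS vendor's interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Sample:
    name: str
    # Hypothetical identification confidence from each completed analysis.
    scores: List[float] = field(default_factory=list)


Analysis = Tuple[str, Callable[[Sample], float]]


def identification_complete(sample: Sample, threshold: float = 0.95) -> bool:
    """Treat the sample as identified once any analysis is confident enough."""
    return bool(sample.scores) and max(sample.scores) >= threshold


def run_adaptive(
    sample: Sample, analyses: List[Analysis], threshold: float = 0.95
) -> List[str]:
    """Run analyses in order, stopping as soon as identification is complete."""
    performed: List[str] = []
    for label, analysis in analyses:
        if identification_complete(sample, threshold):
            break  # skip the remaining experiments for this sample
        sample.scores.append(analysis(sample))
        performed.append(label)
    return performed


# Stand-in analyses that return fixed, made-up confidence values.
easy_plan: List[Analysis] = [
    ("MS run 1", lambda s: 0.97),
    ("MS run 2", lambda s: 0.99),
]
hard_plan: List[Analysis] = [
    ("MS run 1", lambda s: 0.60),
    ("MS run 2", lambda s: 0.98),
]

print(run_adaptive(Sample("sample_1"), easy_plan))
print(run_adaptive(Sample("sample_2"), hard_plan))
```

In this sketch the first sample needs only one run while the second needs both, so each sample receives a different treatment depending on its own results rather than a fixed, preprogrammed sequence.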
Instrument and biotech companies have this vision in mind now. And we’ve been told that the informatics systems that can execute it are just ahead.
James P. Smith and Vicki Hinson-Smith are freelance writers based in Amherst, Mass.