ARTICLE pubs.acs.org/jchemeduc

Teaching Three-Dimensional Structural Chemistry Using Crystal Structure Databases. 3. The Cambridge Structural Database System: Information Content and Access Software in Educational Applications

Gary M. Battle,*,† Frank H. Allen,† and Gregory M. Ferrence‡

† Cambridge Crystallographic Data Centre (CCDC), 12 Union Road, Cambridge CB2 1EZ, United Kingdom
‡ Department of Chemistry, Illinois State University, Normal, Illinois 61790-4160, United States

ABSTRACT: Parts 1 and 2 of this series described the educational value of experimental three-dimensional (3D) chemical structures determined by X-ray crystallography and retrieved from the crystallographic databases. In part 1, we described the information content of the Cambridge Structural Database (CSD) and discussed a representative teaching subset of ca. 500 CSD structures that have been selected for their educational relevance. In part 2, we exemplified the value of the CSD teaching subset by describing four worked examples of their use in a teaching context. Although the CSD teaching subset and its associated learning modules provide a major resource for chemical educators, there are many cases where the full CSD System, now covering more than 500,000 crystal structures, is essential to make an educational point. This is particularly true when introducing students to variance in real experimental observations, for example, where many hundreds of observations are required to generate statistically meaningful trends from the structural data, or simply when introducing students to the search and manipulation of data that are commonly available in large chemical and biochemical databases. Here, we describe the complete CSD System and its associated software and highlight the extended range of discovery-based learning opportunities this affords.

KEYWORDS: First-Year Undergraduate/General, Graduate Education/Research, Second-Year Undergraduate, Upper-Division Undergraduate, Chemoinformatics, Computer-Based Learning, Inquiry-Based/Discovery Learning, Internet/Web-Based Learning, X-ray Crystallography

Parts 1 and 2 of this series1,2 described the important educational value of experimental three-dimensional (3D) chemical structures determined by X-ray crystallography and retrieved from the crystallographic databases. In part 1, we described the information content of the Cambridge Structural Database (CSD)3a of small organic and metal-organic structures and discussed a representative teaching subset of ca. 500 CSD structures that have been selected for their educational relevance. This subset is freely available via the CCDC Web site3b together with software tools for access, visualization, and manipulation of the structures. Exploration of the CSD teaching subset using the Web application WebCSD4 was also highlighted. In part 2,2 we exemplified the value of the CSD teaching subset and WebCSD by describing four worked examples of their use in a teaching context. However, the complete CSD System, that is, the full database of over 500,000 structures, together with WebCSD and other advanced software for database searching, structure visualization and manipulation, and data analysis, significantly extends the range of discovery-based learning opportunities, including, for example, studies of mean molecular dimensions, stereochemistry and conformations, metal coordination sphere geometries, hydrogen bonding and other supramolecular phenomena, and reaction pathways. In part 3 of the series, we describe the complete CSD System and its associated software and indicate its special availability to educational institutions. The accompanying article, part 4,5 illustrates a number of teaching examples that take advantage of the massive structural information content of the CSD System to broaden and enhance the chemical education experience.

Copyright © 2011 American Chemical Society and Division of Chemical Education, Inc.

THE CAMBRIDGE STRUCTURAL DATABASE SYSTEM

The principal information content and features of a CSD structural entry were described and illustrated in part 1.1 They can be summarized as follows:
• Organic and metal-organic structures determined by single-crystal and powder X-ray diffraction and neutron diffraction.
• Structures up to ca. 1,000 atoms including H atoms.
• Primary results of diffraction analysis: 3D atomic coordinates, unit cell, etc.
• 2D chemical diagram: encoded for structure and substructure searching.
• Bibliographic information: author(s), journal, page number(s).
• Other text and numerical information: compound name(s), molecular formulas, precision indicators, physical properties (where reported).
• Each structure identified by a reference code: six letters identify the compound and two digits identify different determinations of the compound.

Published: May 06, 2011

dx.doi.org/10.1021/ed1011019 | J. Chem. Educ. 2011, 88, 886–890

Journal of Chemical Education


Table 1. Statistics for Selected Structure Types in the CSD

Structure Type                  Number of Structures(a)   %CSD(b)
All structures                  501,857                   100.0
Organic structures              215,106                    42.9
Carbohydrates                     5,656                     1.1
Amino-acids and peptides         10,383                     2.1
Nucleosides/nucleotides           1,925                     0.4
Steroids                          3,839                     0.8
Terpenes                          4,474                     0.9
Alkaloids                         2,678                     0.6
Antibiotics                       1,501                     0.3

Figure 1. Annual growth of the CSD 1970–2009.

• Data are abstracted from more than 1,300 published literature sources.
• More than 6,500 unpublished structures have been deposited directly by authors.
• All data are value-added, validated, and curated by CCDC staff; any queries, inconsistencies, or errors are resolved, as far as possible, through author consultation.

Compilation of the CSD began with around 5,000 published structures that were available in 1965. Since that time, annual acquisitions have risen almost exponentially, as shown in Figure 1, with nearly 40,000 structures now being archived each year. Because crystal-structure analysis is the preferred method for characterizing novel compounds (provided they can be persuaded to crystallize), the CSD displays huge chemical diversity among its entries. Although common compounds abound, the unusual is commonplace, and this is exemplified by the statistics presented in Table 1, assembled in January 2010 from a database containing 501,857 structural entries.
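The reference-code convention noted earlier in this section (six letters for the compound, plus two digits for different determinations) is simple enough to check programmatically. The following Python sketch is illustrative only: the function name `parse_refcode` and the regular expression are assumptions based on that description and are not part of the CSD software, and `BZAMID01` is used purely to illustrate the format.

```python
import re

# Assumed pattern: six uppercase letters, optionally followed by two digits.
_REFCODE = re.compile(r"^([A-Z]{6})(\d{2})?$")


def parse_refcode(refcode):
    """Split a CSD-style reference code into (compound letters, determination).

    The determination is None when the code carries no two-digit suffix.
    """
    match = _REFCODE.match(refcode.strip().upper())
    if match is None:
        raise ValueError(f"not a valid reference code: {refcode!r}")
    letters, digits = match.groups()
    return letters, (int(digits) if digits else None)


print(parse_refcode("QATWAJ"))
print(parse_refcode("BZAMID01"))
```

A check of this kind can help students validate a hand-typed hitlist before using it in a search.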

Table 1 (continued). Statistics for Selected Structure Types in the CSD

Structure Type                  Number of Structures(a)   %CSD(b)
Metal-organic structures        266,333                    53.1
Any metal: 3-coordinate           5,527                     1.1
Any metal: 4-coordinate          58,730                    11.7
Any metal: 5-coordinate          30,537                     6.1
Any metal: 6-coordinate          83,372                    16.6
Any metal: 7-coordinate          13,065                     2.6
Any metal: 8-coordinate           6,605                     1.3
Metal: π-complexes               54,235                    10.8
Main-group metal compounds       31,470                     6.3

(a) Data from January 2010. (b) %CSD is the ratio of the number of structures of a given type to the total number of structures in the CSD, expressed as a percentage.
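The %CSD column follows directly from the footnote definition. As a minimal sketch (the function name `percent_of_csd` is an assumption, and the default total is the January 2010 figure quoted in Table 1), the calculation for a few rows can be reproduced in Python:

```python
def percent_of_csd(count, total=501_857):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Counts taken from Table 1 (January 2010).
print(percent_of_csd(215_106))  # organic structures
print(percent_of_csd(266_333))  # metal-organic structures
print(percent_of_csd(83_372))   # any metal: 6-coordinate
```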

THE CSD SOFTWARE SYSTEM

The distributed CSD System3,6 includes four major software programs:
• ConQuest6: for interrogating all database information fields and for retrieving molecular geometry and other information for analysis and postprocessing.
• Mercury6–8: for graphical visualization and manipulation of CSD structures; the version of Mercury supplied within the CSD System has more advanced features than the free version described in part 1 of this series.
• Vista3: for numerical and statistical analysis of molecular geometry retrieved from the CSD using ConQuest.
• WebCSD4: a new Web-based search engine4 for interrogating the CSD and for displaying information content, including 2D chemical diagrams and 3D structures using either Jmol9 or OpenAstexViewer.10

The distributed system also contains two structural knowledge bases: (a) Mogul,11 a knowledge base of 20 million items of intramolecular geometry (bond lengths, valence angles, and torsion angles), organized according to a chemical hierarchy, and (b) IsoStar,12 a knowledge base of intermolecular interactions, organized on the basis of chemical central groups and contact groups. Both of these knowledge bases have extensive applications in structural chemistry and drug discovery, and educational applications are currently being developed. Full documentation for each of these programs and knowledge bases is maintained and updated at the CCDC Web site.13 The principal features of ConQuest, Mercury, Vista, and WebCSD are summarized below, and their use is illustrated through two examples that are relevant in chemistry teaching: the Jahn–Teller effect and the conformations adopted by straight-chain and substituted alkanes.

ConQuest

ConQuest6 is the primary program for searching and retrieving information from the CSD. ConQuest provides a full range of text and numerical search options (e.g., compound names, elements, chemical formulas, etc.), in addition to 2D and 3D search facilities. These include chemical substructure searching, geometric searching (that is, embellishing a substructure query with geometric constraints), and searching for nonbonded contacts, including hydrogen bonds and other strong interactions. ConQuest allows users to define, retrieve, and export sets of reference codes and geometric parameters for each chemical substructure located in a search, thus providing direct links to the Mercury and Vista programs. Other ConQuest features include
• Intuitive composition of search queries with context-dependent “Help” if needed.
• A simple sketching tool for 2D structure and substructure searches, plus facilities to add both chemical and geometric constraints.
• Copy and paste of 2D structures from popular proprietary software into the sketcher window.
• Logical combination of queries of different database fields to generate more complex searches.
• Search results presented as a reference code list with a range of facilities for browsing and viewing database information for each hit, including 2D and 3D viewers and a variety of text and numeric information windows.
• Direct clickable links to primary electronic literature sources via the digital object identifier (DOI).


Figure 2. A 2D substructure search query: (A) search for six-coordinate CuO6 fragments in ConQuest query sketcher; (B) Refcode hitlist and chemical diagram display for a hit from search (panel A), including user-requested geometrical parameters; (C) Refcode hitlist and 3D molecular structure from search (panel A) with user-requested Cu–ligand bond lengths shown for one hit.

Figure 3. ConQuest search queries used to examine conformational preferences in n-butanes: (A) n-butane; (B) 1-methyl-n-butane; (C) 1,2-dimethyl-n-butane, where terminal methyl groups are approximated as Csp3 atoms. The queries show the following settings: (i) all C atoms are Csp3 denoted by a total coordination number of T4, (ii) all bonds are acyclic, and (iii) terminal C-atoms are not to be in attached ring systems. Also shown below each query are the distributions of torsion angles in (D) C–C–C–C in n-butane, (E) H–C–C–C in 1-methyl-n-butane, and (F) H–C–C–H in 1,2-dimethyl-n-butane fragments.

• Hitlist manager where search results can be combined and annotated.

Figure 2A shows a 2D substructure search query, for CuO6 fragments, drawn into the sketcher window. Figure 2B shows the results pane, with the list of CSD reference codes for the search hits on the right, the 2D structure of a selected hit in the display window, together with the geometrical parameters requested for each hit, in this case the six Cu–O bond lengths. The reference code hitlist can be navigated to show information about any structure in the list and, for example, Figure 2C shows the 3D structure for the CSD entry shown in Figure 2B. The search substructure is highlighted in Figure 2B,C, and Figure 2C also displays the user-specified geometrical data for the search fragment in that particular CSD entry. The overall distribution of Cu–O bond lengths will be illustrated during the subsequent discussion of the Vista program.

In organic chemistry, the usual student introduction to conformational analysis is through an understanding of the staggered and gauche conformers around the C–C bonds in normal and substituted alkanes. In Figure 3, we show the ConQuest CSD searches required to locate chemical fragments representative of n-butane, 1-methyl-n-butane, and 1,2-dimethyl-n-butane. Figure 3 shows how ConQuest can (i) constrain all bonds to be acyclic, (ii) constrain C atoms to be sp3 through use of the “total coordination number” setting (T4), and (iii) restrict the local environment of the fragment to ensure that the terminal C(sp3) atoms are not part of cyclic systems. The query also requires ConQuest to return the torsion angle that defines the fragment conformation, that is, C–C–C–C in n-butane, H–C–C–C in 1-methyl-n-butane, or H–C–C–H in 1,2-dimethyl-n-butane for each fragment located in the search. Again, we will illustrate the subsequent conformational analysis in the discussion of the Mercury and Vista programs that follows.

ConQuest will also locate intermolecular interactions, particularly hydrogen bonds, to generate knowledge that is vitally
important in teaching molecular biology and the structure of proteins, in understanding pharmaceutical activity, and, indeed, how many extended crystal structures are constructed. We will exemplify and discuss the educational importance of intermolecular interaction searches in part 4 of this series.5

Mercury

Mercury6,9,10 offers a comprehensive range of tools for visualizing, manipulating, and comparing crystal and molecular structures and for exploring hydrogen-bonded networks and crystal packing. A freely downloadable version of Mercury has been described and exemplified in parts 1 and 2 of this series;1,2 the version supplied within the full CSD System has a number of more advanced features. Full details of Mercury are available from the CCDC Web site.14 The principal functionality can be summarized as
• Extensive options for molecular visualization: sticks, capped sticks, ball-and-stick (all showing chemical bond types), space-filling plots, etc. with default or customizable atom coloring.
• Measurement and display of geometrical parameters.
• Location and display of hydrogen bonds and other nonbonded interactions.
• Expansion and exploration of molecular networks based on nonbonded contacts.
• Searches for specific interaction motifs.
• Display of crystal packing, symmetry elements, and slices through extended structures.
• Display of multiple structures and options to overlay structure pairs.
• Editing of structures, for example, adding missing H atoms.
• Direct links to other CSD information fields and the electronic literature via the DOI.

Mercury graphics are illustrated in Figure 4A, which shows a ball-and-stick plot of the fully extended staggered butane conformation in 1,4-diaminobutane [CSD reference code QATWAJ15], one of the hits from the search in Figure 3A, and Figure 4B, a section of the extended intermolecular network formed by N–H···O=C hydrogen bonds in benzamide [CSD reference code BZAMID16].

Figure 4. (Top) Mercury ball-and-stick plot of 1,4-diaminobutane, one of the hits [QATWAJ15] from the query in Figure 3A, showing the fully extended staggered conformation having a C–C–C–C torsion angle of 180° and (bottom) section of extended network formed by N–H···O=C hydrogen bonds in benzamide (BZAMID).16

Vista

Vista3 reads files of geometrical parameters and other numerical information retrieved from the CSD using ConQuest, for example, the bond lengths shown for each hit in Figure 2B or the torsion angle data obtained via the ConQuest searches illustrated in Figure 3A–C. Vista options to analyze and display these data are
• Spreadsheet display of all retrieved parameters for each search hit.
• Simple mathematical manipulation and combination of spreadsheet parameters to compute new parameters, for example, absolute values, parameter means, or differences, trigonometric or logarithmic values, etc.
• Display histograms of parameter distributions or scattergrams of parameter pairs using either Cartesian or polar axes.
• Simple descriptive statistics for parameter distributions: means, medians, upper and lower quartiles, standard uncertainties, etc.
• Hyperlinking from plots to CSD entries.
• Statistical analysis, including linear regression and principal components analysis.
• Preparation of display graphics for publications and reports.

In Figure 5, Vista has been used to generate a histogram of the Cu–O bond lengths retrieved for each hit in the search illustrated in Figure 2. The resultant bimodal distribution clearly illustrates the operation of the Jahn–Teller effect in six-coordinate complexes of this type. In Figure 3D–F, we show the full distribution of torsion angles for each of the substructures shown in Figure 3A–C, respectively. The well-known variation in the staggered/gauche conformer ratio for these systems is clearly observed in the crystal structure data, and these ratios are discussed in energetic terms by Allen, Harris, and Taylor17 and, in an educational context, by Battle, Ferrence, and Allen.18

Figure 5. Histogram of Cu–X distances in six-coordinate Cu complexes showing the operation of the Jahn–Teller effect in these compounds.
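The torsion angles analyzed in the conformational example are each defined by four consecutive atoms. As a minimal sketch (not the algorithm used by ConQuest or Vista, which this article does not describe), the following Python code computes a signed torsion angle from Cartesian coordinates using the standard atan2 formulation and sorts it into broad conformational classes. The function names, the coordinates in the usage lines, and the class boundaries (150° and 30–90°) are illustrative assumptions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees (between -180 and 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals to the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def classify_torsion(angle_deg):
    """Assign a broad conformational label (boundaries are assumptions)."""
    magnitude = abs(angle_deg)
    if magnitude >= 150.0:
        return "anti (fully extended)"
    if 30.0 <= magnitude <= 90.0:
        return "gauche"
    return "other"


# Idealized geometry, not taken from any CSD entry.
anti = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(classify_torsion(anti))
```

Applying `classify_torsion` to every torsion angle exported from a search and counting the labels gives the kind of conformer ratio discussed for Figure 3D–F.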

WebCSD

WebCSD,4 a recent addition to the CSD System, allows CSD users to search and browse the database via the Internet without the need for local CSD or software installations. It is therefore ideal for use in a classroom situation. WebCSD also combines many of the search facilities of ConQuest with some of the information display abilities of ConQuest and Mercury:
• Searches of text and numeric information.
• 2D chemical substructure searching via an embedded query sketcher.
• 2D chemical similarity searching, which finds CSD molecules that are identical or very similar to that input by the user.
• Access to structural information for search hits (or for any entries in the full CSD database or the teaching subset), including bibliographic information (with DOI link to the original articles), compound name, molecular formula, etc.
• Display of the 2D chemical diagram.
• Display of, and interaction with, the 3D molecular and crystal structure using a choice of viewers: Jmol9 or OpenAstexViewer.10
• Export of selected structures in CIF format19 for visualization in Mercury.

WebCSD and its educational applications are fully illustrated in parts 1 and 2 of this series.1,2
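Because CIF is a plain-text format, students can inspect an exported file directly. The Python sketch below reads simple one-line `_name value` items from CIF text and converts a number carrying a standard uncertainty in parentheses to a float. It is a deliberately limited illustration, not a full CIF parser (loops and multi-line text fields are ignored); the function names are assumptions, and the cell values in `EXAMPLE` are hypothetical rather than taken from any CSD entry.

```python
import re


def read_cif_items(text):
    """Collect simple one-line `_name value` items from CIF text."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, loop_ keywords, and data rows
        parts = line.split(None, 1)
        if len(parts) == 2:  # a bare data name (e.g. inside loop_) has no value
            items[parts[0]] = parts[1].strip()
    return items


def numeric_value(field):
    """Convert a CIF number such as '10.123(4)' to a float, dropping the s.u."""
    return float(re.sub(r"\(\d+\)$", "", field))


# Hypothetical fragment for illustration only.
EXAMPLE = """\
data_example
_cell_length_a    10.123(4)
_cell_length_b    11.456(5)
_cell_angle_beta  95.67(2)
"""

items = read_cif_items(EXAMPLE)
print(numeric_value(items["_cell_length_a"]))
```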


ACCESSING THE CSD SYSTEM

The complete CSD is supplied to individual institutions for a small fee,20 which is further reduced for use by non-Ph.D. awarding institutions. This fee is necessary because the CCDC receives no public funding to carry out the work of collecting, evaluating, curating, and distributing database information and developing its associated software. Although retaining its close links to the University of Cambridge, the CCDC is now constituted as an independent nonprofit charitable organization under English law. The new WebCSD application is freely available for use with the CSD teaching subset;1 otherwise, it is available as part of the CSD System for those institutions that have a site-wide CSD license.

CONCLUSION

In this article, we have described the complete Cambridge Structural Database System, that is, the full CSD database together with its associated software components, and we have exemplified use of the System in the teaching curriculum via studies of the Jahn–Teller effect and of simple conformational analysis. These examples extend the capabilities of CSD-based teaching beyond the simpler examples based on the freely available teaching subset of some 500 CSD entries reported in parts 1 and 2 of the series.1,2 In part 4,5 we further extend the range of examples that require the full CSD System for their effectiveness in a teaching environment. These “full-system” examples include the analysis of molecular dimensions and studies of halonium ion intermediates, of high- and low-spin complexes, and of hydrogen bonding in chemical and biological systems.

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

ACKNOWLEDGMENT

The teaching examples described above are based on work supported by the United States National Science Foundation under Grant No. 0725294.

REFERENCES

(1) Battle, G. M.; Allen, F. H.; Ferrence, G. M. J. Chem. Educ. 2010, 87, 809–812.
(2) Battle, G. M.; Allen, F. H.; Ferrence, G. M. J. Chem. Educ. 2010, 87, 813–818.
(3) (a) Allen, F. H. Acta Crystallogr. 2002, B58, 380–388. (b) Cambridge Structural Database (CSD) for Teaching. http://www.ccdc.cam.ac.uk/free_services/teaching/ (accessed Apr 2011).
(4) Thomas, I. R.; Bruno, I. J.; Cole, J. C.; Macrae, C. F.; Pidcock, E.; Wood, P. A. J. Appl. Crystallogr. 2010, 43, 362–366.
(5) Battle, G. M.; Allen, F. H.; Ferrence, G. M. J. Chem. Educ. 2011, 88, (DOI: 10.1021/ed1011025).
(6) Bruno, I. J.; Cole, J. C.; Edgington, P. R.; Kessler, C.; Macrae, C. F.; McCabe, P. M.; Pearson, J.; Taylor, R. Acta Crystallogr. 2002, B58, 389–397.
(7) Macrae, C. F.; Edgington, P. R.; McCabe, P.; Pidcock, E.; Shields, G. P.; Taylor, R.; Towler, M.; van de Streek, J. J. Appl. Crystallogr. 2006, 39, 453–457.
(8) Macrae, C. F.; Bruno, I. J.; Chisholm, J. A.; Edgington, P. R.; McCabe, P.; Pidcock, E.; Rodriguez-Monge, L.; Taylor, R.; van de Streek, J.; Wood, P. A. J. Appl. Crystallogr. 2008, 41, 466–470.
(9) Hanson, R. M. J. Appl. Crystallogr. 2010, 43, 1250–1260. See also the Jmol Home Page. http://www.jmol.org (accessed Apr 2011).
(10) OpenAstexViewer Home Page. openastexviewer.net.
(11) Bruno, I. J.; Cole, J. C.; Kessler, M.; Luo, J.; Motherwell, W. D. S.; Purkis, L. H.; Smith, B. R.; Taylor, R.; Cooper, R. I.; Harris, S. E.; Orpen, A. G. J. Chem. Inf. Comput. Sci. 2004, 44, 2133–2144.
(12) Bruno, I. J.; Cole, J. C.; Lommerse, J. P. M.; Rowland, R. S.; Taylor, R.; Verdonk, M. L. Isostar: A library of information about nonbonded interactions. J. Comput.-Aided Mol. Des. 1997, 11, 525–537.
(13) CCDC Documentation Page. http://www.ccdc.cam.ac.uk/support/documentation/ (accessed Apr 2011).
(14) CCDC Web site. http://www.ccdc.cam.ac.uk (accessed Apr 2011).
(15) Thalladi, V. R.; Boese, R.; Weiss, H.-C. Angew. Chem., Int. Ed. 2000, 39, 918–922.
(16) Penfold, B. R.; White, J. C. B. Acta Crystallogr. 1959, 12, 130–135.
(17) Allen, F. H.; Harris, S. E.; Taylor, R. J. Comput.-Aided Mol. Des. 1996, 10, 247–254.
(18) Battle, G. M.; Ferrence, G. M.; Allen, F. H. J. Appl. Crystallogr. 2010, 43, 1208–1223.
(19) Hall, S. R.; Allen, F. H.; Brown, I. D. Acta Crystallogr. 1991, A47, 655–685. See also the IUCr Specifications Web site. http://www.iucr.org/resources/cif/spec (accessed Apr 2011).
(20) Enquiries about the fee should be directed to teaching@ccdc.cam.ac.uk.
