Anal. Chem. 2001, 73, 2941-2951
Mass Spectrometry Screening of Combinatorial Mixtures, Correlation of Measured and Predicted Electrospray Ionization Spectra
Nathan Yates,* Daniel Wislocki, Andrew Roberts, Scott Berk, Tracey Klatt, Dong-Ming Shen, Chris Willoughby, Keith Rosauer, Kevin Chapman, and Patrick Griffin
Department of Basic Chemistry, Merck Research Laboratories, Merck & Co. Inc., Rahway, New Jersey 07065
Methodology was developed to afford rapid characterization of multicomponent mixtures of small organic molecules prepared by split-and-mix combinatorial synthesis. This methodology involved the use of liquid chromatography mass spectrometry (LC/MS) combined with correlation analysis of measured versus predicted electrospray ionization mass spectra. Low-resolution mass spectra of complex mixtures revealed predictable patterns that confirmed library products, assisted in identifying chemical synthesis errors, and allowed overall library integrity to be assessed. In general, equal signal intensities were observed for most combinatorial mixture components, indicating that differences in electrospray ionization efficiency were not a major limitation to this approach. High-throughput data processing programs and informatics tools were used to speed data analysis and to simplify the presentation of the library characterization results. This approach has been used to characterize combinatorial libraries that were synthesized for a variety of drug-discovery programs. Examples are shown for library formats of 1, 40, 66, 280, and 400 component(s)/well. The applicability of this approach to large combinatorial mixtures should allow direct characterization of massive combinatorial libraries.
The concept of identifying efficacious new chemical entities by rapid screening of large collections of novel molecules was the driving force behind the development of combinatorial chemistry in the pharmaceutical industry.1,2 Synthesis and assay strategies have been developed that allow mixtures of compounds to be produced and screened for activity in a simultaneous fashion (or in parallel).3-5 The creation and biological testing of combinatorial mixtures or libraries has matured rapidly in recent years and offers clear throughput advantages over single-compound approaches.6,7 To take full advantage of mixture synthesis and screening, detailed analytical characterization of combinatorial libraries has been found to be useful.8-10 As demonstrated in this paper, electrospray ionization (ESI) mass spectrometry (MS), when coupled with appropriate spectrum modeling and correlation techniques, can be used to rapidly characterize combinatorial mixtures with sufficient throughput to screen very large and complex libraries.
The analytical characterization of combinatorial mixtures aims to measure the integrity of the chemical library. The advantages of library characterization are as follows: First, verifying the integrity of a library and identifying synthetic errors are important to combinatorial chemists, both to demonstrate that the chemistry was successful and to identify problematic reactions that need to be improved for future libraries. Second, library integrity may become important during the interpretation of biological assay results. For assays that return similar biological activities for more than one mixture, a decision must be made as to which mixture is the best candidate for resynthesis as single compounds. Third, library characterization can aid in the development of novel assay strategies, such as affinity selection. We set out to develop a strategy that would rapidly identify good mixtures and bad mixtures in an effort to reduce the total number of potential lead candidates to a more manageable number.
Split-and-mix solid-phase synthesis, pioneered by Furka in 1991,11 generates combinatorial libraries as collections of related mixtures. An important feature of the split-and-mix approach is that multiple compounds are synthesized as mixtures in shared reactions. In this fashion, combinatorial libraries containing hundreds of thousands of compounds can be prepared in relatively few reaction steps. The number of compounds produced is geometrically related to the number of chemical subunits that are used in each reaction step.
* Phone: (732) 594-1749. Fax: (732) 594-1370. E-mail: nathan_yates@merck.com.
(1) Gallop, M.; Barrett, R.; Dower, W.; Fodor, S.; Gordon, E. J. Med. Chem. 1994, 37, 1233.
(2) Gallop, M.; Barrett, R.; Dower, W.; Fodor, S.; Gordon, E. J. Med. Chem. 1994, 37, 1385.
(3) Balkenhohl, F.; von dem Bussche-Hunnefeld, C.; Lansky, A.; Zechel, C. Angew. Chem., Int. Ed. Engl. 1996, 35, 2288.
(4) Houghton, R., Ed. Pept. Sci. 1995, 37 (3).
(5) Thompson, L.; Ellman, J. Chem. Rev. 1996, 96, 555.
10.1021/ac010021r CCC: $20.00 Published on Web 05/30/2001
© 2001 American Chemical Society
In practice, combinatorial libraries often contain more compounds than are predicted by the unique combinations of subunits as a result of the possible isomers that are involved. A three-step synthesis that incorporates five chemical subunits in each reaction step yields 125 compounds and requires fifteen reaction steps (e.g., number of compounds = 5 × 5 × 5; number of reaction steps = 5 + 5 + 5). By comparison, the preparation of the same 125 compounds using a three-step parallel single-compound synthesis would require 375 separate reactions. Upon completion of the synthesis, compounds are cleaved from the solid support as five discrete mixtures, each containing twenty-five compounds. It is important to note that the cleaved mixtures are inherently similar and differ only in the last chemical subunit that was added. Therefore, the electrospray ionization efficiencies are expected to be relatively constant.
Mass spectrometry is well-suited to the analysis of combinatorial mixtures because of its ability to rapidly separate multicomponent samples by molecular weight.12-14 Although these data are often qualitative, mass spectrometry provides significantly more information about a synthesis than can be inferred from traditional bulk measurements (e.g., calculated yields based on an average molecular weight). Electrospray ionization mass spectrometry has been shown to be a useful technique for the analysis of combinatorial mixtures,15-19 and many MS techniques have been used to identify individual library components.20-27 In practice, even high-resolution mass spectrometry cannot uniquely identify all of the components in a library because of the close molecular weights and degenerate molecular formulas.28,29 In addition, the complex nature of combinatorial spectra impedes manual interpretation and creates the need for automated data analysis tools that can simplify and speed data interpretation.30-35
Analytical Chemistry, Vol. 73, No. 13, July 1, 2001 2941
(6) Rohrer, S.; Birzin, E.; Mosley, R.; Berk, S.; Hutchins, S.; Shen, D.; Xiong, Y.; Hayes, E.; Parmar, R.; Foor, F.; Mitra, S.; Degrado, S.; Shu, M.; Klopp, J.; Cai, S.; Blake, A.; Chan, W.; Pasternak, A.; Yang, L.; Patchett, A.; Smith, R.; Chapman, K.; Schaeffer, J. Science 1998, 282, 737-740.
(7) Berk, S.; Rohrer, S.; Degrado, S.; Birzin, E.; Mosley, R.; Hutchins, S.; Pasternak, A.; Schaeffer, J.; Underwood, D.; Chapman, K. J. Comb. Chem. 1999, 1 (5), 388-396.
(8) Carell, T.; Wintner, E.; Sutherland, A.; Rebek, J.; Dunayevskiy, Y.; Vouros, P. Chem. Biol. 1995, 2, 171-183.
(9) Dunayevskiy, Y.; Vouros, P.; Wintner, E.; Shipps, G.; Carell, T. Proc. Natl. Acad. Sci., U.S.A. 1996, 93, 6152-6157.
(10) Dunayevskiy, Y.; Vouros, P.; Carell, T.; Wintner, E.; Rebek, J. Anal. Chem. 1995, 67, 2906-2915.
(11) Furka, A.; Sebestyen, F.; Asgedom, M.; Dibo, G. Int. J. Pept. Protein Res. 1991, 37, 487-493.
(12) Loo, J. Eur. Mass Spectrom. 1997, 3, 93-104.
(13) Süßmuth, R.; Jung, G. J. Chromatogr. B 1999, 725, 49-65.
(14) Enjalbal, C.; Martinez, J.; Aubagnac, J. Mass Spectrom. Rev. 2000, 19, 139-161.
(15) Metzger, J.; Kempter, C.; Wiesmüller, K.; Jung, G. Anal. Biochem. 1994, 33, 261-277.
(16) Nawrocki, J.; Wigger, M.; Watson, C.; Hayes, T.; Senko, M.; Benner, S.; Eyler, J. Rapid Commun. Mass Spectrom. 1996, 10, 1860-1864.
(17) Esser, C.; Kevin, N.; Yates, N.; Chapman, K. Bioorg. Med. Chem. Lett. 1997, 7 (20), 2639-2644.
(18) Srebalus, C.; Li, J.; Marshall, W.; Clemmer, D. J. Am. Soc. Mass Spectrom. 2000, 11, 352-355.
(19) Winger, B.; Campana, J. J. Rapid Commun. Mass Spectrom. 1996, 10, 1811-1864.
(20) Zambias, R.; Boulton, D.; Griffin, P. Tetrahedron Lett. 1994, 35, 4283-4286.
(21) Gorlach, E.; Richmond, R.; Lewis, I. Anal. Chem. 1998, 70, 3227-3234.
(22) Youngquist, R.; Fuentes, G.; Lacey, M.; Keough, T. J. Am. Chem. Soc. 1995, 117, 3900-3906.
(23) Geysen, H.; Wagner, C.; Bodnar, W.; Markworth, C.; Parke, G.; Schoenen, F.; Wagner, D.; Kinder, D. Chem. Biol. 1996, 3, 679-688.
(24) Blom, K.; Combs, A.; Rockwell, K.; Zhang, J.; Chen, T. Rapid Commun. Mass Spectrom. 1998, 12, 1192-1198.
(25) Brummel, C.; Vickermann, J.; Carr, S.; Hemling, M.; Roberts, G.; Johnson, W.; Weinstock, J.; Gaitanopoulos, D.; Benkovic, S.; Winograd, N. Anal. Chem. 1996, 2, 237-242.
(26) Aubagnac, J.; Amblard, M.; Enjalbal, C.; Martinez, J.; Durand, P.; Renault, P. Comb. Chem. High Throughput Screening 1999, 2, 289-296.
(27) Dulé, B.; Verne-Mismer, J.; Wolf, E.; Kugel, C.; Hijfte, L. J. Chromatogr. B 1999, 725, 39-47.
(28) Demirev, P.; Zubarev, R. Anal. Chem. 1997, 69, 2893-2900.
(29) Zubarev, R.; Hakansson, P.; Sundqvist, P. Anal. Chem. 1996, 68, 4060-4063.
(30) Kienle, S.; Wiesmüller, K.; Brünjes, J.; Metzger, J.; Jung, G. Fresenius J. Anal. Chem. 1997, 359, 10-14.
(31) Tong, H.; Bell, D.; Tabei, K.; Siegel, M. J. Am. Soc. Mass Spectrom. 1999, 10, 1174-1187.
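The split-and-mix arithmetic given in the introduction (125 compounds from 15 pooled reactions, versus 375 reactions for parallel synthesis) can be checked with a few lines of Python. This is only an illustrative sketch: the function name `split_and_mix_counts` and its dictionary keys are hypothetical, and it assumes, as in the text, one pooled reaction per subunit per step and one cleaved mixture per final-step subunit. Isomers, which can raise the true compound count, are ignored.

```python
from math import prod


def split_and_mix_counts(subunits_per_step):
    """Count compounds and reactions for a split-and-mix synthesis.

    subunits_per_step lists the number of subunits used in each reaction
    step, in order of addition (the last entry is the final subunit added).
    """
    steps = list(subunits_per_step)
    compounds = prod(steps)  # every combination of subunits
    return {
        "compounds": compounds,
        "pooled_reactions": sum(steps),  # one shared reaction per subunit
        "parallel_reactions": compounds * len(steps),  # one per compound per step
        "mixtures": steps[-1],  # mixtures differ only in the last subunit
        "compounds_per_mixture": compounds // steps[-1],
    }


# The three-step, five-subunit example from the text.
example = split_and_mix_counts([5, 5, 5])

# The x(20)-y(20)-z(79) format reported for library C-000,035.
library_035 = split_and_mix_counts([20, 20, 79])
```

For the five-subunit example this reproduces the 125 compounds, 15 pooled reactions, 375 parallel reactions, and five mixtures of 25 compounds quoted in the text. For the x(20)-y(20)-z(79) format it gives the 31 600 compounds in 79 mixtures of 400 reported for library C-000,035 in the Experimental Section.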
Here we describe a computational method for the evaluation of low-resolution mass spectra that are obtained from combinatorial mixtures. Correlation of measured and predicted electrospray ionization mass spectra has proven to be a rapid means to confirm library products, identify synthesis errors, and estimate overall library integrity. Interestingly, the compounds present in many combinatorial mixtures showed similar ionization efficiencies and uniform signal response, indicating that the preferential ionization/suppression sometimes associated with the electrospray ionization process was not a major limitation to this approach. High-throughput data processing programs and chemical informatics tools were used to speed data analysis and to simplify the presentation of the library characterization results. This approach was used to characterize combinatorial libraries that were synthesized for drug-discovery programs. Example results are shown for library formats of 1, 40, 66, 280, and 400 component(s)/well. The applicability of the technique to large combinatorial mixtures should allow the direct characterization of massive combinatorial libraries.
(32) Hegy, G.; Görlach, E.; Richmond, R.; Bitsch, F. Rapid Commun. Mass Spectrom. 1996, 10, 1894-1900.
(33) Steinbeck, C.; Berlin, K.; Richert, C. J. Chem. Inf. Comput. Sci. 1997, 37, 449-457.
(34) Richmond, R.; Görlach, E. Anal. Chim. Acta 1999, 390, 175-183.
(35) Richmond, R.; Görlach, E. Anal. Chim. Acta 1999, 394, 33-42.
EXPERIMENTAL SECTION
Combinatorial Chemistry. A variety of split-and-mix synthesis strategies have been employed to create several hundred combinatorial libraries for pharmaceutical lead identification and optimization. Table 1 contains the library identifiers, dimensions, subunits, and an example structure for several of the combinatorial libraries used in this study. The synthesis of library C-000,236 was carried out on NovaBiochem TGR resin (NovaBiochem; La Jolla, CA). Forty x subunits, 10 y subunits, and 79 z subunits were used to yield 79 mixtures of 400 compounds each of the general structure 1 (see Table 1). C-000,320 was designed to contain 5148 trisubstituted pyrazoles, each of the general structure 2. The synthesis was carried out in four dimensions, w(3)-x(66)-y(13)-z(2), where subunits w, y, and z were formatted in spatially addressed arrays to yield 78 mixtures of 66 compounds. The synthesis of combinatorial library C-000,035 was carried out on Rapp TentaGel HPMB resin (Rapp Polymere, Eugenstr, Tübingen, Germany). Subunits x(20)-y(20)-z(79) were combined in three dimensions to produce 31 600 compounds as a set of 79 mixtures, each containing 400 compounds of the general structure 3. A focused combinatorial library, C-000,272, containing 75 single compounds was prepared by combining 15 x subunits and 5 y subunits to yield compounds of the general structure 4. Combinatorial library C-000,228 was synthesized as a one-dimensional deconvolutable mixture with subunits x(14)-y(20)-z(65), with general structure 5. The final products for each library were cleaved from the solid support resin, lyophilized, weighed, and dissolved in DMSO. Stock solutions with a predicted concentration of 10 mM were stored frozen at -80 °C in 2-mL deep-well microtiter plates. The predicted concentration of the stock solutions is based on the amount of solid support used and the expected loading capacity. Clearly, if the chemistry is not clean or quantitative, the predicted yields and stock concentrations will be incorrect. These “master plates” were used later for assay plate replication and dilution. Microtiter plates sent for MS analyses were normally prepared by transferring 2 µL of stock solution into 96-well assay plates (Polyfiltronics; Rockland, MA) and diluting with 100 µL of a water:acetonitrile:trifluoroacetic acid (50:50:0.1, v/v/v) solution. A Packard plate sealer and temperature-activated cover film (Packard Instruments; Meriden, CT) were used to seal the 96-well plates to prevent evaporation during analysis.
Table 1. Identifiers, Description, Formats, and Example Structures for Several Small Organic Libraries Prepared by Combinatorial Synthesis
Mass Spectrometry. Flow injection mass spectrometry (FI-MS) was performed on a PlatformLC quadrupole mass spectrometer (Micromass; Manchester, U.K.). An HP1100 HPLC (Agilent Technologies; Paramus, NJ) equipped with a Gilson 215 96-well plate autosampler (Gilson; Middleton, WI) and a 2 mm × 30 mm column packed with 7 µm C18 particles (Bodman; Aston, PA) was used to transport the samples into the mass spectrometer. The column was used to separate compounds from salts and DMSO. The mobile phase consisted of water:acetonitrile (30:70, v/v) with 0.05% trifluoroacetic acid. A flow rate of 300 µL/min was used, and 10-µL sample injections were made at 90-s intervals. The LC effluent entered the mass spectrometer through a metal capillary that was maintained at +3.5 kV. The cone voltage was +30 V, and the ion intensity was optimized by tuning on the protonated tetrapeptide Met-Arg-Phe-Ala at m/z 524. The source temperature was 180 °C. Nitrogen was used as the drying and nebulizing gas at flow rates of ∼450 and 20 L/h, respectively. Positive ion electrospray mass spectra were recorded in centroid mode over a mass range of 300 to 1000 Da in a scan time of 1.5 s.
Library Design and Enumeration. A structure-based computer interface was used by the combinatorial chemists to select, import, draw, and edit the molecular subunits that were used during the synthesis of each combinatorial library. The chemical reactions and combinatorial approach employed were also defined. A proprietary chemical compiler was used to combine the subunits in silico, render the three-dimensional chemical structures of the final compounds, and record the molecular formulas. Enumeration of the subunits, final compounds, and the library mixtures was performed prior to registration of the combinatorial library in a corporate database.
Sample Analysis and Data Processing. A text file containing the library identifier, library dimensions, subunit identifiers, mixture identifiers, mixture components, compound identifiers, and molecular formulas was downloaded from the corporate database. Mixture identifiers were imported into the MassLynx data system (Micromass; Manchester, U.K.) sample list editor and used as the data filename for each analyzed sample. Following sample analysis, background-subtracted mass spectra were automatically extracted from the raw MS data file and converted to a mass/intensity pair text file. The text files were automatically uploaded to a WebServer and served as the measured ESI mass spectra that were used for display and correlation analysis. Predicted mass spectra for each combinatorial mixture were calculated from the molecular formulas contained in the library text file. The complete isotopic patterns for each protonated molecule were calculated using the natural isotopic abundance of the elements. These patterns include all possible isotope peaks (e.g., [M + 1], [M + 2], [M + 3], ..., [M + n]).
The isotope patterns of the individual compounds were summed to produce a composite spectrum for the complete combinatorial mixture. When the isotopic peaks for different compounds overlap, the cumulative relative abundance is displayed. Peaks that fell inside a 1 Da window were treated as a single peak and centroided accordingly. These data were stored in a mass/intensity pair text file and served as the predicted ESI mass spectra that were used for display and correlation analysis. Correlation analysis was performed using a Microsoft Excel Visual Basic application. The measured and predicted ion intensities for each combinatorial mixture were imported into Microsoft Excel and formatted to align the intensity for each m/z by row. The built-in CORREL function was used to return the correlation coefficient of the measured and predicted intensity data sets. A compound score was determined by calculating the correlation coefficient of the two intensity data sets at all m/z values where the predicted intensity was nonzero. A purity score was determined by calculating the correlation coefficient of the two intensity data sets at all m/z values where the measured intensity was nonzero. A localized similarity score was determined by calculating the correlation coefficient for a 5 Da-wide window centered around each m/z where the predicted intensity was nonzero. The compound, purity, and localized similarity index scores were exported from Excel and uploaded to the corporate database.
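The predicted-spectrum calculation described in the Experimental Section (isotope patterns of the protonated molecules, summed into a composite spectrum at 1 Da resolution) can be sketched as follows. This is a minimal illustration and not the authors' program. It assumes nominal integer masses, handles only C, H, N, and O, uses approximate natural isotopic abundances, and applies the paper's stated assumptions of singly protonated, equimolar, equally ionized compounds. The names `ISOTOPES`, `convolve`, `protonated_pattern`, and `predicted_spectrum`, and the two example formulas, are hypothetical.

```python
from collections import defaultdict

# Nominal mass of the lightest isotope and approximate natural abundances,
# indexed by extra nominal mass units (+0, +1, +2). Values are approximate.
ISOTOPES = {
    "C": (12, [0.9893, 0.0107]),
    "H": (1, [0.999885, 0.000115]),
    "N": (14, [0.99636, 0.00364]),
    "O": (16, [0.99757, 0.00038, 0.00205]),
}


def convolve(a, b):
    """Discrete convolution of two abundance lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def protonated_pattern(formula, max_peaks=6):
    """Isotope pattern of [M + H]+ as {nominal m/z: relative abundance}.

    formula is a dict of element counts for the neutral molecule.
    """
    counts = dict(formula)
    counts["H"] = counts.get("H", 0) + 1  # add the ionizing proton
    nominal_mz = 0
    pattern = [1.0]
    for element, n in counts.items():
        mass, abundances = ISOTOPES[element]
        nominal_mz += mass * n
        for _ in range(n):
            # Truncation keeps the first peaks exact; higher terms are dropped.
            pattern = convolve(pattern, abundances)[:max_peaks]
    return {nominal_mz + k: p for k, p in enumerate(pattern)}


def predicted_spectrum(formulas):
    """Sum equimolar [M + H]+ patterns; normalize the base peak to 100."""
    composite = defaultdict(float)
    for formula in formulas:
        for mz, abundance in protonated_pattern(formula).items():
            composite[mz] += abundance  # overlapping peaks accumulate
    top = max(composite.values())
    return {mz: 100.0 * v / top for mz, v in sorted(composite.items())}


# Two hypothetical library members that differ by one CH2 unit.
mixture = [
    {"C": 30, "H": 40, "N": 4, "O": 5},
    {"C": 31, "H": 42, "N": 4, "O": 5},
]
spectrum = predicted_spectrum(mixture)
```

Because the composite is built by simple addition, compounds with degenerate molecular formulas contribute to the same peaks, which is why isobaric products appear as more intense signals in the predicted spectra. At m/z values of several hundred, the fractional mass defect is ignored here; a real implementation would bin exact masses into the 1 Da windows described in the text.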
Figure 1. Comparison of the low-resolution positive electrospray ionization spectra of three 40-component mixtures from combinatorial library C-000,236. The compound/purity scores for spectra a, b, and c are 0.92/0.92, 0.91/0.90, and 0.92/0.30, respectively. Also included are the subunit identifiers for each mixture.
The time required to analyze and process a combinatorial library is determined by the number of samples to be analyzed. By design, split-and-mix libraries contain modest numbers of mixtures and can usually be stored in a single microtiter plate. The complete FI-MS system is capable of analyzing a full 96-well microtiter plate in approximately 2.5 h. Data analysis requires approximately 30 min per plate, and all of the results are immediately transferred to a WebServer upon completion.
RESULTS AND DISCUSSION
Flow Injection Mass Spectra. Comparison of the low-resolution electrospray ionization spectra recorded for groups of combinatorial mixtures was found to provide useful information about the integrity of a combinatorial library. Figure 1 compares the spectra of three deconvolution mixtures from combinatorial library C-000,236. The molecules present in these mixtures were synthesized to contain a common backbone constructed from unique y and z subunits, linked to one of forty possible x subunits. By convention, the z subunit is the last group added to the molecule, and the chemistry was carried out to yield equimolar quantities of each compound. The dominant peaks observed in these spectra indicate regions where degenerate molecular formulas or isobaric interference occur. Comparison of spectra 1a, 1b, and 1c reveals a common fingerprint pattern that is representative of the forty x subunits contained in each mixture. As expected, the fingerprint patterns in spectra 1a and 1b are offset by 36 Da as a result of the difference in molecular masses of the y5 and y6 subunits. The low-molecular-mass fingerprint pattern observed in spectrum 1c is shifted by +28 Da as compared to 1b because of the mass difference between the z1 and z48 subunits. The higher-molecular-weight fingerprint pattern in spectrum 1c was due to the undesired addition of a second z subunit. As a result, the yield is less than expected, as indicated by the lower signal-to-noise ratio observed in spectrum 1c.
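The compound, purity, and localized similarity scores defined in the Experimental Section can be illustrated with a short Python sketch, which also shows how a mixture such as spectrum 1c can pair a high compound score with a low purity score. The authors performed this step with Excel's CORREL function; the code below is a hypothetical re-implementation using a plain Pearson correlation, and the function names and toy spectra are invented for illustration. Spectra are assumed to be dictionaries keyed by integer m/z.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either series is constant
    return sxy / sqrt(sxx * syy)


def compound_score(measured, predicted):
    """Correlation at all m/z values where the predicted intensity is nonzero."""
    mzs = sorted(mz for mz, i in predicted.items() if i > 0)
    return pearson([measured.get(mz, 0.0) for mz in mzs],
                   [predicted[mz] for mz in mzs])


def purity_score(measured, predicted):
    """Correlation at all m/z values where the measured intensity is nonzero."""
    mzs = sorted(mz for mz, i in measured.items() if i > 0)
    return pearson([measured[mz] for mz in mzs],
                   [predicted.get(mz, 0.0) for mz in mzs])


def localized_similarity(measured, predicted, mz, half_width=2):
    """Correlation inside a 5 Da-wide window centered on one predicted m/z."""
    window = range(mz - half_width, mz + half_width + 1)
    return pearson([measured.get(m, 0.0) for m in window],
                   [predicted.get(m, 0.0) for m in window])


# Toy predicted fingerprint (arbitrary intensities, not real data).
predicted = {500: 100.0, 501: 30.0, 514: 80.0, 515: 25.0, 528: 60.0, 529: 18.0}

# A clean measurement: same pattern at half the absolute intensity.
clean = {mz: 0.5 * i for mz, i in predicted.items()}

# An impure measurement: the same pattern plus unpredicted byproduct peaks.
impure = dict(clean)
impure.update({556: 60.0, 557: 18.0, 570: 45.0})

clean_scores = (compound_score(clean, predicted), purity_score(clean, predicted))
impure_scores = (compound_score(impure, predicted), purity_score(impure, predicted))
```

In this toy case the clean spectrum scores close to 1 on both measures. The impure spectrum keeps a compound score close to 1, because the expected fingerprint is intact, while its purity score falls well below 1 because the extra peaks have no predicted counterpart. The correlation is insensitive to overall scaling, which is consistent with the observation that absolute signal level alone does not change the compound score.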
It is well-recognized that the electrospray process is subject to ion suppression events that result in the preferential ionization of one compound, often the most abundant, over another. This problem is particularly prevalent for flow injection analyses that provide limited chromatographic separation. Ion suppression clearly has the potential of altering the spectra that are observed for combinatorial mixtures, especially when the concentrations of impurities or byproducts are high. Interestingly, ion suppression does not appear to be as big a problem as one might expect. Most major impurities or byproducts of the chemical synthesis are removed by washing the compounds prior to cleavage from the solid support. The concentration and ionization characteristics of the individual library components are similar, and ion suppression was not found to alter the relative abundance of individual library components. It was found to be important to use a short HPLC column to afford some chromatographic separation of the compounds from unretained solvent and DMSO. Spectrum Prediction Model. Spectrum modeling was found to be a useful tool for simplifying the examination of electrospray spectra of combinatorial mixtures. Figure 2 shows the measured 2a and predicted 2b FI-MS spectra for a 40-component mixture from library C-000,236. When viewed together, it is clear that the majority of the ion signal in the measured spectrum is due to the expected combinatorial products. The ions observed in the measured spectrum that are not included in the predicted spectrum are believed to be due to synthetic byproducts, unreacted starting material, or chemical background. These interference signals were not unexpected. No sample purification was employed on the libraries that were analyzed. The predicted spectra are generated from the combinatorial products’ molecular formulas using the following assumptions. 
First, it was assumed that electrospray ionization would produce mass spectra that contained only singly charged protonated
Figure 2. Comparison of the measured and predicted positive electrospray ionization spectra of a 40-component mixture from combinatorial library C-000,236. The compound/purity score for this mixture is 0.94/0.94. The peak labels are absent so as not to detract from the comparison of the spectra.
Figure 3. Comparison of the measured and predicted mass spectra of a mixture from the combinatorial library C-000,236 that is composed of subunits x(1-40)-y(10)-z(16). The compound/purity score for this mixture is 0.49/0.61.
molecules. Second, it was assumed that each compound would be present in equimolar concentration. Third, it was assumed that all of the compounds would have identical ionization efficiencies and contribute equally to the electrospray mass spectrum. Clearly, there was the potential for significant limitations with these broad assumptions. Electrospray ionization commonly produces multiply charged species, fragment ions, and ion adducts. Different loading capacities for solid support resins, incomplete reaction, and unexpected side products can reduce the quantitative nature of combinatorial synthesis. In addition, the efficiency of electrospray ionization is known to depend on a compound’s structure, the surrounding chemical environment, and ion suppression. Nonetheless, these basic assumptions served as a starting point for generating the predicted spectra and appeared to be reasonable for many of the combinatorial mixtures that were examined. Close examination of the spectra shown in Figure 2 reveals the measured (2a) and predicted (2b) isotopic fingerprint of the combinatorial products. The observed ions and the measured relative intensity show good agreement with the predicted spectrum. The major difference between these two spectra is the low-level signals present in the baseline above m/z 580 that are credited to chemical byproducts that are produced during the synthesis. In addition, the intensity ratios of peaks at m/z 538-539 and 568-569 are different in the measured and predicted spectra.
Estimation of Mixture Integrity. A correlation analysis of the measured and predicted intensities was performed to obtain
Figure 4. Comparison of the measured and predicted mass spectra of a 280-component mixture from the combinatorial library C-000,228 that is composed of subunits x(1-14)-y(1-20)-z(1-65). The compound/purity score for this mixture is 0.73/0.64.
Figure 5. Computer screen capture of the LibView combinatorial library data analysis program showing the purity scores for library C-000,320.
compound and purity scores for a given mixture. The compound score is defined as the correlation coefficient of the measured and predicted intensities at all m/z values where the predicted intensities are greater than zero. The valid range of the correlation coefficient is -1 to 1, where 1 indicates that the two data sets increase and decrease with perfect agreement. A correlation coefficient equal to -1 indicates that the two data sets track perfectly, but in an opposite fashion. In practice, the correlation coefficients determined for mass spectra fall between 0 and 1, indicating no agreement and perfect agreement, respectively. Compound scores approaching 1 indicate that the combinatorial mixtures contain all of the expected products. Conversely, low compound scores indicate that many of the products are absent. A low compound score can also result when an intense isobaric
Figure 6. Computer screen capture of the LibView combinatorial library data analysis program showing a reduced region of the measured and predicted mass spectra for combinatorial mixture D-014 185 from combinatorial library C-000,320. The subunit identifiers, chemical formulas, monoisotopic molecular weights, and localized similarity index scores are displayed for the mass range between 606 and 625 Da.
chemical interference occurs, hiding the underlying fingerprint of the combinatorial products. This situation is rarely observed, because the products of the combinatorial synthesis, even if they are incorrect, are normally the most abundant material in the samples. Importantly, high compound scores are retained even when the relative intensities of the combinatorial products are weak as compared to the background. The purity score is defined as the correlation coefficient for the measured and predicted intensities at all m/z values where the measured intensity is greater than zero. Purity scores approaching 1 indicate that the majority of the measured ion signal is due to the combinatorial products and few interference signals are present. Low purity scores can result from a number of scenarios. A low compound score will result in a low purity score. Chemical interference signals reduce purity scores, as observed in Figure 1c, where the compound score is 0.92 and the purity score is 0.30. Low yields or weak ion signals also result in a low purity score. In practice, a purity score that approaches 1 is a strong indication that combinatorial synthesis was successful and that only the desired products are present in the mixture. Compound and purity scores are often sufficient to rapidly identify synthetic deficiencies or excesses in a combinatorial library. Figure 3 shows the measured and predicted mass spectra for a 40-component mixture from the combinatorial library C-000,236 comprised of subunits x(1-40)-y(10)-z(16). A relatively low compound score of 0.49 was determined for this and other mixtures that contained subunit z(16). Examination of the single peaks at m/z 506 and m/z 564 in the predicted and measured spectra reveals that the z(16) subunit contained approximately 60% of a contaminant that had a molecular weight that was 2 Da less than z(16). The contaminant resulted in the unexpected synthesis of two 40-component mixtures, for a total of 80 compounds. LC/MS analysis of the z(16) starting material confirmed the presence and molecular mass of the contaminant. Figure 4 compares the measured (4a) and predicted (4b) mass spectra for a 280-component mixture from the combinatorial library C-000,228. A compound score of 0.73 indicates that some of the expected products are present in the mixture, as can be observed by a visual analysis of the data. The purity score for this mixture is 0.64 and indicates that much of the measured signal does not correlate well with ions in the predicted spectrum. Samples characterized by a purity score lower than the compound score tended to fall in the category of “impure mixtures”.
Data Analysis Software. By using the correlation coefficient to define the compound and purity scores for a mixture, the time needed to analyze the FI-MS data for a complete combinatorial library was greatly reduced. Figure 5 shows a screen capture from the LibView computer program. LibView is a Web-based application that was written as a Java application to give scientists a direct link to the combinatorial library characterization data. The main screen contains two sections: a colored representation of the microtiter plate that was analyzed by FI-MS, and a results table that contains the library identifier, plate dimensions, well locations,
Figure 7. Comparison of the measured and predicted positive electrospray ionization spectra of a 400-component mixture from combinatorial library C-000,035. The compound score for this mixture is 0.81. The bottom spectrum displays a predicted mass spectrum for this mixture that does not include compounds containing the x(3) and y(8) subunits. The compound score for these two predicted spectra was calculated to be 0.86.
and mixture identifiers. The compound or purity score data can be displayed by toggling the “Show Compound Score” button; the “Sort Type” legend display tells which score is currently displayed. The color spectrum on the bottom of the screen relates the color of the sample wells to the value of the compound or purity score; blue indicates a high score approaching 1 and red indicates a low score approaching -1. Empty wells are left uncolored. In this example, the compound scores for the 78 mixtures of 66 compounds for combinatorial library C-000,320 ranged from 0.85 to 0.94. The LibView program displays these results with a dark blue color (not shown), indicating that all of the measured spectra contain the fingerprint patterns that correspond to the 10 296 predicted combinatorial products. The screen of the LibView application can be toggled between compound scores and purity scores, thus allowing for the rapid visual differentiation of pure and impure mixtures. Figure 5 shows the LibView screen with the purity scores displayed for combinatorial library C-000,320. The compound/purity scores for wells A:4 (0.87/0.55) and B:3 (0.93/0.49) indicate that although the combinatorial products appear to be present, a major portion of the ion signal observed for these samples does not correspond to the predicted spectra. It is obvious that the compound and purity scores alone do not provide the same level of information that can be obtained by close examination of the measured and predicted mass spectra. The LibView program displays the FI-MS spectra when a sample well is selected with the mouse. Figure 6 shows an example LibView spectrum display for well B:3 from combinatorial library
C-000,320 in the foreground, with the plate view displayed in the background. The displayed mass range can be reduced (shown here) or expanded using the mouse as a cursor. The solid red circles that are superimposed on the measured spectrum correspond to the localized similarity index (LSI) scores for each of the predicted ions. LSI scores that drop below unity are used to identify regions of the measured mass spectrum that correlate poorly with the predicted spectrum. The purpose of the LSI is to direct the user’s eye to regions of the mass spectrum that may contain synthetic deletions or additions. In this example, the LSI pinpoints the mass range between 608 and 618 Da as an area of potential interest. By selecting “View Text Range” and dragging the cursor over the mass range of interest, a table of the predicted compounds’ subunit identifiers, chemical formulas, monoisotopic MWs, and numeric LSI scores is generated. The LibView screen capture displayed in Figure 6 demonstrates how this functionality allows regions of the electrospray mass spectra to be examined in detail and related to the desired products in the combinatorial mixture. As is observed in this example, many of the combinations of chemical subunits produce degenerate chemical formulas that cannot be differentiated, even by high-resolution mass spectrometry. Thus, the increased relative intensities observed for isobaric compounds are the only measurements that suggest whether these degenerate compounds are present in the mixture.
Figure 8. Computer screen capture showing the output of the LibView library analysis program for a pure single compound that was synthesized as part of the single compound library C-000,272. The measured and predicted mass spectra are shown in the foreground, and the microtiter plate layout of the purity score is shown in the background. The displayed compound has a compound/purity score of 0.99/0.98 and is located in microtiter well A:8.
Complex Mixture Libraries. Significant emphasis has been placed on the design of massive combinatorial libraries composed of mixtures containing 10⁴ molecules or more. Although libraries of this complexity are certainly of academic interest, the combinatorial mixtures shown here were synthesized for the purpose of identifying active and selective compounds that could advance a drug discovery program. For this reason, the techniques outlined above have been applied to only a few combinatorial libraries that contain mixtures of more than a few hundred compounds. An example of the measured and predicted mass spectra for a 400-component mixture is shown in Figure 7a,b. Significant differences in the relative intensity of individual ions are clearly apparent for these spectra. However, strong similarities are observed in the general distribution of the ion signals and in the regions of the measured spectrum that lack significant ion signals. Close examination of a reduced mass range (not shown) reveals strikingly similar isotopic profiles for the measured and predicted mass spectra. Clearly, flow injection mass spectrometry can provide useful information for 400-component combinatorial mixtures. To estimate the extent of library deficiencies that could be observed in FI-MS data, predicted electrospray ionization spectra that contained known deletions were compared. Figure 7b,c compares the predicted mass spectra for combinatorial library C-000,035 that contain (b) all 400 compounds and (c) the 361 compounds that remain when those containing the x(3) and y(8) subunits are omitted. The subunits to omit were selected at random, and their omission corresponds to the loss of 39 compounds (10% deletion). Interestingly, visual inspection of these predicted spectra reveals noticeable differences in the relative intensities of the two spectra that are
not too dissimilar to those observed in Figure 7a. The compound scores calculated for the data in Figure 7a,c are 0.81 and 0.86, respectively. Several combinatorial mixtures from library C-000,035 were subjected to synthetic deconvolution, in which the 400-component mixtures were resynthesized as 20 mixtures of 20 compounds. Manual inspection of these simpler mixtures suggested that 85% of the chemistry was successful and 15% of the compounds were not formed in the original 400-component mixtures.
Single Compound Libraries. Although the correlation methods described here were developed to assist in the characterization of combinatorial mixtures that were prepared by split-and-mix methods, these techniques may also be applied to single-compound libraries, traditional medicinal chemistry products, or natural product isolates. Figure 8 shows the LibView output for a pure single compound that was synthesized as part of a targeted single compound library C-000,272. The measured and predicted mass spectra shown in the foreground confirm that the majority of the measured ion signal is observed at the predicted m/z. The low-abundance ions at m/z 386 and 365 are 9 and 30 Da less than the expected m/z of 395. These mass differences do not correspond to common neutral losses, and the ions are not believed to be fragment ions. The compound and purity scores were 0.99 and 0.98, respectively. The purity score for this compound and several of the remaining compounds is displayed in the background. Note that the compound selected in Figure 8 is represented by a dark blue circle located in microtiter well A:8.
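The deletion count quoted above for library C-000,035 (361 of 400 compounds remaining once x(3) and y(8) are omitted) can be reproduced by direct enumeration. The sketch below assumes a two-position library with 20 x subunits and 20 y subunits per mixture. That layout is an assumption that is consistent with the 400-component mixtures and their deconvolution into 20 mixtures of 20 compounds, but it is not stated explicitly in the text. The function names are illustrative only.

```python
from itertools import product

# Assumption: each 400-component mixture is a 20 x 20 combination of
# subunits labelled x(1)..x(20) and y(1)..y(20).
N_X, N_Y = 20, 20
DELETED_X, DELETED_Y = 3, 8  # subunits omitted from the predicted spectrum


def enumerate_library(n_x, n_y):
    """Return every (x, y) subunit pair in the mixture."""
    return list(product(range(1, n_x + 1), range(1, n_y + 1)))


def remove_subunits(compounds, bad_x, bad_y):
    """Drop every compound that contains either omitted subunit."""
    return [(x, y) for (x, y) in compounds if x != bad_x and y != bad_y]


full = enumerate_library(N_X, N_Y)
remaining = remove_subunits(full, DELETED_X, DELETED_Y)
lost = len(full) - len(remaining)

# 20 compounds contain x(3), 20 contain y(8), and one contains both,
# so 20 + 20 - 1 compounds are lost.
print(len(full), len(remaining), lost, f"{lost / len(full):.2%}")
```

Under this assumed layout the enumeration gives 39 lost compounds, which matches the figure quoted in the text and rounds to the stated 10% deletion.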
Figure 9. Computer screen capture showing the output of the LibView library analysis program for an impure single compound that was synthesized as part of the single compound library C-000,272. The measured and predicted mass spectra are shown in the foreground, and the microtiter plate layout of the purity score is shown in the background. The displayed compound has a compound/purity score of 0.99/0.11 and is located in microtiter well A:7.
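The contrast between the compound/purity scores reported for Figures 8 and 9 (0.99/0.98 versus 0.99/0.11) can be illustrated with a toy calculation. The sketch below is not the LibView scoring code, and it does not reproduce the score definitions given earlier in the paper. It assumes, purely for illustration, that a compound score behaves like a Pearson correlation evaluated only at the predicted m/z values, and that a purity score behaves like the fraction of the total measured ion signal found at those m/z values. The spectra are invented numbers, not data from the library.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a flat profile
    return cov / math.sqrt(var_a * var_b)


def compound_score(measured, predicted):
    """Assumed form: correlate intensities only at predicted m/z bins."""
    bins = sorted(mz for mz, inten in predicted.items() if inten > 0)
    return pearson([measured.get(mz, 0.0) for mz in bins],
                   [predicted[mz] for mz in bins])


def purity_score(measured, predicted):
    """Assumed form: share of measured signal that lies in predicted bins."""
    total = sum(measured.values())
    if total == 0:
        return 0.0
    inside = sum(i for mz, i in measured.items() if predicted.get(mz, 0) > 0)
    return inside / total


# Hypothetical isotopic envelope for an expected ion at m/z 382.
predicted = {382: 100.0, 383: 25.0, 384: 4.0}
# Hypothetical measurement: the envelope is present but small, and
# unrelated ions at m/z 366 and 387 dominate the spectrum.
measured = {366: 300.0, 382: 50.0, 383: 12.5, 384: 2.0, 387: 250.0}

cs = compound_score(measured, predicted)
ps = purity_score(measured, predicted)
print(f"compound score {cs:.2f}, purity score {ps:.2f}")
```

With these invented inputs the envelope shape matches the prediction, so the assumed compound score is high, while most of the signal lies elsewhere, so the assumed purity score is low. This is the same qualitative pattern as the impure sample in well A:7.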
An example of an impure single compound is shown in Figure 9. Here the compound score is 0.99 as a result of the similar isotopic pattern at 382 Da, indicating that a signal that matches the expected molecular weight is detected. The larger peaks at m/z 366 and 387 do not correspond to common fragment or adduct ions and are not believed to be related to the compound of interest at m/z 382. The purity score returned for this sample was 0.11, indicating that the majority of the ion signal falls outside the expected isotopic envelope.
CONCLUSIONS
The correlation of measured and predicted electrospray ionization mass spectra for complex mixtures of compounds produced by combinatorial synthesis has been presented. The following findings have been made: (1) Simple models can be used to predict electrospray ionization spectra for combinatorial libraries. (2) Correlation of the measured and predicted mass spectra can be used to estimate the content and purity of combinatorial mixtures. (3) Automated sample analysis and data processing tools have been developed to reduce to practice the characterization of combinatorial mixtures. (4) The techniques described were shown to be useful for 400-compound combinatorial mixtures as well as single compounds. In practice, FI-MS analysis is often the only method available to rapidly characterize mixture libraries containing large numbers of compounds. Clearly, screening methods such as the FI-MS techniques described here may not be able to uniquely confirm the presence
or absence of individual compounds in complex combinatorial libraries. This fact, however, does not diminish our ability to use combinatorial chemistry for the identification of compounds of interest to the pharmaceutical industry. Biological assays are the primary screens that are used to determine whether a compound or combinatorial mixture is interesting. Only uniquely active samples warrant further study. For this reason, characterization data is treated as a secondary screen that is used to improve the chemistry and differentiate two or more mixtures that have similar activities. With these criteria in place, it is important to know what the limitations of the FI-MS approach are. Unlike the combinatorial libraries described above, one can easily imagine groups of neutral nonpolar compounds that would not be suitable for analysis by electrospray ionization. This would lead to mixtures that contain the desired compounds but return low compound scores because of poor or variable ionization efficiency. To date, fewer than 5% of the combinatorial libraries that we have studied were comprised of compounds that ionized poorly as a result of a lack of basic functional groups. In addition, the analysis of libraries that contain compounds of increasingly large molecular weight would be complicated by the formation of multiply charged species that are not accounted for in the current spectrum prediction model. Both of these circumstances can be identified prior to library synthesis by analyzing the products of prototype reactions by LC/MS. In our laboratories, most of the libraries target small molecules that form predominantly singly charged ions by electrospray. Libraries containing structurally dissimilar compounds could also prove to be problematic for this approach. In practice, many combinatorial mixtures contain compounds with common structural components that are positioned in different orientations to explore slight variations of chemical space. Libraries of this type are particularly well-suited to FI-MS analysis. The primary benefit of this approach has been the ability to provide a quick qualitative view of components present in a combinatorial library. The development of the LibView program has significantly reduced the amount of time required to process and review FI-MS data. It has also provided a practical method of testing different scoring algorithms that further advance our ability to interpret complex combinatorial data sets. Analytical methods
that permit the simultaneous measurement and analysis of large collections of compounds are becoming important tools for affinity selection experiments that are aimed at lead discovery. Additionally, the use of tandem mass spectrometry may provide an avenue for further characterizing individual components of interest.
ACKNOWLEDGMENT
The authors thank Dr. Richard King (Merck Research Laboratories), Dr. James Stephenson (Oak Ridge National Laboratory), Dr. John R. Yates (The Scripps Research Institute), and Dr. John T. Yates (University of Pittsburgh) for helpful suggestions and useful discussions regarding MS correlation methods and preparation of this manuscript.
AC010021R