Probing Combinatorial Library Diversity by Mass Spectrometry

Anal. Chem. 1997, 69, 2893-2900

Plamen A. Demirev* and Roman A. Zubarev

Division of Ion Physics, The Ångström Laboratory, Uppsala University, Box 534, Uppsala S-751 21, Sweden

The feasibility of a “massively parallel” mass spectrometric method for probing combinatorial library diversity is addressed theoretically for the example of computer-generated mass distributions of combinatorially synthesized libraries of peptides two to seven amino acids in length. We study the behavior of several “global” (integral) parameters of such mass distributions: mass centroid, dispersion, skewness, and kurtosis. The centroid and dispersion are shown to carry information that may characterize the completeness of the synthetic effort. “Local” mass distribution parameters, e.g., “mass density” (number of peptides per mass interval), are also examined. The practical implementation and possible limitations of such an approach are discussed as well.

Synthetic strategies to generate vast arrays of related chemical compounds (“combinatorial libraries”) are based on the repetitive and parallel covalent attachment of a set of individual “chemical building blocks” (for recent reviews, see, e.g., refs 1-9). Combinatorial synthesis techniques in the laboratory have been designed to emulate biological processes whereby a specific immunological response evolves as a result of apparently random genetic recombination and selection mechanisms. The theoretical rationale behind combinatorial synthesis methodologies has been summarized recently.10 The availability of combinatorial libraries containing potentially millions of new compounds promises to open new avenues and greatly speed up the drug discovery process.3 Combinatorial synthesis techniques are gaining enormous popularity in pharmaceutical laboratories worldwide.3,9 All such techniques are essentially parallel in nature and therefore require new analytical approaches. In particular, novel methods are needed for the parallel screening of high-affinity ligands in vast arrays of combinatorially synthesized products to facilitate drug discovery in the pharmaceutical industry.
There are several analytical challenges posed by this need for rapid handling of such enormous numbers of compounds (either bound to a solid support or in solution, and oftentimes in subpicomolar amounts) and for finding among them the “lead compound” (the “needle in a haystack” problem).11 Complementary to the task of screening for substrate-ligand binding specificity among a large number of structurally similar compounds is the determination of the specific chemical structure (e.g., peptide sequencing) of the lead compounds. Yet another analytical challenge is confirming the diversity of the generated combinatorial library, i.e., the success rate in synthesizing all possible combinations of the elementary building blocks. The ability to characterize the library complexity (its “high fidelity”, i.e., the completeness of the pool of compounds) is a crucial element of the overall combinatorial synthetic strategy. Traditional single-compound synthetic strategies for drug discovery rely heavily on nuclear magnetic resonance and optical spectroscopy for structure elucidation, techniques that are unsuitable for studies of multicomponent mixtures. Many of the emerging analytical problems in combinatorial synthesis are being successfully addressed with diverse mass spectrometry (MS) techniques.11-20 Brummel et al.12 have evaluated several MS methods (electrospray (ES), matrix-assisted laser desorption/ionization (MALDI), and secondary ion mass spectrometry (SIMS), coupled to triple-quadrupole or time-of-flight (TOF) analyzers, respectively) for the direct analysis of non-peptide bead-bound combinatorial libraries. Brummel et al.13 have also demonstrated the utility of molecular SIMS in detecting femtomole quantities of tripeptides covalently attached to polystyrene support beads. In their protocol, trifluoroacetic acid vapor cleaves the covalent bond anchoring the peptide to the bead, which increases the SIMS sensitivity manyfold.

(1) Special issue on combinatorial chemistry; Czarnik, A., Ed. Acc. Chem. Res. 1996, 29 (March).
(2) Special issue; Houghton, R., Ed. Pept. Sci. 1995, 37 (No. 3).
(3) Intelligent Drug Design; Nature 1996, 384 (No. 6604 supplement).
(4) Gallop, M.; Barrett, R.; Dower, W.; Fodor, S.; Gordon, E. J. Med. Chem. 1994, 37, 1233.
(5) Gordon, E.; Barrett, R.; Dower, W.; Fodor, S.; Gallop, M. J. Med. Chem. 1994, 37, 1385.
(6) Rinnova, M.; Lebl, M. Collect. Czech. Chem. Commun. 1996, 61, 171.
(7) Thompson, L.; Ellman, J. Chem. Rev. 1996, 96, 555.
(8) Lowe, G. Chem. Soc. Rev. 1995, 309.
(9) Balkenhohl, F.; von dem Bussche-Hunnefeld, C.; Lansky, A.; Zechel, C. Angew. Chem., Int. Ed. Engl. 1996, 35, 2288.
(10) Kauffman, S. Ber. Bunsenges. Phys. Chem. 1994, 98, 1142.

S0003-2700(97)00049-8 CCC: $14.00
© 1997 American Chemical Society
(11) Burlingame, A.; Boyd, R.; Gaskell, S. Anal. Chem. 1996, 68, 599R (and references therein).
(12) Brummel, C.; Vickerman, J.; Carr, S.; Hemling, M.; Roberts, G.; Johnson, W.; Weinstock, J.; Gaitanopoulos, D.; Benkovic, S.; Winograd, N. Anal. Chem. 1996, 68, 237.
(13) Brummel, C.; Lee, I.; Zhou, Y.; Benkovic, S.; Winograd, N. Science 1994, 264, 399.
(14) Cheng, X.; Chen, R.; Bruce, J.; Schwartz, B.; Anderson, G.; Hofstadler, S.; Gale, D.; Smith, R.; Gao, J.; Sigal, G.; Mammen, M.; Whitesides, G. J. Am. Chem. Soc. 1995, 117, 8859.
(15) Bruce, J.; Anderson, G.; Chen, R.; Cheng, X.; Gale, D.; Hofstadler, S.; Schwartz, B.; Smith, R. Rapid Commun. Mass Spectrom. 1995, 9, 644.
(16) Youngquist, R.; Fuentes, G.; Lacey, M.; Keough, T. J. Am. Chem. Soc. 1995, 117, 3900.
(17) Metzger, J.; Kempter, C.; Weismuller, K.; Jung, G. Anal. Biochem. 1994, 219, 261.
(18) Dunayevski, Y.; Vouros, P.; Wintner, E.; Shipps, G.; Carell, T.; Rebek, J. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 6152.
(19) Dunayevski, Y.; Vouros, P.; Carell, T.; Wintner, E.; Rebek, J. Anal. Chem. 1995, 67, 2906.
(20) Nawrocki, J.; Wigger, M.; Watson, C.; Hayes, T.; Senko, M.; Brenner, S.; Eyler, J. Rapid Commun. Mass Spectrom. 1996, 10, 1860.

Analytical Chemistry, Vol. 69, No. 15, August 1, 1997 2893

The applicability of a Fourier transform ion cyclotron resonance (FTICR) MS-based approach to establishing the binding specificity and determining the structure of combinatorially synthesized inhibitors of carbonic anhydrase has been addressed.14 A general strategy for probing binding specificity in combinatorial synthesis has also been outlined.15 This strategy avoids the need for, e.g., solid support media and for distinct separation and purification steps. The utility of MS methods for appropriate “tagging”, i.e., addressing each step along the combinatorial synthesis pathway, has been investigated.16 Metzger et al.17 have determined the composition and purity of synthetic multicomponent peptide mixtures by ES and tandem mass spectrometry. They found that the intensity distributions of the molecular ions reflected the amounts of the various peptides in the mixture. Dunayevski et al.18,19 have devised methods to demonstrate the diversity of combinatorial synthesis reaction products either by direct enumeration or by monitoring sublibraries along the synthetic route. Very recently, Nawrocki et al.20 have reported the use of ES FTICR MS to directly assess the diversity and the degeneracy of peptide libraries containing between 10³ and 10⁴ components. In this paper, we address theoretically the feasibility of a direct mass spectrometric method for probing the diversity and complexity of combinatorial libraries. This “massively parallel” approach is illustrated for the example of computer-generated mass distributions of combinatorially synthesized libraries of peptides two to seven amino acids in length. Although non-peptide “small molecule” combinatorial libraries are more important than peptide libraries from the pharmaceutical point of view,9 we have chosen the latter for both “historical” and practical reasons.
First, the overall combinatorial synthesis methodology is a spin-off from the solid-phase peptide synthesis approaches developed in the 1980s.21,22 Second, the main points of the investigated approach are best and most easily illustrated with peptide libraries, and this particular choice does not restrict its generality. We study the behavior of several experimentally determinable “global” (integral) parameters of peptide library mass distributions: the mass centroid (average value); the dispersion (the root-mean-square scatter of the data around the centroid, related to the full width at half-maximum, fwhm, of the distribution); the skewness (related to the asymmetry of the distribution); and the kurtosis (related to the “flatness” of the distribution and indicative of the “concentration” around the centroid). It is shown that these parameters carry important information that may characterize the completeness of the synthetic effort. Library incompleteness, the presence of unwanted byproducts (e.g., peptides shorter than desired), and MS instrumental effects have all been simulated to assess the usefulness of such an approach. We further examine “local” parameters characterizing the mass distributions, e.g., the “mass density” (number of peptides per mass interval) on the combinatorial library “landscape”. The question of peptide mass multiplicity, i.e., the number of peptides with the same amino acid composition but different sequences, is also addressed. Schemes for practical implementation and possible problems and limitations of such an MS approach are discussed as well.

(21) Houghten, R. Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 5131.
(22) Geysen, H.; Meloen, R.; Barteling, S. Proc. Natl. Acad. Sci. U.S.A. 1984, 81, 3998.

METHODS

The computer programs used to generate the complete list of linear peptides of a specified length were modifications of previously developed software.23,24 Similar software has been described by Mann in the context of studies on the distribution of accurate peptide masses.25 The code was written in Think C v.4.0 (Symantec Inc., Cupertino, CA) and run on Macintosh series personal computers (Apple Computer Inc., Cupertino, CA). The residues of the 20 common amino acids (AAs) were chosen as elementary “building blocks” in the combinatorial synthesis; the terminal N- and C-groups were not included. We note that the results reported here are easily extended to libraries containing peptides with various terminal (including blocking) groups and/or a specific “scaffold” (e.g., a constant repeat unit).9,20 In both cases, the (monoisotopic) masses of the N- and C-terminal groups or of the scaffold should be added to the respective centroid values of the distribution (i.e., for unmodified terminal groups, the masses of H and OH should be added). All possible AA combinations for a linear peptide of a given length N (here, between two and seven amino acids) were generated sequentially by an exhaustive enumeration approach simulating “multiple-step” combinatorial synthesis.9 To this end, a “counter” was organized with as many registers as there are residues in the peptide (N = 2, ..., 7). Every register permuted the AA residues independently; in total, 20^N permutations were performed, corresponding to all N-peptides with different sequences. For every permutation, the monoisotopic peptide molecular mass was calculated, and the corresponding channel in the mass distribution was incremented. The file containing the peptide library mass data, used for plotting the distributions, was also updated after each peptide-generating step. The peptide mass distributions were plotted with 0.1 Da accuracy. The sums entering the various moments of the mass distributions were calculated after each peptide-generating step.
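The register-counter enumeration described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Think C program. The residue masses are standard monoisotopic values assumed here, since the paper does not list them, and terminal H and OH are omitted as in the simulations. `itertools.product` plays the role of the N-register counter, and the function name `library_histogram` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
# Assumed values for this sketch (not taken from the paper); terminal H/OH omitted.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def library_histogram(n, residues=RESIDUE_MASS, bin_width=0.1):
    """Count the peptides of length n falling into each mass channel.

    itertools.product acts as the n-register counter: every position of the
    generated tuple runs independently over all building blocks, so all
    len(residues)**n sequences are visited exactly once.
    """
    histogram = Counter()
    masses = list(residues.values())
    for sequence in product(masses, repeat=n):
        channel = round(sum(sequence) / bin_width)  # channel index in units of bin_width
        histogram[channel] += 1
    return histogram


if __name__ == "__main__":
    hist = library_histogram(3)
    print(sum(hist.values()))  # 8000 = 20**3 sequences
    print(f"occupied channels: {min(hist) * 0.1:.1f} to {max(hist) * 0.1:.1f} Da")
```

For heptapeptides this loop would visit 1.28 × 10⁹ sequences, which is impractical in pure Python. The sketch is therefore meant for short peptides only.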
(23) Zubarev, R.; Demirev, P.; Håkansson, P.; Sundqvist, B. U. R. Anal. Chem. 1995, 67, 3793.
(24) Zubarev, R.; Håkansson, P.; Sundqvist, B. U. R. Anal. Chem. 1996, 68, 4060.
(25) Mann, M. Proceedings of the 43rd ASMS Conference on Mass Spectrometry and Allied Topics, Atlanta, GA, May 21-26, 1995; p 639.

The centroid Mav of the distribution can be calculated a priori as Mav = N·M′av, where M′av is the average mass of all AAs used in the simulation (e.g., M′av = 118.806 Da for the 20 common AAs). The ith moment µi of the distribution was calculated as

$$\mu_i = \frac{1}{A}\sum_{p=1}^{A}\left(M_p - M_{av}\right)^i$$

where A is the total number of different peptides of length N, Mp is the mass of peptide p, and the sum runs over all A peptides. The number of different N-peptides is A = k^N, where k is the number of different AAs in the pool. The dispersion D, the skewness S, and the kurtosis K of each peptide mass distribution were calculated from the moments as

$$D = \mu_2^{1/2}, \qquad S = \mu_3\,\mu_2^{-3/2}, \qquad K = \mu_4\,\mu_2^{-2} - 3$$
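The moment definitions above translate directly into code. The following Python sketch is an illustration, not the authors' program. It assumes standard monoisotopic residue masses, which the paper does not list, and uses hypothetical helper names. For the single-residue set it gives Mav ≈ 118.806 Da, D ≈ 30.060 Da, and K ≈ -0.037, in line with the first row of Table 1.

With the moment definition exactly as written, S comes out at about +0.131. This is the tabulated magnitude but with a positive sign, because the cubed deviations of the heavy residues (Trp, Tyr, Arg) outweigh those of the light ones. The negative S values in the tables therefore appear to follow the opposite sign convention.

```python
import math
from itertools import product

# Assumed standard monoisotopic residue masses (Da); terminal groups omitted.
RESIDUE_MASS = [
    57.02146, 71.03711, 87.03203, 97.05276, 99.06841,       # G A S P V
    101.04768, 103.00919, 113.08406, 113.08406, 114.04293,  # T C L I N
    115.02694, 128.05858, 128.09496, 129.04259, 131.04049,  # D Q K E M
    137.05891, 147.06841, 156.10111, 163.06333, 186.07931,  # H F R Y W
]


def global_parameters(masses):
    """Return (Mav, D, S, K) with mu_i = (1/A) * sum((Mp - Mav)**i) over all A peptides."""
    a = len(masses)
    m_av = sum(masses) / a
    mu2, mu3, mu4 = [sum((m - m_av) ** i for m in masses) / a for i in (2, 3, 4)]
    d = math.sqrt(mu2)         # D = mu2**(1/2)
    s = mu3 * mu2 ** -1.5      # S = mu3 * mu2**(-3/2)
    k = mu4 * mu2 ** -2 - 3    # K = mu4 * mu2**(-2) - 3
    return m_av, d, s, k


def peptide_masses(n, residues=RESIDUE_MASS):
    """Masses of all len(residues)**n sequences of length n (equimolar library)."""
    return [sum(seq) for seq in product(residues, repeat=n)]


for n in (1, 2, 3):
    m_av, d, s, k = global_parameters(peptide_masses(n))
    print(f"N={n}: Mav={m_av:.3f}  D={d:.3f}  S={s:+.3f}  K={k:+.3f}")
```

An equimolar library fills each position independently with every building block, so the N-peptide mass is a sum of N independent single-residue masses. Cumulants of independent variables add. As a result, D grows as N^1/2, S scales as N^-1/2, and K scales as N^-1. This matches the progression in Table 1 (e.g., D for tetrapeptides, 60.121 Da, is twice the single-residue value) and makes it unnecessary to enumerate all 20⁷ heptapeptides to obtain these global parameters.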

Only the monoisotopic mass (i.e., the ¹²C isotope peak26) of each peptide was included in the library mass distribution. The conversion from monoisotopic to either nominal or average isotopic peptide mass is straightforward. For instance, both the centroid and the dispersion of the monoisotopic peptide mass distributions should be multiplied by 1.000 6737 to obtain the respective average isotopic mass values when all 20 AAs are considered. The multiplication factor is obtained from the quotient of the average isotopic and the monoisotopic masses of the building-block set. Disulfide-bridged and cyclic peptides are excluded from the current simulation. We also note that for comparison with experimental results in which protonated molecular ions are detected, the proton mass has to be added to Mav; D, being a measure of spread, is unaffected by such a constant offset. Unit coefficients were used for calculating the moments (and from them the centroid and the dispersion); i.e., all individual peptides were assumed to be present in the library in equimolar amounts. This results in mass “degeneracy” (termed “redundancy” in ref 20), i.e., the presence of peptides with the same amino acid composition (and therefore the same mass) but different sequences. For example, only one tripeptide with composition Ala₃ was counted, while three peptides with composition Ala₂Gly were included in the mass distribution. Similarly, although Ile and Leu give rise to different compositions and sequences, they are indistinguishable by mass and therefore add to the mass degeneracy. For many practical purposes, the same holds for the pair Gln/Lys (monoisotopic masses 128.058 58 and 128.094 96 Da, respectively). The degree of mass degeneracy and the peptide multiplicity (i.e., the number of peptides with the same amino acid composition but different sequences) are discussed further below.

To simulate incomplete and imperfect combinatorial synthesis (e.g., the presence of unwanted byproducts), only 19 of the 20 AA residues were used in some of the computer runs to generate peptides of specified lengths. Mass distributions of random mixtures containing peptides of different lengths were also investigated (see below).

RESULTS AND DISCUSSION

Multiplicity of Peptide Sequences for Peptides with the Same Amino Acid Composition. In general, accurate mass measurement of the intact molecular ions of a peptide can confirm only its elemental composition and, in some cases, its amino acid composition (up to the indistinguishable Ile/Leu residues).24 When the overall mass distribution of peptides in a combinatorial library is studied, peptides with different sequences but the same amino acid composition, and therefore the same mass, give rise to a degeneracy. The multiplicity M, i.e., the number of N-peptides with a particular amino acid composition but different sequences, is given by the corresponding term in the polynomial expansion

$$(a_1 + a_2 + \cdots + a_k)^N = \sum \frac{N!}{n_1!\,n_2!\cdots n_k!}\,(a_1)^{n_1}(a_2)^{n_2}\cdots(a_k)^{n_k} \qquad (1)$$

where k is the number of AA building blocks in the pool, N is the peptide length, and n_i (i = 1, 2, ..., k; Σn_i = N) is the number of AAs of type a_i in the respective N-peptide AA composition (a_1)^n1(a_2)^n2...(a_k)^nk. The derivations of formula (1) and of formula (2) below can be found in any standard textbook on combinatorics (see, e.g., ref 27). For instance, for the Ala₂Gly tripeptide above, N = 3, n_1 = 2, n_2 = 1, and the multiplicity is M = 3!/(2!·1!) = 3. The number of different terms in expansion (1) corresponds to the number of different mass peaks in the distribution, i.e., the number of different AA compositions (the overall multiplicity M). It is given by the number of ways to select N objects with repetition from k types of objects:27

$$M = \binom{N+k-1}{N} = \frac{(k+N-1)!}{N!\,(k-1)!} \qquad (2)$$

The corresponding M values for di- to heptapeptides are listed in Table 1. While 20⁷ (= 128 × 10⁷) different linear heptapeptide sequences are possible, the number of different AA compositions is drastically smaller: 657 800 (counting Ile and Leu as different residues). From the mass spectrometric point of view, Ile/Leu and, for practical purposes, even Gln/Lys are indistinguishable in mass, which further reduces the number of AA compositions discernible by MS (Table 1). Such a high degeneracy for peptides with the same AA composition suggests that caution should be exercised in implementing peptide-mapping approaches for protein identification by MS. One strategy that enhances the specificity of MS peptide mapping is peptide tagging, in which not only peptide molecular weights but also short specific sequences are used.28

Global Mass Distribution Parameters. The mass distributions of two complete combinatorial libraries, containing all tri- and all hexapeptides, respectively, are shown in Figure 1. The main parameters describing the mass distributions of the computer-generated libraries of peptides up to seven AAs in length are given in Table 1. A few general features of these distributions are readily apparent. The larger the peptide library, the more symmetric the distribution becomes: the skewness approaches zero monotonically as the peptide length increases. The same is true for the “peakedness” of the distributions, since the kurtosis also approaches zero for longer peptides. The closer the peptide mass distribution is to a symmetric (Gaussian) shape, the more direct the link between the dispersion D and the easily measured fwhm (for a Gaussian, fwhm = 2(2 ln 2)^1/2 D ≈ 2.355D). The increment in D per added AA residue decreases with increasing peptide length (Table 1).

A remarkable feature of the global distributions of all libraries from the tripeptides onward is a “signal modulation” with a “period” of ∼14 Da, extending from the lower mass “wing” to around the apex (Figure 1). This modulation is most probably connected to the fact that the masses of almost all AAs lighter than the average AA mass are 14 Da apart from each other (due most often to CH₂ homology among the various amino acids, but also fortuitously).

(26) Yergey, J.; Heller, D.; Hansen, G.; Cotter, R.; Fenselau, C. Anal. Chem. 1983, 55, 353.
(27) Constantine, G. Combinatorial Theory and Statistical Design; J. Wiley & Sons: New York, 1987.
(28) Mann, M.; Wilm, M. Anal. Chem. 1994, 66, 4390.

Figure 1. Mass distributions of two complete combinatorial libraries containing all tri- (a) and hexapeptides (b), respectively.

To illustrate how the global parameters (Mav, D, S, and K) vary with the library contents, two incomplete tetrapeptide model libraries are compared with the complete one (Figure 2). Each of these libraries is generated by excluding one specified AA, Val or Glu, from the building-block set altogether. They each contain 130 321 peptides (i.e., roughly 30 000 fewer than the complete tetrapeptide library), and the corresponding mass degeneracy leads to 4845 different masses (for indistinguishable Ile/Leu and Gln/Lys pairs). One of the missing AAs is lighter than the average AA mass (118.806 Da) of the 20 common AA building blocks, while the other is heavier. As expected, this is reflected in the centroids of the respective library mass distributions. Compared with the Mav (475.223 Da) of the complete tetrapeptide library, the centroids of the des-Val (Mav = 479.378 Da) and the des-Glu (Mav = 473.068 Da) tetrapeptide mass distributions are shifted by a few daltons to higher and lower mass, respectively.

Table 1. Mass Distribution Parameters of the Computer-Generated Complete Combinatorial Peptide Libraries Containing the 20 Common Amino Acids, for Peptides Up to Seven Amino Acids in Lengthᵃ

| peptide length (AA) | no. of different peptide sequences | no. of different amino acid compositions | no. of peptides with a different massᵇ | Mmin (Da) | Mmax (Da) | Mav (Da) | D (Da) | S | K |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 × 10¹ | 20 | 19 (18) | 57.021 | 186.079 | 118.806 | 30.060 | -0.131 | -0.037 |
| 2 | 4 × 10² | 210 | 190 (171) | 114.042 | 372.159 | 237.611 | 42.512 | -0.093 | -0.018 |
| 3 | 8 × 10³ | 1 540 | 1 330 (1 140) | 171.064 | 558.238 | 356.417 | 52.066 | -0.076 | -0.012 |
| 4 | 16 × 10⁴ | 8 855 | 7 315 (5 985) | 228.086 | 744.317 | 475.223 | 60.121 | -0.066 | -0.009 |
| 5 | 32 × 10⁵ | 42 504 | 33 649 (26 334) | 285.107 | 930.397 | 594.029 | 67.217 | -0.059 | -0.007 |
| 6 | 64 × 10⁶ | 177 100 | 134 596 (100 947) | 342.129 | 1 116.476 | 712.834 | 73.633 | -0.054 | -0.006 |
| 7 | 128 × 10⁷ | 657 800 | 480 700 (346 104) | 399.150 | 1 302.555 | 831.640 | 79.532 | -0.050 | -0.005 |

ᵃ D, S, and K are the dispersion, the skewness, and the kurtosis of each distribution, respectively.
ᵇ From the mass spectrometric point of view, Ile/Leu (and, for practical purposes, even Gln/Lys) are indistinguishable in mass, which results in a higher mass degeneracy. Values outside parentheses merge Ile/Leu only; values in parentheses also merge Gln/Lys.
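The sequence and composition counts in Table 1 follow from formulas (1) and (2). The Python sketch below evaluates both with the standard `math` module (`math.comb` and `math.prod` need Python 3.8 or later). It then cross-checks them by brute-force enumeration of a small library, grouping sequences by their sorted composition. The function names are hypothetical, and building blocks are represented only by indices.

```python
from collections import Counter
from itertools import product
from math import comb, factorial, prod


def sequence_multiplicity(counts):
    """Formula (1): number of sequences N!/(n1! n2! ... nk!) sharing one composition."""
    return factorial(sum(counts)) // prod(factorial(n) for n in counts)


def number_of_compositions(k, n):
    """Formula (2): compositions of an n-peptide from k building blocks, C(n+k-1, n)."""
    return comb(n + k - 1, n)


def compositions_by_enumeration(k, n):
    """Brute force: group all k**n sequences by their (sorted) composition."""
    return Counter(tuple(sorted(seq)) for seq in product(range(k), repeat=n))


# Ala2Gly (n1 = 2, n2 = 1) has three sequence isomers.
print(sequence_multiplicity([2, 1]))

# Composition counts for 20 distinct residues, Ile/Leu merged (19), and Gln/Lys also merged (18).
for n in range(1, 8):
    print(n, number_of_compositions(20, n), number_of_compositions(19, n),
          number_of_compositions(18, n))
```

Passing k = 19 or k = 18 to `number_of_compositions` reproduces the counting used when Ile/Leu, and additionally Gln/Lys, are each treated as a single mass.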

Figure 2. Mass distributions of two incomplete tetrapeptide libraries (des-Glu, b, and des-Val, c, i.e., each of these amino acids being excluded from the “building block” set) and the complete tetrapeptide library (a).
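The des-Val and des-Glu centroids quoted above follow from Mav = N·M′av, with M′av recomputed for the reduced building-block set. The minimal Python sketch below assumes standard monoisotopic residue masses keyed by three-letter codes. Neither the values nor the function names come from the paper.

```python
# Assumed standard monoisotopic residue masses (Da), keyed by three-letter code.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Leu": 113.08406,
    "Ile": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}


def centroid(n, residues):
    """Mav = N * M'av, where M'av is the mean residue mass of the building-block set."""
    return n * sum(residues.values()) / len(residues)


def des_library(excluded, residues=RESIDUE_MASS):
    """Building-block set with one amino acid removed (a 'des-Xxx' library)."""
    return {name: mass for name, mass in residues.items() if name != excluded}


complete = centroid(4, RESIDUE_MASS)
for aa in ("Val", "Glu"):
    reduced = centroid(4, des_library(aa))
    print(f"des-{aa}: Mav = {reduced:.3f} Da, shift = {reduced - complete:+.3f} Da")
```

Removing a residue lighter than 118.806 Da raises the centroid, and removing a heavier one lowers it, as Figure 2 illustrates. The same two functions can be applied to the other building blocks in the same way.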

Table 2. Mass Distribution Parameters of the 20 Incomplete Tetrapeptide Libraries (des-Xxx, i.e., One Amino Acid Being Excluded from the “Building Block” Set) and the Complete Library

| library | Mav (Da) | D (Da) | S | K |
|---|---|---|---|---|
| completeᵃ | 475.223 | 60.121 | -0.066 | -0.009 |
| des-Glyᵇ | 488.230 | 54.395 | -0.221 | -0.011 |
| des-Ala | 485.279 | 57.438 | -0.069 | +0.062 |
| des-Ser | 481.912 | 59.842 | -0.018 | +0.021 |
| des-Pro | 479.802 | 60.827 | -0.020 | -0.013 |
| des-Val | 479.378 | 60.979 | -0.022 | -0.019 |
| des-Thr | 478.961 | 61.114 | -0.025 | -0.024 |
| des-Cys | 478.548 | 61.233 | -0.028 | -0.029 |
| des-Leu(Ile) | 476.417 | 61.624 | -0.050 | -0.045 |
| des-Asn | 476.226 | 61.642 | -0.052 | -0.045 |
| des-Asp | 476.018 | 61.657 | -0.054 | -0.046 |
| des-Gln | 473.275 | 61.529 | -0.087 | -0.037 |
| des-Lys | 473.267 | 61.527 | -0.087 | -0.037 |
| des-Glu | 473.068 | 61.494 | -0.090 | -0.035 |
| des-Met | 472.647 | 61.413 | -0.095 | -0.031 |
| des-His | 471.380 | 61.081 | -0.107 | -0.015 |
| des-Phe | 469.273 | 60.231 | -0.121 | +0.022 |
| des-Arg | 467.371 | 59.131 | -0.119 | +0.059 |
| des-Tyr | 465.905 | 58.058 | -0.104 | +0.083 |
| des-Trpᶜ | 461.060 | 52.932 | +0.129 | -0.063 |

ᵃ The extreme values of all peptide library mass distributions do not change, except when excluding the lightest (des-Gly) or the heaviest (des-Trp) AA (Mmin = 228.086 Da and Mmax = 744.317 Da).
ᵇ Mmin = 284.148 Da.
ᶜ Mmax = 652.253 Da.

A comparison of the global parameters (Mav, D, S, and K) for the complete library and the 20 incomplete tetrapeptide libraries (each obtained by excluding only one AA from the building-block set) is presented in Table 2. In general, the shift ∆M in Mav between the respective peptide distributions corresponds to the difference between M′av of the complete set of 20 AAs and the corresponding M″av of the reduced AA set (e.g., the 19 AAs in the above example), multiplied by the peptide length, N:

$$\Delta M = N\,\frac{M_p - M'_{av}}{19}$$

where Mp is the mass of the AA excluded from the building-block set. An estimate of the mass distribution centroid (Mav) accurate to within 1 Da (for the least favorable case) can thus, at least in principle, differentiate between two different tetrapeptide libraries: the complete one and an incomplete (des-Asp) one. Of course, the practical difficulties of implementing such a “massively parallel” approach experimentally for a mixture that may contain around 10⁵ components seem insurmountable at present. The D, S, and K values in these cases do not show significant variations (Table 2). For instance, D has its minimal values in the two limiting cases (des-Gly and des-Trp), when the lightest or the heaviest AA is excluded. On the other hand, distinguishing between a tetrapeptide combinatorial library in which only three AAs are randomly varied (e.g., Asp-Xxx-Xxx-Xxx, which corresponds to a tripeptide library with Mav shifted accordingly) and the complete tetrapeptide library (Xxx-Xxx-Xxx-Xxx) could also be achieved, in principle, by MS determination of the dispersions in the two cases (52.066 vs 60.121 Da; Table 1). A comparison of the global mass distribution parameters (Mav, D, S, and K) for the complete library and two incomplete (des-Ala and des-Glu) libraries of tri- to hexapeptides is also presented in Table 3.

Table 3. Mass Distribution Parameters of the Two Incomplete Libraries (des-Ala and des-Glu, i.e., Each of These Amino Acids Being Excluded from the “Building Block” Set) and the Complete Library for Tri- to Hexapeptides

| peptide length (AA) | library | Mav (Da) | D (Da) | S | K |
|---|---|---|---|---|---|
| 3 | complete | 356.417 | 52.066 | -0.076 | -0.012 |
| 3 | des-Ala | 363.960 | 49.742 | -0.079 | +0.082 |
| 3 | des-Glu | 354.801 | 53.255 | -0.104 | -0.047 |
| 4 | complete | 475.223 | 60.121 | -0.066 | -0.009 |
| 4 | des-Ala | 485.279 | 57.438 | -0.069 | +0.062 |
| 4 | des-Glu | 473.068 | 61.494 | -0.090 | -0.035 |
| 5 | complete | 594.029 | 67.217 | -0.059 | -0.007 |
| 5 | des-Ala | 606.599 | 64.217 | -0.061 | +0.049 |
| 5 | des-Glu | 591.335 | 68.752 | -0.080 | -0.028 |
| 6 | complete | 712.834 | 73.633 | -0.054 | -0.006 |
| 6 | des-Ala | 727.919 | 70.346 | -0.056 | +0.041 |
| 6 | des-Glu | 709.602 | 75.315 | -0.073 | -0.023 |

For a combinatorial library containing a mixture of peptides of different lengths, the centroid will be a function of the centroids of the deconvoluted “homo”-peptide libraries. For instance, a mixture of di- and tripeptide libraries will result in overlapping distributions and a shift of the average value. The new average mass M2+3 can be calculated from

$$M_{2+3} = \frac{A_2 M_2 + A_3 M_3}{A_2 + A_3}$$

where A2 and A3 are the relative quantities of the di- and tripeptide libraries, respectively, and M2 and M3 are their average masses. Random “loss” of peptides from the libraries (due to imperfect synthesis and/or MS instrumental discrimination) will cause a small random shift in Mav. This shift becomes significant only at a relatively large “loss” of peptides, i.e., more than a few percent of the total library content. The systematic dependence of the global mass distribution parameters on the number and type of randomly lost peptides (modeled by a Monte Carlo simulation) will be the subject of a separate study. We have also simulated the effect of uniform truncation of the mass distribution on the mass centroid. A height cutoff (e.g., due to the limited dynamic range of the instrument) will shift the centroid by ∆M from the Mav value of the distribution because of the distribution's asymmetry, in analogy to a similar effect in determining the average mass of large molecules.23 For the hexapeptide library, the following empirical formula has been derived:

$$\Delta M \approx -0.33\,h^{1/2}$$

where ∆M is the shift in daltons and h is the cutoff height in percent of the total height. We have modeled several incomplete peptide libraries, each containing a total of ∼1000 peptides but with different peptide lengths (Figure 3). A tripeptide library was created using only 10 amino acids as building blocks (Ala, Arg, Asp, Cys, Glu, Gly, His, Leu, Lys, Met); this library contains exactly 10³ different peptides. Accordingly, tetrapeptide, pentapeptide, and hexapeptide libraries were modeled using the first six (Ala, Arg, Asp, Cys, Glu, Gly), four (Ala, Arg, Asp, Cys), or three (Ala, Arg, Asp) amino acids as building blocks, respectively. The mass distribution parameters of these incomplete libraries are listed in Table 4.

Figure 3. Mass distributions of three incomplete libraries containing approximately the same number (∼1000) of peptides: (a) a tripeptide library (10 amino acids as building blocks: Ala, Arg, Asp, Cys, Glu, Gly, His, Leu, Lys, Met); (b) a tetrapeptide library (six amino acids as building blocks: Ala, Arg, Asp, Cys, Glu, Gly); (c) a hexapeptide library (three amino acids as building blocks: Ala, Arg, Asp).

Local Mass Distribution Parameters. “Local” parameters of peptide mass distributions, e.g., their density per mass interval, have already been examined.25 Mann found “forbidden zones”, which arise because the monoisotopic mass offsets of the various peptides at a given nominal mass lie within a narrow range. For instance, this local distribution is less than 0.25 Da wide at mass 510, giving rise to a 0.75 Da forbidden zone.25 We have plotted (Figures 4-6) expanded “slices” of the peptide mass distributions of the complete tetrapeptide library and two incomplete ones (des-Glu and des-Val) in the intervals 300-320, 440-460, and 600-620 Da (at the left wing, the apex, and the right wing of each distribution, respectively). Forbidden zones of roughly the same width are clearly present along the whole distribution, as is the 14 Da signal modulation in the lower mass part of each distribution (see above). Although the local parameter pattern is preserved in all three mass distributions, the relative intensities of the individual peaks vary between the separate cuts. The variation between the complete and the des-Val library is more marked in the lower mass range (300-320 Da), since a lighter-than-average AA is excluded (Figures 4 and 6). Conversely, the pattern variations between the local parameters of the complete and the des-Glu libraries are better discerned at the apex of the distribution and in the higher mass range (Figures 4 and 5). MS library search algorithms can be used to estimate variations in experimental peak intensities (compared with a computer-generated distribution), provided the experimental peak intensities can be directly correlated to the quantities of the different peptides present (ref 17 and see below). Comparisons of local library parameters (cuts) for different libraries may also supply information on the library diversity. We note that the extreme values (Mmin and Mmax) of all peptide library mass distributions do not change except when the lightest (des-Gly) or the heaviest (des-Trp) AA is excluded from the building-block set (Table 2). For libraries containing the same total number of peptides, the mass density decreases as the peptide length increases (Figure 3).

Problems and Pitfalls in Eventual Practical Implementation. The direct introduction into a mass spectrometer of a mixture containing many tens of thousands of compounds, and the detection of all individual components in direct relation to their concentrations in the mixture, seem insurmountable problems at the present time. A number of instrumental factors contribute to the difficulties in a practical implementation of the massively parallel approach. These factors are well-known and are, in general, connected to the applicability of mass spectrometry to quantitative mixture analysis. Among them are ionization and detection efficiencies and discrimination effects, dynamic range, mass resolution and accuracy, adsorption to instrument surfaces, the presence of other impurities, etc. We do not intend to discuss each of these at length, nor the immense complexity of the problem when the cumulative effect of all such factors is considered. We only note that the effect of each individual factor can be simulated by computer. For instance, multiple charging in ES, although rare for peptides with masses below 1000 Da, depends on the number of basic residues in the peptide.
This dependence can readily be modeled in the computer code as a preselected decrease in the abundance of, e.g., peptides containing two Arg residues in the mass distribution of singly charged ions. On the other hand, mass spectrometry has been used successfully to characterize directly introduced mixtures containing up to several hundred individual compounds, and the overall appearance of the mass spectrum of such a mixture has been used as a "fingerprint" to distinguish between different mixtures. Recently, Guan et al. demonstrated the applicability of high-resolution FTICR MS to molecular formula identification of more than 350 individual components in a directly introduced fraction of a gas oil.29 ES and tandem MS have been used to assess the composition and purity of a library of 47 synthetic peptides.17 The intensities of the individual peaks matched the quantities of isobaric peptides in the library. Capillary electrophoresis ES MS/MS has been used to enumerate directly 160 of the components of a library theoretically containing 171 bisubstituted xanthene derivatives.18 Clearly, such direct enumeration approaches cannot be implemented in practice for libraries containing more than a few hundred compounds. Instead of addressing the large array at the end of the synthesis, probing representative sublibraries by, e.g., positive and negative ion ES MS at different steps along the synthetic route can yield useful data for optimizing the process and ensuring the desired complexity of the final combinatorial library, as illustrated for a potential trypsin inhibitor library containing 65 341 individual compounds.19 Electrospray FTICR MS in the broad-band detection mode has provided a quick and unambiguous demonstration of a fault in the synthesis of a peptide library thought to be Gly-Tyr-Xxx-Xxx-Xxx-Cys (Xxx being any of 18 natural AAs, i.e., all except Cys and Trp).20 The mixture (0.2 mg total sample, theoretically containing 5832 different peptide sequences but only 969 different mass peaks, or 816 if the Gln/Lys mass difference is not resolved) was introduced directly via the ES ion source. Direct determination of the mixture's average mass showed a gross disagreement between the expected and observed values, indicating that the two N-terminal amino acids, Gly-Tyr, had not been coupled at all during the synthesis. Another important requirement for a successful implementation of the approach discussed above, namely computing the mass distribution parameters (both global and local) in conjunction with their experimental MS determination, is also illustrated in ref 20. A somewhat similar problem, the quantitative analysis of polymer mixtures (e.g., direct estimation of their average molecular weights and polydispersities), has been addressed by MALDI TOF mass spectrometry and off-line data processing.30 That work also discusses several factors that limit the applicability of MALDI in quantitative studies of polymer mixtures, such as ion yield, detector efficiency, and adduct formation. We point out that all of these factors are essentially common to the implementation of the "massively parallel" approach to probing combinatorial library diversity discussed here.

We finally note a recent study by Zhao et al.31 In it, a mathematical model was devised to study the statistics of a particular combinatorial synthesis strategy, the "split/recombine" method, with the aim of optimizing the amounts of starting material and solid support (beads). It is shown that deviations from the final "ideal" equimolar distribution can be described in terms of Pearson statistics. A further criterion is also derived, giving the number of beads needed for all "individual relative errors" to fall below a predetermined tolerance limit within, e.g., a 99% confidence interval.31 Whether such a general statistical approach applies to the specific problem of MS sampling of a library pool to assess its diversity remains to be tested.

(29) Guan, S.; Marshall, A.; Sheppele, S. Anal. Chem. 1996, 68, 46.
(30) Montaudo, G.; Scamporrino, E.; Vitalini, D.; Mineo, P. Rapid Commun. Mass Spectrom. 1996, 10, 1551.
(31) Zhao, P.; Zambias, R.; Bolognese, J.; Boulton, D.; Chapman, K. Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 10212.

Figure 4. Expansions of the peptide mass distributions in the complete tetrapeptide library in the intervals 300-320, 440-460, and 600-620 Da (at the left wing, a, the apex, b, and the right wing, c, of the distribution, respectively).

Figure 5. Expansions of the peptide mass distributions in the incomplete (des-Glu) tetrapeptide library in the intervals 300-320, 440-460, and 600-620 Da.

Figure 6. Expansions of the peptide mass distributions in the incomplete (des-Val) tetrapeptide library in the intervals 300-320, 440-460, and 600-620 Da.

Table 4. Mass Distribution Parameters of Four Incomplete Libraries (Each Containing around 1000 Peptides of a Specified Length)

library           no. of peptides   Mmin (Da)   Mmax (Da)   Mav (Da)   D (Da)   S       K
tripeptide(a)     1000              171.1       468.3       342.155    49.7     +0.41   -0.14
tetrapeptide(b)   1296              228.1       624.4       420.826    67.1     +0.01   -0.30
pentapeptide(c)   1024              355.2       780.5       556.468    68.1     -0.10   -0.21
hexapeptide(d)    729               426.2       936.6       684.330    85.08    +0.02   -0.25

(a) Building blocks: Ala, Arg, Asp, Cys, Glu, Gly, His, Leu, Lys, Met.
(b) Building blocks: Ala, Arg, Asp, Cys, Glu, Gly.
(c) Building blocks: Ala, Arg, Asp, Cys.
(d) Building blocks: Ala, Arg, Asp.
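The global parameters in Table 4 can be recomputed by exhaustive enumeration. The Python sketch below is an illustration, not the authors' combinatorics code. It assumes monoisotopic residue masses summed without terminal H2O or charge, the convention that reproduces the tabulated Mmin and Mmax (e.g., 3 × Gly = 171.06 Da). D is taken as the population standard deviation, S as the third standardized moment, and K as the excess kurtosis; with these definitions the sketch should match the tabulated Mav, D, and K to the quoted precision. With the conventional sign of skewness, the computed S values have the same magnitudes as in Table 4 but the opposite sign, so the table presumably uses the reverse sign convention. The `weight` argument is a hypothetical hook showing how a preselected decrease in abundance, such as the reduced singly charged signal of peptides carrying two Arg residues mentioned in the text, could be folded into the same moments; the 0.5 factor in `two_arg_penalty` is an arbitrary illustrative value.

```python
from itertools import product
from math import sqrt

# Monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Leu": 113.08406,
    "Ile": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}

# Building-block sets of the four libraries in Table 4 (footnotes a-d), by length.
TABLE4_BLOCKS = {
    3: ["Ala", "Arg", "Asp", "Cys", "Glu", "Gly", "His", "Leu", "Lys", "Met"],
    4: ["Ala", "Arg", "Asp", "Cys", "Glu", "Gly"],
    5: ["Ala", "Arg", "Asp", "Cys"],
    6: ["Ala", "Arg", "Asp"],
}


def equimolar(seq):
    """Default weight: every sequence is present in the same amount."""
    return 1.0


def global_parameters(blocks, length, weight=equimolar):
    """Weighted Mav, D (std. dev.), S (skewness) and K (excess kurtosis).

    Each sequence mass is the plain sum of residue masses (no H2O, no charge).
    Mmin and Mmax are taken over all sequences, regardless of weight.
    """
    pairs = [(weight(seq), sum(RESIDUE_MASS[aa] for aa in seq))
             for seq in product(blocks, repeat=length)]
    total = sum(w for w, _ in pairs)
    mav = sum(w * m for w, m in pairs) / total

    def central_moment(k):
        return sum(w * (m - mav) ** k for w, m in pairs) / total

    d = sqrt(central_moment(2))
    return {
        "n": len(pairs),
        "Mmin": min(m for _, m in pairs),
        "Mmax": max(m for _, m in pairs),
        "Mav": mav,
        "D": d,
        "S": central_moment(3) / d ** 3,
        "K": central_moment(4) / d ** 4 - 3.0,
    }


def two_arg_penalty(seq):
    """Hypothetical discrimination: peptides with >= 2 Arg seen at half abundance."""
    return 0.5 if seq.count("Arg") >= 2 else 1.0


if __name__ == "__main__":
    for length, blocks in TABLE4_BLOCKS.items():
        p = global_parameters(blocks, length)
        print(length, p["n"], f"{p['Mav']:.3f}", f"{p['D']:.1f}",
              f"{p['S']:+.2f}", f"{p['K']:+.2f}")
    biased = global_parameters(TABLE4_BLOCKS[4], 4, weight=two_arg_penalty)
    print("tetrapeptide Mav with two-Arg penalty:", f"{biased['Mav']:.3f}")
```

Because Arg is the heaviest building block, down-weighting the Arg-rich peptides shifts the weighted centroid toward lower mass, which is the kind of distortion an ionization discrimination effect would introduce into a measured distribution.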

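The diagnosis in ref 20 rests on simple arithmetic: for an equimolar combinatorial library, the average mass equals the sum of the mean residue masses at each position plus one H2O, so a missing Gly-Tyr dipeptide lowers the expected centroid by exactly the Gly + Tyr residue masses. A minimal sketch follows, assuming monoisotopic residue masses, neutral peptides (add 1.00728 Da for [M+H]+ ions), and equimolar incorporation at every Xxx position; the function and constant names are illustrative only.

```python
from itertools import product
from statistics import fmean

# Monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Leu": 113.08406,
    "Ile": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}
WATER = 18.01056  # monoisotopic H2O completing the free N- and C-termini

# Xxx: any of the 20 common amino acids except Cys and Trp (18 building blocks).
XXX = [aa for aa in RESIDUE_MASS if aa not in ("Cys", "Trp")]

# One building-block list per position, N- to C-terminus.
INTENDED = [["Gly"], ["Tyr"], XXX, XXX, XXX, ["Cys"]]  # Gly-Tyr-Xxx-Xxx-Xxx-Cys
UNCOUPLED = [XXX, XXX, XXX, ["Cys"]]                   # Gly-Tyr never attached


def average_mass(positions):
    """Centroid of an equimolar library: per-position mean residue masses + H2O."""
    return sum(fmean(RESIDUE_MASS[aa] for aa in pos) for pos in positions) + WATER


def average_mass_enumerated(positions):
    """Brute-force cross-check: mean neutral mass over every sequence."""
    return fmean(sum(RESIDUE_MASS[aa] for aa in seq) + WATER
                 for seq in product(*positions))


if __name__ == "__main__":
    expected = average_mass(INTENDED)
    if_uncoupled = average_mass(UNCOUPLED)
    print(f"expected centroid:        {expected:.4f} Da")
    print(f"centroid without Gly-Tyr: {if_uncoupled:.4f} Da")
    print(f"difference:               {expected - if_uncoupled:.4f} Da")
```

Under these assumptions the intended library centers near 688.94 Da and the Gly-Tyr-free library near 468.86 Da, a 220.08 Da gap that no plausible calibration error would explain. The shortcut in `average_mass` relies only on linearity of the mean, which the enumerated cross-check over all 5832 sequences confirms.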

CONCLUSION
Mass distributions of linear peptide libraries with peptide lengths between two and seven amino acids have been generated by computer simulation, and peptide mass multiplicity, i.e., the number of peptides with the same amino acid composition but different sequences, has been discussed. We have examined the feasibility of a "massively parallel" mass spectrometric approach to probing combinatorial library diversity, based on determining global (integral) as well as local characteristics of the overall mass distribution of all individual library components. Its practical implementation seems feasible for libraries containing from several hundred up to several thousand individual components. Such an approach can also be useful for MS monitoring of the diversity of reaction products (e.g., "cuts" containing around 10³ individual members) at each subsequent step along the combinatorial synthetic pathway. Appropriate statistical criteria still have to be formulated to prescribe the sampling rate needed to reach a predetermined tolerance limit within a given confidence interval. The proposed approach is clearly not limited to linear peptide libraries built solely from amino acids; it can be generalized to other types of combinatorial synthetic products. Moreover, the combinatorics code described here may be employed to generate different "virtual" combinatorial libraries that mimic various imperfections in the synthetic steps, and thus to examine statistical parameters of the libraries that may be of practical importance in optimizing synthetic strategies. Future studies will aim to extend the approach described above and implement it experimentally, using a specific small-molecule non-peptide library as an example.
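The idea of "virtual" libraries that mimic synthetic imperfections can be sketched in a few lines. The example below is hypothetical throughout: it assumes the 20 common amino acids as building blocks of a complete tetrapeptide library, residue-sum masses as in Table 4, and an imperfection in which Glu fails to couple in the first step only. It then evaluates two local parameters: the mass density (peptides per 1 Da bin) in a slice, and the occupied fractional-mass window at one nominal mass, whose complement is the forbidden zone. The helper names are illustrative and not the authors' code; since mass depends only on composition, which step fails does not change the resulting distribution.

```python
from collections import Counter
from itertools import product
from math import floor

# Monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Leu": 113.08406,
    "Ile": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}
ALL20 = list(RESIDUE_MASS)


def virtual_library(positions):
    """Residue-sum masses (no H2O, no charge) of every sequence of a virtual library.

    `positions` holds one building-block list per synthetic step, so an
    imperfect step is modeled simply by changing that step's list.
    """
    return [sum(RESIDUE_MASS[aa] for aa in seq) for seq in product(*positions)]


def mass_density(masses, lo, hi, bin_width=1.0):
    """Number of peptides per mass bin ("mass density") in the slice [lo, hi)."""
    counts = Counter(floor((m - lo) / bin_width) for m in masses if lo <= m < hi)
    return [counts.get(i, 0) for i in range(round((hi - lo) / bin_width))]


def occupied_window(masses, nominal):
    """(lowest, highest) fractional mass of the peptides at one nominal mass.

    Assumes every mass defect is below 1 Da, so floor(m) is the nominal mass;
    the rest of the 1 Da interval is the "forbidden zone".
    """
    fracs = [m - nominal for m in masses if floor(m) == nominal]
    return (min(fracs), max(fracs)) if fracs else None


# Complete tetrapeptide library (hypothetical choice: all 20 building blocks).
COMPLETE = virtual_library([ALL20] * 4)
# Hypothetical imperfect synthesis: Glu fails to couple in the first step only.
GLU_FAILS_STEP1 = virtual_library([[aa for aa in ALL20 if aa != "Glu"]] + [ALL20] * 3)

if __name__ == "__main__":
    for name, lib in (("complete", COMPLETE), ("Glu fails in step 1", GLU_FAILS_STEP1)):
        print(name, len(lib), "peptides")
        print("  peptides per 1 Da bin, 440-460 Da:", mass_density(lib, 440, 460))
        print("  occupied window at nominal mass 510:", occupied_window(lib, 510))
```

Because every residue mass defect lies between 0.009 Da (Cys) and 0.101 Da (Arg), all residue-sum masses of these tetrapeptides fall between x.037 and x.404 Da, so at least about 0.6 Da of each nominal mass unit stays empty in this sketch. The narrower window quoted in the text at mass 510 can be compared against `occupied_window(COMPLETE, 510)`, keeping in mind that the paper's building-block set and mass convention may differ from those assumed here.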

ACKNOWLEDGMENT
Partial financial support for this work has been provided by the Swedish Natural Sciences Research Council (NFR) and the K. & A. Wallenberg and the G. Gustafsson foundations.

Received for review January 14, 1997. Accepted May 14, 1997.

AC970049W

Abstract published in Advance ACS Abstracts, July 1, 1997.