Anal. Chem. 1983, 55, 1925-1929


Quantitative Evaluation of Library Searching Performance

Michael F. Delaney,* F. Vincent Warren, Jr.,¹ and John R. Hallowell, Jr.

Department of Chemistry, Boston University, Boston, Massachusetts 02215

A procedure is described for the quantitative evaluation of the performance of a library searching system employed for molecular structure elucidation. The technique allows the evaluation of any form of spectral representation or any spectral comparison metric. The approach is applicable to any type of spectrometry used for library searching (LS) or to combinations of different types of spectra. All comparisons are made relative to a chosen standard of quality for library searching performance. It is assumed that the results of searching a library of full-intensity, full-resolution spectra by a least-squares metric will provide an acceptable standard for comparison. Results are presented to demonstrate the utility of this procedure for the examination of alternatives for library searching of vapor-phase infrared spectra.

Library searching (LS) has been used effectively for both the identification and interpretation of spectra. The continued interest in instruments such as the gas chromatography/Fourier transform infrared spectrometer (GC/FTIR) underscores the need for an objective approach to the development of versatile and efficient LS systems. The tremendous quantity of data which can be generated in a single GC/FTIR experiment necessitates computer-assisted data processing. Consequently, LS systems are included in most commercial GC/FTIR instruments.

Two important choices must be made in the design of a library searching system: (1) the manner in which the spectra are to be processed prior to storage in the reference library, and (2) the comparison metric to be used during the search. A diversity of options for each of these has appeared in the literature (1-9), but it is not clear what combination of options would provide the best LS system for a given application. As a result, the development of LS systems is typically approached empirically. Various possibilities are tried until “satisfactory” performance is obtained. Commercial FTIR search systems reflect this problem, in that they generally offer a number of options for spectral processing and choice of search metric, with little guidance as to how to design an optimal searching system.

The need for a quantitative evaluation of library searching systems has received some attention in the literature. LS systems designed for the identification of spectra are most easily treated. Here the goal is unambiguous. The search system succeeds if it gets the right answer. Quality for LS systems of this sort has been expressed as the percentage of correctly identified spectra (10). For interpretive LS systems, the problem is more complex. These systems are designed to give useful information for spectral interpretation even when the unknown spectrum is not present in the reference library. The entire “hit-list” of closest matches thus becomes important in determining quality. The most common approach to assessing quality in such systems has been by subjective visual inspection of hit-lists (11). Such “evaluation by acclamation” is too qualitative to provide a sound basis for the objective comparison of options for library searching. Kwiatkowski and Riepe have proposed a “quality index” for LS systems (12), but their method requires the tabulation of substructures common to the unknown compound and each hit-list member. This is not a trivial matter, and the need for a simpler evaluation procedure remains.

In this paper, we present a simple and quantitative method for the assessment of library searching performance. The method may be applied to any combination of spectral processing alternative and search metric. Rather than attempt to define what is meant by quality for LS systems, we compare any given LS system to a selected standard of quality. This approach increases the flexibility of the method, since any appropriate standard of quality may be chosen according to the goals of a particular LS system. The specific standard used for this work would not necessarily be optimal for all systems.

¹ Present address: Waters Associates, Milford, MA 01757.

THEORY

Definition of a Spectrum and Comparison Metric. A spectrum can be treated as a d-dimensional vector

$$x = (x_1, x_2, \ldots, x_i, \ldots, x_d) \qquad (1)$$

where each $x_i$ is the intensity in spectral (wavelength) channel $i$. In this case there are “d” abscissa resolution elements. We can then envision each spectrum to be a point in d-dimensional space. That is, there are “d” orthogonal coordinate axes, each one corresponding to a particular wavelength channel. The dissimilarity between two spectra, $i$ and $j$, can be taken to be the Euclidean geometric distance, $D_{ij}$, between the two spectral points in d space

$$D_{ij} = \Bigl[ \sum_{k=1}^{d} (x_{ik} - x_{jk})^2 \Bigr]^{1/2} \qquad (2)$$

where the summation is over each of the “d” wavelength channels. This definition of spectral dissimilarity, which is based on a strict geometric foundation, is referred to in the literature as the least-squares comparison metric.

Compressed Spectral Representations. Due to the major storage requirements of a large library, spectra are usually preprocessed into a smaller representation. The degree of compression ranges from reduction of intensity resolution to Fourier transformation. For the purposes of this analysis it suffices to say that the reduced representation will also be in the form of a vector, probably with fewer than d dimensions. In general, the unknown and library reference spectra are preprocessed to have the same representation. Assessment of dissimilarity between unknown and library spectra is still performed numerically by using a comparison metric which operates on corresponding pairs of spectral channels.

In some previous studies (13), spectral intensities have been encoded only to one bit. That is, each wavelength channel is assigned an intensity of “1” for peak present, or “0” for peak absent. This is an exceedingly compact representation, since spectra can be “bit-packed” with eight channels per byte of storage. Spectra processed and stored in this manner are referred to as binary spectra.

Definition of a Standard for Library Searching Performance. The proposed evaluation process allows any spectral representation and comparison metric to be selected

0003-2700/83/0355-1925$01.50/0 © 1983 American Chemical Society


ANALYTICAL CHEMISTRY, VOL. 55, NO. 12, OCTOBER 1983

as the standard against which other alternatives can be quantitatively compared. The most obvious standard for assessing LS performance would seem to be full-intensity, full-resolution spectra, as these would contain the maximum amount of spectral information. In the absence of information by which the most important spectral abscissa (wavelength) channels could be selected, a suitable comparison metric to use as a standard is the Euclidean distance measure of dissimilarity based on least squares (eq 2). Any other metric could be used as the standard. However, least-squares matching has a theoretical foundation and is also provided in many commercial LS systems.

The evaluation technique to be presented quantitatively compares the LS performance for a compressed library to the performance for the full spectra by using a set of test spectra, which need not be members of the reference library. The approach measures how similar the compressed spectra hit-lists are to the full spectra hit-lists. The hit-lists can be compared without regard to the form of the spectra or the comparison metric used, which lends flexibility to this approach.

The Evaluation Process. The full resolution standard library consists of N spectra which are d dimensional. A test set is selected which consists of T spectra. Test compounds should be selected to be representative of the library. No generally accepted method for optimally selecting test set members is known. Random selection of a test set can be used when a sufficiently large library of test compounds is chosen. The option of testing each library compound against the rest of the library is not convenient in the case of a large library. Each of the T spectra is searched through the N spectra reference library, using least squares matching, to yield hit-lists of the M best matching spectra. Usually the 10 most similar spectra are recorded (M = 10), although any value of M can be used.
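The standard search step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' FORTRAN implementation: the function names, the tiny random library, and the use of 0-based library indices are all hypothetical choices made for the example.

```python
import numpy as np

def least_squares_hit_list(test_spectrum, library, m=10):
    """Return indices of the m library spectra closest to test_spectrum.

    Closeness is the Euclidean (least-squares) distance of eq 2.
    """
    library = np.asarray(library, dtype=float)
    diff = library - np.asarray(test_spectrum, dtype=float)
    distances = np.sqrt((diff ** 2).sum(axis=1))
    # A stable sort keeps tied spectra in library order.
    order = np.argsort(distances, kind="stable")
    return order[:m]

def standard_search_lists(test_spectra, library, m=10):
    """Build the T-by-M array of standard search lists, one row per test spectrum."""
    return np.array([least_squares_hit_list(t, library, m) for t in test_spectra])

# Synthetic stand-in data: N = 6 library spectra with d = 4 channels each.
rng = np.random.default_rng(0)
library = rng.random((6, 4))
test_spectra = library[[1, 4]]          # two test compounds drawn from the library
hits = standard_search_lists(test_spectra, library, m=3)
```

Because each test spectrum here is a library member, its own index heads its hit-list, which mirrors the remark in the text that a test compound drawn from the library matches perfectly to itself.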
This results in an array of T standard search lists, each of which contains M members.

The compressed library under consideration consists of N spectra which are d′-dimensional. Each of the T test spectra is searched through the reduced library, using the comparison metric under consideration, to produce a long search list, which consists of each of the N library spectra, ordered by similarity to the test compound. Each of the T long search lists therefore contains all N members of the reference library. For each spectrum on the T by M array of standard search lists, the list index position for the same spectrum on the corresponding long search list is found. This results in a T by M list index position array. In the following discussion, this matrix of list index positions is called L, with elements $l_{ij}$.

If the compressed spectral representation completely maintains the information present in the full spectra, then the first M members of the long search list would be exactly the same as the standard search list for each of the T test spectra. In this case, the list index positions will be in sequence from 1 up to M. In any case, each top row list index position array element should be 1 for a test compound drawn from the library since this should match perfectly to itself. Each element of the T by M list index position array is summed to yield the raw index position score, S

$$S = \sum_{i=1}^{T} \sum_{j=1}^{M} l_{ij} \qquad (3)$$

The smaller S is, the more closely the compressed spectra parallel the LS performance of the full spectra.

Figure of Merit. To allow the results of this evaluation to be compared on an absolute quantitative basis, the raw index position score, S, is converted to a figure of merit (FOM), expressing S in relation to the best and worst possible index position scores, B and W, respectively. The best score is achieved when each of the T columns of L is in sequence from 1 to M

$$B = T \sum_{i=1}^{M} i = T (M + 1) M / 2 \qquad (4)$$

The worst possible score occurs when each column of L has as its M index positions the largest possible entries, namely, N, N - 1, N - 2, etc. In the particular case when the test compound is drawn exactly from the reference library, it will, of course, match exactly with itself. In this instance the worst possible score corresponds to the sequence 1, N, N - 1, N - 2, etc., which results in an overall worst score of

$$W = T \Bigl[ 1 + \sum_{i=1}^{M-1} (N - i + 1) \Bigr] \qquad (5)$$

which can be written as

$$W = T \, [1 + (M - 1)(N - (M - 2)/2)] \qquad (6)$$

when the summation is evaluated. The figure of merit is defined so that a value of 0% is obtained for the worst case and 100% for the best

$$\mathrm{FOM} = 100 \, [1 - (S - B)/(W - B)] \, \% = 100 \, [(W - S)/(W - B)] \, \% \qquad (7)$$
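For readers who want the intermediate step, the closed form for the worst score can be derived by summing the arithmetic sequence N, N - 1, ..., N - M + 2 that follows the leading 1. The derivation below uses only the symbols already defined and assumes, as in the text, that every test compound is a library member.

```latex
\begin{align*}
W &= T \Bigl[ 1 + \sum_{i=1}^{M-1} (N - i + 1) \Bigr] \\
  &= T \Bigl[ 1 + (M-1)(N+1) - \frac{(M-1)M}{2} \Bigr] \\
  &= T \Bigl[ 1 + (M-1)\Bigl( N - \frac{M-2}{2} \Bigr) \Bigr]
\end{align*}
```

The limiting cases of the figure of merit follow directly: setting S = B in eq 7 gives 100%, and setting S = W gives 0%.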

Example. A specific hypothetical example will help to clarify this evaluation process. Consider a library of 15 compounds from which a test set of three compounds is selected. The compounds are numbered from 1 to 15, and the test compounds are numbers 5, 10, and 15. Standard search lists of the five best matches will be used. In this example N = 15, T = 3, and M = 5. Table IA shows the standard search lists for the full resolution spectral library. As expected, since each test compound was drawn directly from the library, it is found at the very top of the standard search list. Table IB shows the long hit-lists for the library of reduced spectra. In both Tables IA and IB each column corresponds to one of the test compounds. For Table IB there are 15 entries in each column, one for each library member. The best matching spectrum is in row 1; the second best matching spectrum is in row 2, etc.

Table IC shows the corresponding list index positions. These are found by locating each compound of the standard search list in the corresponding long search lists (Table IB) and noting the list position. For example, on the standard search list (Table IA) test compound one’s fourth best match is compound number 8. On the long search list (Table IB) for test compound one, compound number 8 is found to be in list position 14. Therefore, in the list index position array (Table IC) for test compound one, there is a 14 in column one, row 4. The rest of this list index position array was completed in an analogous fashion.

The list index position array is now used to generate a figure of merit (FOM) which can be used to quantitatively compare the results of this (hypothetical) LS system to the results of other systems. First, column sums are obtained for each of the columns in Table IC. These sums are then added to give S, the raw index position score (eq 3). The FOM is calculated according to eq 7, using the best and worst possible sums (eq 4 and 6). The worst possible score for this example would occur when the sequence 1, 15, 14, 13, 12 is in each of columns 1-3. The first entry in each column must be a 1 since each test set member is actually in the reference library. The other four entries indicate that the compounds found high on the standard search list are simultaneously found at the very bottom of the long search lists, which corresponds to the worst possible library searching performance. According to this procedure values of S = 73, B = 45, and W = 165 and an FOM of 76.7% are obtained.
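The arithmetic of this example can be checked with a short Python sketch. The function names are hypothetical and the code is only one way to express eq 3, 4, 6, and 7; the list index positions are those of Table IC (column sums 31, 15, and 27), and the small `long_list` used to show the position lookup is made up for illustration.

```python
def list_index_positions(standard_list, long_list):
    """1-based position on long_list of each member of standard_list."""
    position = {compound: rank for rank, compound in enumerate(long_list, start=1)}
    return [position[compound] for compound in standard_list]

def best_score(t, m):
    """Eq 4: every column reads 1, 2, ..., M."""
    return t * m * (m + 1) / 2

def worst_score(t, m, n):
    """Eq 6: every column reads 1, N, N-1, ... (test compounds are library members)."""
    return t * (1 + (m - 1) * (n - (m - 2) / 2))

def figure_of_merit(index_positions, n):
    """Eq 7 in percent; index_positions holds one list of M positions per test compound."""
    t = len(index_positions)
    m = len(index_positions[0])
    s = sum(sum(column) for column in index_positions)  # eq 3
    b = best_score(t, m)
    w = worst_score(t, m, n)
    return 100 * (w - s) / (w - b)

# Hypothetical lookup: where do the standard hits fall on a reordered long list?
example_positions = list_index_positions([5, 2, 12], long_list=[5, 12, 7, 2])

# List index positions from Table IC, one list per test compound.
table_ic = [[1, 2, 3, 14, 11], [1, 2, 3, 4, 5], [1, 4, 3, 14, 5]]
fom = figure_of_merit(table_ic, n=15)
print(round(fom, 1))  # 76.7
```

With N = 15, T = 3, and M = 5 the helper functions return B = 45 and W = 165, in agreement with the values quoted in the text.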


Table I. An Example of the Summed List Position Library Searching Evaluation

A. Standard Search Lists

search list      test compound number
position           1      2      3
    1              5     10     15
    2              2      7     13
    3             12      1     12
    4              8      9      1
    5              9      5      3

B. Long Search Lists

[15-row table, one column per test compound, listing every library member in order of decreasing similarity; the individual entries are not recoverable from this copy.]

C. List Index Position Array

index            test compound number
position           1      2      3
    1              1      1      1
    2              2      2      4
    3              3      3      3
    4             14      4     14
    5             11      5      5
column sum        31     15     27
raw score         73
FOM (%)           76.7

Weighted List Position Scores. At this point the LS performance evaluation technique described cannot distinguish among different sequences of list index positions. For example, for a list depth of five the ideal list index position sequence 1, 2, 3, 4, 5 would receive the same FOM as the certainly less desirable sequence 5, 4, 3, 2, 1. The LS performance evaluation approach can be further extended by using weighted list position scores. The most obvious use of such weighting would be to quantitatively account for interchanges in the list position array. The list position summation score relationship (eq 3) is modified to include the contribution of the weights

$$S = \sum_{i=1}^{T} \sum_{j=1}^{M} w_j l_{ij} \qquad (8)$$

where $w_j$ is the element of the weight vector corresponding to a given list depth in the list position array L. The expressions for the best and worst possible scores (eq 4-6) are suitably modified to account for the weights. The FOM formula (eq 7) is unchanged.

To account for interchanges within the top M matches, the weights would be selected to attribute more importance to the top of the hit-list positions. Since higher FOMs are obtained when the score S is small, the weights should be skewed so that the product of $w_j$ and $l_{ij}$ will be minimized when the list index positions are in the desired sequence. Any deviation from the desired sequence should lead to increased values of S and hence a poor FOM. An example of interchange compensation is shown in Table II. Note that FOMs can be calculated for each test compound (column) by setting T = 1 in the previous equations. The function used was the inverse of the list depth. That is, each weight $w_j = 1/j$. It is clearly seen that a list position sequence 1, 2, 3, 4, 5 is perfect, while 1, 3, 2, 4, 5 is inferior to 1, 2, 3, 5, 4. This is consistent with the intuition that interchanges lower down on a hit-list are less significant.

Table II. An Example of the Use of Summed Weighted List Positions to Account for Hit-List Interchanges

List Index Position Array

index         wt          test compound number
position      factor        1         2         3
    1          1.00         1         1         1
    2          0.50         2         3         2
    3          0.33         3         2         3
    4          0.25         4         4         5
    5          0.20         5         5         4
column sum                 15        15        15
weighted column sum       5.00      5.17      5.05
column FOMs (%)         100.00     98.79     99.64
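The weighted scores of Table II can be reproduced with the sketch below. Two points are assumptions rather than statements from the text: the library size N = 15 is carried over from the example of Table I, and the weighted worst case is taken to be the unweighted worst sequence 1, N, N - 1, ... with the weights applied. The function names are hypothetical.

```python
def weighted_score(positions, weights):
    """Eq 8 for a single test compound: sum of w_j * l_j over list depth j."""
    return sum(w * l for w, l in zip(weights, positions))

def weighted_fom(positions, n):
    """Figure of merit (%) for one column (T = 1) with 1/j weights.

    Assumption: the weighted worst case is the sequence 1, N, N-1, ...,
    i.e. the unweighted worst case with the weights applied.
    """
    m = len(positions)
    weights = [1 / j for j in range(1, m + 1)]
    s = weighted_score(positions, weights)
    b = weighted_score(range(1, m + 1), weights)      # ideal sequence 1..M
    worst = [1] + [n - k for k in range(m - 1)]       # 1, N, N-1, ...
    w = weighted_score(worst, weights)
    return 100 * (w - s) / (w - b)

columns = ([1, 2, 3, 4, 5], [1, 3, 2, 4, 5], [1, 2, 3, 5, 4])
foms = [round(weighted_fom(column, n=15), 2) for column in columns]
print(foms)
```

Under these assumptions the three columns give 100.0, 98.79, and 99.64, matching the column FOMs of Table II, so an interchange at depths 2 and 3 is penalized more than one at depths 4 and 5.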

EXPERIMENTAL SECTION

Library Reference Spectra. Absorbance vapor phase infrared spectra covering a wide range of compound types were used. Each raw spectrum consisted of 1842 data points, sampled at 2 cm⁻¹ from 4000 cm⁻¹ to 450 cm⁻¹. Each of the 2000 library spectra was reduced to a 231-dimensional spectrum by a combination of moving and boxcar averaging (14). The intensities were normalized to be between 1 and 1000 (at unit resolution). Two binary intensity representations of vapor phase infrared (VPIR) spectra were used in this study. The width-enhanced spectra are described in more detail elsewhere (14). The clipped Fourier transform representation is based on the paper by Lam, Foulk, and Isenhour (15).

Width-Enhanced VPIR Spectra. The width enhanced representation (14) was derived by using a “peak-picking” algorithm to locate all discernible peaks with intensities greater than 2% of the spectral maximum. For each peak the base width was determined. A given width-enhanced representation was formed by encoding as “1” a given amount of the peak width. Our 100% library contains no peak width (only the peak center is encoded), while the 50% library encodes half of the peak width.

Clipped FTIR Spectra (15). A library of spectra in the clipped FTIR representation was formed by using the same 2000 spectra as for the width enhanced study. The 231-dimensional representation was zero-filled to 512 channels. This spectrum was mirrored into 1024 data points (16) and Fourier transformed by using the IBM Scientific Subroutines Package RHARM procedure. The clipped FTIR spectrum was formed by encoding channels with a positive Fourier coefficient as a “1”, and a negative coefficient as a “0”.

Computational Details. For both types of representations the 231-dimensional binary intensity spectra were bit packed into 32-bit words and stored in a random access file. All computations were conducted with FORTRAN programs on either an IBM-370/168, an IBM-3081, or a VAX-11/730 system.
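The clipped Fourier transform encoding described above can be illustrated with NumPy. This is a rough sketch, not the RHARM-based procedure itself. Several details are assumptions made for the example: the mirroring convention (appending the reversed spectrum), the use of `numpy.fft.rfft` with only the real part retained, the optional hypothetical `n_coefficients` truncation parameter, and byte-wise packing with `numpy.packbits` in place of 32-bit words.

```python
import numpy as np

def clipped_ft(spectrum, padded_length=512, n_coefficients=None):
    """Return a binary (0/1) clipped Fourier transform representation.

    Steps: zero-fill, mirror to twice the length, Fourier transform,
    then keep only the sign of each coefficient.
    """
    x = np.zeros(padded_length)
    x[: len(spectrum)] = spectrum
    # Assumed mirroring convention: append the reversed, zero-filled spectrum.
    mirrored = np.concatenate([x, x[::-1]])           # 1024 points for 512 channels
    # Assumption: real part of a real-input FFT stands in for RHARM's output.
    coefficients = np.fft.rfft(mirrored).real
    if n_coefficients is not None:
        coefficients = coefficients[:n_coefficients]  # keep low-frequency terms only
    return (coefficients > 0).astype(np.uint8)

rng = np.random.default_rng(1)
spectrum = rng.integers(1, 1001, size=231)   # synthetic intensities between 1 and 1000
bits = clipped_ft(spectrum, n_coefficients=128)
packed = np.packbits(bits)                   # eight binary channels per byte
```

Retaining 128 coefficients gives a 16-byte record per spectrum in this sketch; the zero-frequency coefficient is always encoded as 1 because it is the sum of positive intensities.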
A test set of 15 compounds that span the range of functional group types present in the library was selected. This is similar to an approach employed earlier (14, 17). The functional groups represented in the test set included acid, ester, aldehyde, ketone, alcohol, phenol, amine, aromatic, olefin, and halide functionalities. Binary intensity spectra were compared by a variety of Boolean based functions as shown in the following section.

Figure 1. Figure of merit vs. amount of width enhancement.

RESULTS AND DISCUSSION

There are three basic kinds of studies for which this approach can be used to develop a successful LS system: (1) optimization of an adjustable spectral representation parameter, (2) comparison of alternative spectral representations, and (3) evaluation of alternative spectral comparison metrics. An example of each of these three capabilities is presented below.

1. Optimization of the Degree of Width Enhancement. For relatively broad and variable width spectral peaks, such as those found in VPIR spectra, width enhancement can dramatically increase the information content for a binary intensity library over that which would be obtained from simple thresholding, as is used in mass spectrometry (13). However, too much width enhancement can cause a loss of spectral wavelength resolution. Figure 1 presents a plot of the FOM vs. the amount of width enhancement. A clear maximum is observed for the 70% library, indicating a significant improvement in search performance by including this amount of width information. This result is in accord with several other approaches used to assess LS performance for this type of representation (14).

2. Comparison of the Width-Enhanced and Clipped Fourier Transform Representations. The Fourier transform of a typical chemical spectrum has most of the wave form information packed into the initial Fourier coefficients. This is caused by the wave form being relatively smooth and slowly varying, giving rise to low-frequency components. Lam and co-workers (15) reported acceptable LS performance for clipped FT mass spectra when the transformed spectrum was truncated to a relatively small number of Fourier coefficients. This behavior can be dramatically demonstrated by plotting the FOM vs. the number of low-frequency Fourier coefficients retained, as seen in Figure 2. It is observed that from the original 231 spectral wavelength channels, only about 50 Fourier coefficients are needed to maintain nearly optimal performance.
The FOM allows a quantitative comparison of the two spectral representations: width enhanced and clipped FT. The optimal clipped FT library performance was obtained when 128 Fourier coefficients were retained, resulting in an FOM of 94.3%, while the performance observed when more coefficients were used is not significantly inferior to the optimal. The confidence limit of the FOM may be estimated for the clipped FT library using three values chosen from the range in which the FOM is approximately constant. The performance for the 70% width enhanced library is found to be statistically superior at the 95% confidence level. This brief analysis demonstrates the advantages of using a quantitative measure of LS performance.

Figure 2. Figure of merit vs. the number of clipped Fourier transform coefficients retained.

3. Evaluation of Several Boolean Based Comparison Metrics. Matching of binary intensity spectra has been performed by a variety of Boolean based similarity and dissimilarity metrics. The AND function was used initially (18) since this measures similarity and is the most obvious choice. The exclusive OR function (XOR) measures dissimilarity, and more importantly XOR is the binary analog of least squares, which lends a certain theoretical justification to its use (13, 19). Combinations of Boolean functions have also been used for library searching. Tanimoto (22) reported a normalized similarity metric, AND/IOR, where IOR is the normal (inclusive) OR function. We have reported (14) a normalized dissimilarity metric, XOR/IOR. Lam and co-workers (15) used the complement of the XOR as a similarity metric, calling it EXNOR, exclusive NOR. Grotch (13, 20) used a linear combination of XOR and AND, XOR - p·AND, where p was intuitively set equal to 2.

In previous reports, the introduction and testing of one or two metrics have been the subject of the entire study (8). Evaluation of each new metric was generally based on inspection of hit-lists and was therefore subjective. By use of the proposed evaluation process, it is possible to compare many different metrics rapidly. More importantly, an objective decision regarding the best metric for LS can be made by a consideration of the FOM values.

Reported in Table III are the results of a comparison of six metrics which were used to search a library of width-enhanced VPIR spectra. It should be noted that each metric was used in the dissimilarity mode. For example, (d - AND) was used to convert AND to a measure of dissimilarity, where d is the number of wavelength channels per spectrum. Several conclusions can be drawn from these results. The normalized XOR metric (XOR/IOR) is seen to give the same searching performance as the complement of the Tanimoto metric (1 - AND/IOR). These two functions exhibit the best performance based on the FOM. In order of decreasing performance based on FOM the next best metrics are Grotch, XOR, and then AND. For comparison the NAND metric was used. This is neither a similarity nor a dissimilarity metric, and its complete lack of utility is reflected in the exceedingly poor FOM.

Table III. Comparison of Various Binary-Intensity Dissimilarity Metrics by Using the 70% Width-Enhanced Library

metric          summed list     figure of      weighted
                positions       merit (%)      FOM (%)
XOR/IOR            11781          95.9           96.9
1-AND/IOR          11781          95.9           96.9
XOR-2AND           13434          95.3           96.3
XOR                26423          90.5           92.5
d-AND              32731          88.2           89.4
NAND              174529          35.5            -

Weighted List Positions Results. The final column of Table III contains FOMs for the various binary metrics using 1/j weighting of the list index positions. It is seen that the results present the metrics in exactly the same order as for unweighted evaluation. The weighted FOMs are slightly higher than the unweighted values. This is caused by the gradual decrease of FOM with hit-list length. In the weighted case the list index positions at the top of the hit-list are given more impact on the FOM, which makes the resulting FOM higher than that for an unweighted evaluation. A later publication will consider optimal weighting of list index positions in more detail.
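The Boolean dissimilarity metrics compared in Table III can be written compactly for binary spectra. The sketch below is illustrative only: the function name, the dictionary keys, and the two short example spectra are hypothetical, and NAND is omitted because the text does not say how it was counted.

```python
import numpy as np

def boolean_dissimilarities(a, b, p=2):
    """Dissimilarity between two binary spectra under several Boolean metrics."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    d = a.size                                   # number of wavelength channels
    n_and = int(np.count_nonzero(a & b))
    n_ior = int(np.count_nonzero(a | b))
    n_xor = int(np.count_nonzero(a ^ b))
    return {
        "XOR": n_xor,
        "d-AND": d - n_and,
        "XOR-pAND": n_xor - p * n_and,
        # Guard against two empty spectra (IOR = 0), treated here as identical.
        "XOR/IOR": n_xor / n_ior if n_ior else 0.0,
        "1-AND/IOR": 1 - n_and / n_ior if n_ior else 0.0,
    }

unknown = [1, 1, 0, 0, 1, 0, 1, 0]
reference = [1, 0, 0, 1, 1, 0, 0, 0]
scores = boolean_dissimilarities(unknown, reference)
```

Since the number of XOR bits equals the number of IOR bits minus the number of AND bits, XOR/IOR and 1 - AND/IOR are algebraically identical, which is consistent with the two metrics producing the same summed list positions in Table III.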

CONCLUSION

A quantitative method for the evaluation of library search systems containing any combination of spectral representation and comparison metric has been presented. While a particular standard of comparison has been used, any desired standard could, in fact, be chosen. It has been demonstrated that this approach can be used to optimize spectral representations or to select comparison metrics. The proposed evaluation process will facilitate the design of LS systems in an objective and quantitative manner and should therefore provide an important step away from the development of such systems by trial and error. Because of the flexibility of this evaluation tool, it will serve well as one of the many techniques which are needed to study the utility of combined spectral information (e.g., mass spectrometry/FTIR) for structure elucidation of chromatographically separated components.


ACKNOWLEDGMENT

The authors wish to thank Jules Abadi for developing the program used to generate the information in Table I.

LITERATURE CITED

(1) Grotch, S. L. Anal. Chem. 1973, 45, 2.
(2) Wangen, L. E.; Woodward, W. S.; Isenhour, T. L. Anal. Chem. 1971, 43, 1605.
(3) Woodruff, H. B.; Lowry, S. R.; Ritter, G. L.; Isenhour, T. L. Anal. Chem. 1975, 47, 2027.
(4) Penski, E. C.; Padowski, D. A.; Bouck, J. B. Anal. Chem. 1974, 46, 955.
(5) Kwiatkowski, J.; Riepe, W. Anal. Chim. Acta 1979, 112, 219.
(6) Rasmussen, G. T.; Isenhour, T. L. J. Chem. Inf. Comput. Sci. 1979, 19, 179.
(7) Fox, R. C. Anal. Chem. 1976, 48, 717.
(8) Heller, S. R.; Koniver, D. A.; Fales, H. M.; Milne, G. W. A. Anal. Chem. 1974, 46, 947.
(9) Zupan, J.; Heller, S. R.; Milne, G. W. A.; Miller, J. A. Anal. Chim. Acta 1978, 103, 141.
(10) Erley, D. S. Appl. Spectrosc. 1971, 25, 200.
(11) Lowry, S. R.; Huppler, D. A. Anal. Chem. 1981, 53, 889.
(12) Kwiatkowski, J.; Riepe, W. Fresenius' Z. Anal. Chem. 1980, 302, 300.
(13) Grotch, S. L. Anal. Chem. 1970, 42, 1214.
(14) Warren, F. V.; Delaney, M. F. Appl. Spectrosc. 1983, 37, 172.
(15) Lam, R. B.; Foulk, S. J.; Isenhour, T. L. Anal. Chem. 1981, 53, 1670.
(16) Lam, R. B.; Wieboldt, R. C.; Isenhour, T. L. Anal. Chem. 1981, 53, 889A.
(17) Delaney, M. F.; Uden, P. C. Anal. Chem. 1979, 51, 1242.
(18) Grotch, S. L. Anal. Chem. 1971, 43, 1362.
(19) Grotch, S. L. Anal. Chem. 1974, 46, 526.
(20) Grotch, S. L. Anal. Chem. 1975, 47, 1285.
(21) Woodruff, H. B.; Lowry, S. R.; Ritter, G. L.; Isenhour, T. L. Anal. Chem. 1975, 47, 2027.
(22) Rogers, D. J.; Tanimoto, T. T. Science 1960, 132, 1115.

RECEIVED for review December 28, 1982. Accepted June l:', 1983. Acknowledgment is gratefully made to the donors of the Petroleum Research Fund, administered by the American Chemical Society, and the National Science Foundation's Information Science and Chemistry Division (Grant No. IST-8120255) for the financial support of this research.
