
Anal. Chem. 1993, 65, 2014-2023


Statistical Theory of Spot Overlap for n-Dimensional Separations

Joe M. Davis

Department of Chemistry and Biochemistry, Southern Illinois University at Carbondale, Carbondale, Illinois 62901

A theory by Roach describing the overlap of circles in a two-dimensional plane is generalized to an n-dimensional space, where n is the number of orthogonal axes. The theory is proposed as a model of overlap in an n-dimensional separation. Extensive computer simulations show that the theory describes overlap very well in three-dimensional spaces and modestly well in four-dimensional spaces, when the number of components is large. Perturbations from edge effects somewhat limit the theory's application to four-dimensional spaces. The theory shows that the maximum number of spots per unit capacity and the maximum number of any kind of multiplet per unit capacity both decrease geometrically with increasing n. The theory also shows that the maximum fraction of any kind of multiplet is independent of n. Equations are derived to relate the saturation of an n-dimensional space to the one-dimensional saturation of any axis and the one-dimensional peak capacities of the other n - 1 axes. Other equations relate the saturation of an n-dimensional separation to the saturation of an (n + 1)-dimensional separation and the one-dimensional peak capacity of the (n + 1)th dimension. With these last equations, the internal consistency of the theory is justified. A critique finally is presented, which explains why the theory does not apply to a single dimension of separation.

INTRODUCTION

Recent work from this laboratory has addressed by theoretical means the statistical limitations of two-dimensional (2-D) separations.1-3 By statistical limitations, one means the constraints imposed on the separation of mixture components by the random positioning of these components in a 2-D bed. Separations of this type include classical ones, such as 2-D thin-layer chromatography and isoelectric focusing/electrophoresis, as well as the recent comprehensive 2-D separations of Jorgenson4-6 and Phillips.7,8 This work is the logical extension of earlier studies9-14 of the statistical limitations of one-dimensional (1-D) separations, which are quite pronounced.

(1) Davis, J. M. Anal. Chem. 1991, 63, 2141.
(2) Oros, F. J.; Davis, J. M. J. Chromatogr. 1992, 591, 1.
(3) Shi, W.; Davis, J. M. Anal. Chem. 1993, 65, 482.
(4) Bushey, M. M.; Jorgenson, J. W. Anal. Chem. 1990, 62, 161.
(5) Bushey, M. M.; Jorgenson, J. W. Anal. Chem. 1990, 62, 978.
(6) Bushey, M. M.; Jorgenson, J. W. J. Microcolumn Sep. 1990, 2, 293.
(7) Liu, Z.; Phillips, J. J. Chromatogr. Sci. 1991, 29, 227.
(8) Phillips, J. B.; Liu, Z.; Venkatramani, C. J.; Jain, V. 13th International Symposium on Capillary Chromatography; Sandra, P., Ed.; Huethig Verlag: Heidelberg, 1991; Vol. 1.
(9) Rosenthal, D. Anal. Chem. 1982, 54, 63.
(10) Nagels, L. J.; Creten, W. L.; Vanpeperstraete, P. M. Anal. Chem. 1983, 55, 216.
(11) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418.

Of particular interest is a theory developed by Roach some 25 years ago. This theory describes the number of isolated clusters formed by circles distributed randomly in a 2-D plane.15 The clusters either are single circles free from overlap or groups of overlapping circles. Oros and Davis extensively tested this theory by computer simulation and showed that it held promise for describing overlap in 2-D separations.2 More recently, Shi and Davis applied the theory to the determination of the number of maxima in computer simulations of 2-D beds containing realistic concentration profiles.3 These efforts indicate that 2-D separations, although vastly superior to 1-D ones, still are subject to modest amounts of overlap.

Therefore, from a theoretical perspective, it is reasonable to consider the efficiency of separations utilizing three or more dimensions. From an experimental perspective, however, interest in separations utilizing three or more dimensions appears to be very small. For example, the author is aware of only one actual three-dimensional (3-D) separation,16 although computer-calculated 3-D "separations" based on the interpretation of three separate 1-D separations by a strange attractor algorithm have been described.17 This lack of interest undoubtedly results from many practical attributes, including the complexity of such efforts, but also from a dearth of theory. To the author's knowledge, no study has estimated the extent of separation expected in separation spaces having more than two dimensions. Such a study is important, because it provides a basis for judging whether the gain in separation is worth the effort of designing a system to implement it.

This paper reports preliminary findings from a theoretical study of efficiency in multidimensional separations. One first must clarify what is meant here by multidimensional separation, because this terminology is used somewhat loosely.
A number of studies and reviews have been published in which two-dimensional or multidimensional separations based on coupled columns are described.18-26 Others also have implemented separations with coupled columns, without classifying them as multidimensional.27 In addition, a pseudomultidimensional separation based on sequential multimodal elution has been described.28 According to Giddings, however, components in a multidimensional separation are subject to a series of largely independent separative displacements, and the separation is structured, such that whenever two components are separated by any one displacement, they remain separated throughout subsequent displacements.16,29 Within this framework, none of the above separations truly is multidimensional, although they are both powerful and useful. In the discussion below, multidimensional separations are envisioned as being consistent with the definition of Giddings. The adoption of this definition is not judgmental and is for clarity only.

(12) Martin, M.; Herman, D. P.; Guiochon, G. Anal. Chem. 1986, 58, 2200.
(13) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1990, 62, 1846.
(14) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1991, 63, 2627.
(15) Roach, S. A. The Theory of Random Clumping; Methuen and Co., Ltd.: London, 1968.
(16) Giddings, J. C. In Multidimensional Chromatography: Techniques and Applications; Cortes, H. J., Ed.; Chromatographic Science Series 50; Marcel Dekker: New York, 1990; p 1.
(17) Zeineh, M. M.; Zeineh, J. A.; Zeineh, R. A. Am. Lab. 1993, 25, 44.
(18) Majors, R. E. J. Chromatogr. Sci. 1980, 18, 671.
(19) Alfredson, T. V. J. Chromatogr. 1981, 218, 715.
(20) Apffel, J. A.; McNair, H. J. Chromatogr. 1983, 279, 139.
(21) Duinker, J. C.; Schulz, D. E.; Petrick, G. Anal. Chem. 1988, 60, 478.
(22) Roston, D. A.; Wijayaratne, R. Anal. Chem. 1988, 60, 948.
(23) Rodriguez, P. A.; Eddy, C. L.; Marcott, C.; Fey, M. L.; Anast, J. M. J. Microcolumn Sep. 1991, 3, 289.
(24) Johnson, E. L.; Gloor, R.; Majors, R. E. J. Chromatogr. 1978, 149, 571.
(25) Grob, K., Jr.; Fröhlich, D.; Schilling, B.; Neukom, H. P.; Nägeli, P. J. Chromatogr. 1984, 295, 55.

© 1993 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 65, NO. 15, AUGUST 1, 1993

One can envision at least two general ways that separations can be implemented in three or more dimensions. The first way entails the collection by some means of small eluant fractions, the volume of each roughly equaling that of a single-component peak or less, from any dimension of separation, and the further separation of some or all of these fractions in one or more subsequent dimensions. In such separations, each dimension is associated with a separative column, and every component elutes from each column at a characteristic time. These times can be interpreted as coordinates, which distinguish each component in the separation. An example of this genre of separation is the 3-D separation noted above, for which analyte fractions were collected and reinjected manually. Another example is the comprehensive separation of Jorgenson and Phillips, as implemented with three or more columns, instead of two. In principle, the dimensionality of such separations is unlimited.

A second way of implementing 3-D separations is more hypothetical and would entail the sequential displacement of components along three orthogonal directions in a 3-D matrix, such as a porous gel-like cube.
In one mode of operation, components first would be displaced in two orthogonal directions, without eluting from the matrix, and then would be eluted by the third orthogonal displacement. Each component could be distinguished uniquely by two spatial coordinates specifying position in the first two matrix dimensions and a temporal coordinate specifying elution time in the third dimension. In another mode of operation, components could be displaced in three orthogonal directions without elution, if means were devised to detect them within the matrix. In this case, components could be distinguished by three spatial coordinates. The maximum dimensionality of such separations is three.

The theme of this work is that the efficiency of such separations can be described by a statistical interpretation of the gaps between these coordinates. More specifically, the work developed below is a generalization of the theory of Roach, who apparently did not realize (or perhaps care) that his theory for the overlap of circles in a 2-D plane could be generalized to higher dimensions. With this generalization, the gaps between coordinates in an n-dimensional space can be interpreted. The resultant theory permits one to estimate the numbers of clusters expected, when the components producing these clusters are distributed randomly in this space. It also permits one to estimate the numbers of different cluster types, e.g., singlets, doublets, triplets, etc. From these estimations, the efficiency of separation can be gauged.

In this theory, one assumes that components are subject to n independent separative displacements. The implications of this assumption are 2-fold. First, the assumption implies that components that have been discarded or eliminated in previous dimensions of separation cannot be addressed by theory, because they are not associated with n coordinates. The second implication is that displacements in one dimension have no correlation with displacements in other dimensions. This assumption is reasonably good in a comprehensive 2-D separation implemented with reversed-phase LC and capillary electrophoresis.5 It may be increasingly difficult to satisfy, however, as the number of dimensions increases. The validity or refutability of this assumption in specific cases, however, is not really relevant to this study. Rather, the study is proposed as a guideline to what one could expect to achieve, on average, in an n-dimensional space.

The final assumption is that, after n sequential displacements, the components are randomly distributed in an n-dimensional space. In other words, the average number of components contained within any volume of space is proportional only to the volume. If components are not discarded or eliminated from any dimension, this condition must be satisfied in all lower dimensions of space. The condition does not have to be satisfied in lower dimensions of space, however, if components are removed selectively from the lower dimensions; it simply must be satisfied by those components surviving n displacements. Because various dimensions can have different extents and even different physical units (e.g., one dimension may be spatial and another temporal), in practice the assumption means that all dimensions can be scaled appropriately, such that this condition is fulfilled. This is the principal shortcoming of the theory. Ideally, one would prefer a theory in which the component density of each dimension could be varied independently of other dimensions. Until such a theory can be developed, however, the following may be the best assessment of overlap that is available.

THEORY

The theory is conceptually outlined here. One postulates that components in an n-dimensional space are randomly distributed throughout that space and that their concentration profiles can be represented by points located at their centers of gravity. The probability is calculated that the interval between any point and its nearest neighbor is large enough for the profiles they represent to be resolved. The probability that they are not resolved is evaluated as the complement of the probability that they are resolved. From these two probabilities, the expected numbers of singlet and multiplet clusters are calculated, and the summed number of these clusters is interpreted as the expected number of n-dimensional spots. This number then is expressed simply in terms of the expected number of components in the n-dimensional space, the volume of that space, and the volume, centered about each point, from which neighbors must be excluded to avoid overlap. This simple expression is achieved by defining the saturation of the separation. The expected number of spots then is shown to depend on the expected number of components, the saturation, and the dimension of the space.

Consider the random distribution of m components throughout an n-dimensional space, where n is a positive integer greater than 1, e.g., 2, 3, etc. The detectable fractions of these components are represented by n-dimensional spheres of diameter $d_0$, where $d_0$ is measured in units of time or



at the spheres' centers. One desires to calculate here the total number of n-dimensional spots or, to simplify the language, spots. Spots either can be singlets, which are spheres that do not overlap with other spheres, or multiplets, which are clusters of spheres in which each sphere overlaps with at least one other sphere.

By generalizing the treatment of Giddings and Keller,30 one can deduce that component profiles in an n-dimensional space are described more correctly by the product of n orthogonal Gaussians than by an n-dimensional sphere. Such profiles actually are associated with not n but n + 1 coordinates, where the (n + 1)th coordinate is intensity or concentration. A profile in a 3-D space could be visualized as an ellipsoid, if various contours of concentration were depicted, but profiles in higher dimensions could be visualized only as projections in lower dimensions.

The assumption that these profiles can be represented here by n-dimensional spheres is not verified. Rather, one simply notes that similar geometric simplifications have been successful in modeling overlap in 1-D and 2-D separations. Experiment31,32 and computer simulations33-36 show that 1-D profiles can be represented accurately by line segments of fixed length, under appropriate conditions. Similarly, computer simulations3 show that 2-D profiles can be represented accurately by circles of fixed diameter, under appropriate conditions. Induction leads one to postulate here that components in n-dimensional spaces can be represented by n-dimensional spheres, under appropriate conditions. One further postulates that any two spheres will overlap, if the points representing their centers lie within a characteristic span $\beta d_0$ of each other. Here, as elsewhere,1 $\beta$ is a scaling parameter that permits one to adjust the criterion by which overlap is defined, relative to the sphere diameter.
To determine the likelihood of overlap, one must calculate the probability that any two spheres lie within span $\beta d_0$ of each other. Let the sphere that lies closest to any sphere in question be designated its nearest neighbor. For spheres distributed randomly in an n-dimensional space, the probability density $f_n(r)$ governing the span r between a sphere and its nearest neighbor (or, more strictly, the span between the points representing their centers) is37

$$f_n(r) = \omega n r^{n-1} \exp(-\omega r^n) \tag{1a}$$

where

$$\omega = \frac{\lambda \pi^{n/2}}{\Gamma(n/2 + 1)} = \frac{\bar{m} \pi^{n/2}}{V_n \Gamma(n/2 + 1)} \tag{1b}$$

In the above equations, $\lambda = \bar{m}/V_n$ is the density of components in a space of volume (or content) $V_n$, $\bar{m}$ is a statistical approximation to the number m of components, and $\Gamma$ is the gamma function.

Any two nearest neighbors will be resolved, if the span r between them is greater than the effective diameter $\beta d_0$ of their spheres (this statement is justified elsewhere for a 2-D space1). The probability that nearest neighbors are resolved is calculated from eq 1a as

$$p(r > \beta d_0) = \omega n \int_{\beta d_0}^{\infty} r^{n-1} \exp(-\omega r^n)\, dr = \exp(-\omega (\beta d_0)^n) \tag{2a}$$

In contrast, nearest neighbors will overlap, if the span between them is less than $\beta d_0$. The probability that they overlap is the complement of eq 2a

$$p(r < \beta d_0) = \omega n \int_{0}^{\beta d_0} r^{n-1} \exp(-\omega r^n)\, dr = 1 - \exp(-\omega (\beta d_0)^n) \tag{2b}$$

By a reasoning identical to that outlined by Roach15 for a 2-D space and summarized by Oros and Davis,2 one can deduce that the expected number $P_{n,\nu}$ of spots containing $\nu$ components in an n-dimensional space is

$$P_{n,\nu} = \bar{m} \exp(-\omega (\beta d_0)^n) \{1 - \exp(-\omega (\beta d_0)^n)\}^{\nu-1} / \nu \tag{3}$$

The expected total number p of spots in the n-dimensional space is the sum of all $P_{n,\nu}$'s

$$p = \sum_{\nu=1}^{\bar{m}} P_{n,\nu} = \bar{m} \omega (\beta d_0)^n \frac{\exp(-\omega (\beta d_0)^n)}{1 - \exp(-\omega (\beta d_0)^n)} \tag{4}$$

where the closed form follows from extending the sum to infinite $\nu$, the terms beyond $\bar{m}$ being negligible when $\bar{m}$ is large. Here, the author omits the subscript, n, on the variable, p, to simplify the description. Similar choices will be made below for other commonly used variables. When necessary, subscripts will be used to avoid confusion.

Although eq 4 is an acceptable result for the expected number of spots, it is desirable to express it in a form which depends on the saturation of the n-dimensional space. To do so, one must first define the spot capacity of this space. The content or volume $v_n$ of a sphere of diameter $\beta d_0$ is37,38

$$v_n = \frac{\pi^{n/2} (\beta d_0)^n}{2^n \Gamma(n/2 + 1)} = \frac{\omega V_n (\beta d_0)^n}{2^n \bar{m}} \tag{5}$$

where the final identity is obtained by substitution of eq 1b into the central identity. With eq 5, one can express the term $\omega (\beta d_0)^n$ that appears in eqs 3 and 4 as

$$\omega (\beta d_0)^n = 2^n \bar{m} v_n / V_n \tag{6}$$

If one now defines the spot capacity $n_c$ of the n-dimensional space as the ratio of the volume $V_n$ of the space to the volume $v_n$ of one sphere

$$n_c = V_n / v_n \tag{7}$$

and the saturation $\alpha$ of this space as

$$\alpha = \bar{m} / n_c \tag{8}$$

then one can express eq 6 as

$$\omega (\beta d_0)^n = 2^n \alpha \tag{9a}$$

and $\alpha$ as

$$\alpha = \frac{\omega (\beta d_0)^n}{2^n} = \frac{\bar{m} \pi^{n/2} (\beta d_0)^n}{2^n V_n \Gamma(n/2 + 1)} \tag{9b}$$

where eq 1b has been used in the final identity of eq 9b. The expected number $P_{n,\nu}$ of spots containing $\nu$ components and the expected total number p of spots, eqs 3 and 4, now can be expressed in terms of the saturation $\alpha$ by means of eq 9a.

(30) Giddings, J. C.; Keller, R. A. J. Chromatogr. 1959, 2, 626.
(31) Davis, J. M. J. Chromatogr. 1988, 449, 41.
(32) Delinger, S. L.; Davis, J. M. Anal. Chem. 1990, 62, 436.
(33) Herman, D. P.; Gonnard, M. F.; Guiochon, G. Anal. Chem. 1984, 56, 995.
(34) Davis, J. M.; Giddings, J. C. J. Chromatogr. 1984, 289, 277.
(35) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2178.
(36) Dondi, F.; Kahie, Y. D.; Lodi, G.; Remelli, M.; Reschiglian, P.; Bighi, C. Anal. Chim. Acta 1986, 191, 261.
(37) Vanmarcke, E. Random Fields: Analysis and Synthesis; MIT Press: Cambridge, MA, 1983.
(38) Sommerville, D. M. Y. An Introduction to the Geometry of n Dimensions; E. P. Dutton and Co., Inc.: New York, 1938.
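The definitions of spot capacity and saturation (eqs 7 and 8) can be checked numerically against eq 9a. The following is a minimal Python sketch, not code from the study. It assumes the standard content of an n-dimensional sphere, $\pi^{n/2} (\beta d_0)^n / [2^n \Gamma(n/2 + 1)]$, and the corresponding nearest-neighbor parameter $\omega = \bar{m} \pi^{n/2} / [V_n \Gamma(n/2 + 1)]$. The function names and the example values (500 components, unit cube, $\beta d_0$ = 0.05) are hypothetical.

```python
import math

def omega(m_bar, volume, n):
    """Nearest-neighbor parameter: m_bar * pi**(n/2) / (V_n * Gamma(n/2 + 1))."""
    return m_bar * math.pi ** (n / 2) / (volume * math.gamma(n / 2 + 1))

def sphere_volume(diameter, n):
    """Content of an n-dimensional sphere of the given diameter."""
    return math.pi ** (n / 2) * diameter ** n / (2 ** n * math.gamma(n / 2 + 1))

def saturation(m_bar, volume, diameter, n):
    """Eqs 7 and 8: alpha = m_bar / n_c, with n_c = V_n / v_n."""
    n_c = volume / sphere_volume(diameter, n)
    return m_bar / n_c

def p_resolved(m_bar, volume, diameter, n):
    """Eq 2a: probability that nearest neighbors are resolved."""
    return math.exp(-omega(m_bar, volume, n) * diameter ** n)

# Hypothetical example values, chosen only for illustration.
m_bar, V, bd0, n = 500, 1.0, 0.05, 3
alpha = saturation(m_bar, V, bd0, n)
lhs = omega(m_bar, V, n) * bd0 ** n   # omega * (beta d_0)**n
rhs = 2 ** n * alpha                  # right-hand side of eq 9a
print(alpha, lhs, rhs, p_resolved(m_bar, V, bd0, n))
```

The two sides of eq 9a agree to rounding error for any n, which is why the later results depend on the geometry only through $2^n \alpha$.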



The resulting expressions are

$$P_{n,\nu} = \bar{m} \exp(-2^n \alpha) \{1 - \exp(-2^n \alpha)\}^{\nu-1} / \nu \tag{10}$$

$$p = 2^n \alpha \bar{m} \frac{\exp(-2^n \alpha)}{1 - \exp(-2^n \alpha)} \tag{11}$$

which are the desired results. For n = 2, eq 11 reduces to the result derived by Roach15 and used in this laboratory2,3 to model 2-D separations.

Figure 1. Graphs of $p/\bar{m}$, $s/\bar{m}$, $d/\bar{m}$, and $t/\bar{m}$ vs $\alpha$ for a 3-D separation (n = 3). The curve in (a) is a graph of eq 11; the curves in (b-d) are graphs of eq 10, with $\nu$ = 1, 2, and 3, respectively. Each symbol represents the average result of 100 simulations, and the error bars represent one standard deviation. Symbols for different numbers of components are identified in the figure.

PROCEDURES

Equations 10 and 11 were tested for n = 3 and n = 4 by computer simulation. An algorithm previously written to determine the numbers of spots in a 2-D space2 was modified to account for these higher dimensions. In this new algorithm, the centers of m spheres were distributed randomly in three and four dimensions by defining the positions of each center with three and four random numbers, respectively. The span $d_{ij}$ between the ith and jth center was calculated as

$$d_{ij} = \left[ \sum_{k=1}^{n} (P_{ik} - P_{jk})^2 \right]^{1/2} \tag{12}$$

where $P_{ik}$ represents the kth coordinate of center i. The m(m - 1)/2 spans so computed were compared to a series of $\beta d_0$ values defined by eq 9b

$$\beta d_0 = 2 \left[ \frac{\alpha V_n \Gamma(n/2 + 1)}{\bar{m} \pi^{n/2}} \right]^{1/n} \tag{13}$$

The volume $V_n$ in eq 13 must be specified to calculate $\beta d_0$. The space defined by these computer simulations is an orthotope, whose volume equals the product of the lengths of its n orthogonal edges.38 Since the random numbers defining the spheres' centers span the range 0-1, the length of each edge of the orthotope equals 1. Hence, $V_n$ = 1. For n = 3 and n = 4, eq 13 simplifies to

$$\beta d_0 = (6\alpha / \pi \bar{m})^{1/3}, \quad n = 3$$

$$\beta d_0 = (32\alpha / \pi^2 \bar{m})^{1/4}, \quad n = 4 \tag{14}$$

Here, one has made use of the identities $\Gamma(k + 1) = k\Gamma(k)$ and $\Gamma(1/2) = \pi^{1/2}$. Another useful identity is $\Gamma(k + 1) = k!$, where ! represents the factorial function. The argument k in these identities need not be integral, although the last identity is rarely used for nonintegral k.

In each simulation, values of $\beta d_0$ were calculated from eq 14 for various input parameters $\alpha$ and $\bar{m}$. The latter parameter was approximated by the number, m, of components in the simulation. If the span $d_{ij}$ was less than $\beta d_0$, then spheres i and j were assumed to overlap. The total number of overlapping clusters, and the numbers of singlet, doublet, and triplet clusters, were so determined.

All computer programs were written in Language Systems FORTRAN and executed on an Outbound Macintosh with a 25-MHz 68030 CPU and 68882 math coprocessor. The random-number generator used was a library function of this FORTRAN program.

RESULTS AND DISCUSSION

Agreement between Theory and Simulation. Figures 1 and 2 are dimensionless graphs of the numbers of spots p, singlets s, doublets d, and triplets t, divided by $\bar{m}$, vs saturation $\alpha$. The results in Figure 1 are for a 3-D separation (n = 3), and those in Figure 2 are for a 4-D separation (n = 4). The curves in Figures 1a and 2a are graphs of eq 11, whereas the curves in Figures 1b and 2b, 1c and 2c, and 1d and 2d, are graphs of eq 10, with $\nu$ = 1, 2, and 3, respectively. The various symbols represent the average numbers, divided by $\bar{m}$, of spots, singlets, doublets, and triplets found in 100 computer simulations containing a fixed number m of components. This number was varied among 100, 250, 500, and 750 for different sets of simulations. The vertical error bars represent one standard deviation, but all standard deviations are not shown to avoid cluttering the graphs. Simulations with larger m's would have been desirable but were not carried out, because of the computational time required. The principal factor that consumed computational time was the determination of p.

A cursory examination of these graphs indicates that the proposed theory agrees fairly well with the simulations.
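The simulation procedure described under PROCEDURES can be sketched in a few lines. This is a minimal Python illustration, not the FORTRAN program used in the study, whose source is not given in the text. The union-find clustering, the function names, the random seed, and the run parameters are all assumptions made for the example. Clusters are taken to be the connected groups of spheres whose centers lie within $\beta d_0$ of at least one other member.

```python
import math
import random
from collections import Counter

def beta_d0(alpha, m, n, volume=1.0):
    """Eq 13: span below which two spheres are taken to overlap."""
    return 2.0 * (alpha * volume * math.gamma(n / 2 + 1)
                  / (m * math.pi ** (n / 2))) ** (1.0 / n)

def count_spots(m, n, alpha, rng):
    """Scatter m centers in the unit n-cube; return a Counter mapping
    cluster size -> number of clusters of that size."""
    centers = [[rng.random() for _ in range(n)] for _ in range(m)]
    limit_sq = beta_d0(alpha, m, n) ** 2
    parent = list(range(m))              # union-find forest (an assumption)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i in range(m):
        for j in range(i + 1, m):            # all m(m - 1)/2 spans, eq 12
            d_sq = sum((a - b) ** 2 for a, b in zip(centers[i], centers[j]))
            if d_sq < limit_sq:
                parent[find(i)] = find(j)    # merge overlapping spheres
    sizes = Counter(find(i) for i in range(m))   # root -> cluster size
    return Counter(sizes.values())               # size -> number of clusters

def theory_p(alpha, m, n):
    """Eq 11: expected total number of spots."""
    x = 2 ** n * alpha
    return m * x * math.exp(-x) / (1.0 - math.exp(-x))

def theory_multiplet(alpha, m, n, nu):
    """Eq 10: expected number of spots containing nu components."""
    q = math.exp(-2 ** n * alpha)
    return m * q * (1.0 - q) ** (nu - 1) / nu

if __name__ == "__main__":
    rng = random.Random(1993)                # arbitrary seed
    m, n, alpha, runs = 250, 3, 0.1, 10      # illustrative values only
    spots = singlets = 0
    for _ in range(runs):
        tally = count_spots(m, n, alpha, rng)
        spots += sum(tally.values())
        singlets += tally[1]
    print("p/m simulated", spots / (runs * m),
          "theory", theory_p(alpha, m, n) / m)
    print("s/m simulated", singlets / (runs * m),
          "theory", theory_multiplet(alpha, m, n, 1) / m)
```

The pairwise loop costs on the order of m squared operations per run, which is consistent with the remark that computational time limited the largest m that could be simulated.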

Figure 2. As in Figure 1, but for a 4-D separation (n = 4).

The agreement is best for large $\bar{m}$'s and for the values of $p/\bar{m}$, $d/\bar{m}$, and $t/\bar{m}$. The standard deviations in p and s are fairly small, whereas those in d and t are fairly large. This variation principally exists because only a few multiplets were formed. A closer inspection shows systematic errors, which are more pronounced when n = 4 than when n = 3. In particular, the numbers $p/\bar{m}$ and especially $s/\bar{m}$ are larger than predicted by theory, although the deviation decreases with increasing m. Furthermore, the maximum values of $d/\bar{m}$ and $t/\bar{m}$ are less than predicted by theory, especially for n = 4. One observes that simulations were carried out only for values of $\alpha$ less than 0.50. This is not because a fundamental limitation exists on theory, but because overlap is so severe beyond this value that little reason exists to consider more saturated separations.

Principal Origin of Discrepancy. These errors are principally due to the analogue of 2-D "edge effects",39 in which the likelihood of overlap decreases near the edges of a 2-D space. This decrease occurs because components near the edges have no neighbors beyond the edges with which to overlap. The total number of spots, and particularly singlet spots, consequently increases. A similar decrease has been addressed in 1-D separations12 but is less serious than that considered here. In an n-dimensional space, with n ≥ 3, one more properly should refer to faces, instead of edges, but one will continue to use the expression "edge effects" for simplicity. In effect, the edges are boundaries, which limit the space to a finite volume, whereas one assumes in theory that the space is unbounded. The perturbation increases with n because the number of faces, and therefore the likelihood that a component lies near a face, increases with n. For example, a 2-D orthotope has only four edges (the four lines defining a rectangle), whereas a 3-D orthotope has six faces (the six planes defining a parallelepiped).
The faces need not be orthogonal for these trends to hold.

(39) Tuckwell, H. C. Elementary Applications of Probability Theory; Chapman and Hall: London, 1988.

One can verify that edge effects contribute significantly to the observed discrepancy by limiting one's counting of singlets to a central core of the space, in which the environment of each component is nearly isotropic. In this verification, a 4-D space was simulated as detailed above. A central core in this 4-D space then was defined as the subspace bound by the coordinates, $\langle r \rangle$ and $1 - \langle r \rangle$, along each axis. Here, $\langle r \rangle$ is the average distance between nearest neighbors, which for an n-dimensional space is37

$$\langle r \rangle = \int_0^{\infty} r f_n(r)\, dr = \omega n \int_0^{\infty} r^n \exp(-\omega r^n)\, dr = \frac{\Gamma(1 + 1/n)}{\omega^{1/n}} \tag{15a}$$

where $f_n(r)$ has been expressed by eq 1a. For n = 4 and $V_n$ = 1, eq 15a simplifies to

$$\langle r \rangle = 0.608140 / \bar{m}^{1/4} \tag{15b}$$

where $\Gamma(5/4) = (1/4)! = 0.906402$ (the numerical value was determined from the factorial function on an HP 11C calculator). By constructing a central core within the coordinates given above, one ensures that the distance from any point in the central core to any face is greater than or equal to $\langle r \rangle$, the average distance between nearest neighbors. Therefore, components in this central core are as likely to have nearest neighbors out of the core as in it. In other words, the space surrounding components in the central core is highly isotropic, at least with respect to nearest neighbors.

The average number of singlets in this core (or core singlets) then was determined from 100 simulations for each of several different $\alpha$ and $\bar{m}$ values. The average number of components in the core (or core components) also was determined for these $\alpha$'s and $\bar{m}$'s. The ratio of the average number of core singlets to the average number of core components then was interpreted as the probability $P_1$ of singlet formation, which theoretically should be equivalent to the ratio, $s/\bar{m}$, in the absence of edge effects. Figure 3 is a graph of $P_1$ so determined vs saturation $\alpha$. The curve is a graph of eq 10, with $\nu$ = 1, whereas the symbols correspond to m values of 250, 500, and 750 (but not 100, for reasons explained below).
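The core construction can be illustrated with a short calculation. The Python sketch below evaluates the mean nearest-neighbor distance of eq 15a in the form $\Gamma(1 + 1/n)/\omega^{1/n}$, tests whether a center lies in the core, and estimates how many components a core should hold if the components are spread uniformly. The function names are hypothetical, and the uniform-density estimate is an assumption of the sketch, not a calculation reported in the text.

```python
import math

def mean_nn_distance(m_bar, n, volume=1.0):
    """Eq 15a: <r> = Gamma(1 + 1/n) / omega**(1/n), with
    omega = m_bar * pi**(n/2) / (V_n * Gamma(n/2 + 1))."""
    omega = m_bar * math.pi ** (n / 2) / (volume * math.gamma(n / 2 + 1))
    return math.gamma(1.0 + 1.0 / n) / omega ** (1.0 / n)

def in_core(center, r_mean):
    """True if every coordinate lies between <r> and 1 - <r>."""
    return all(r_mean <= c <= 1.0 - r_mean for c in center)

def expected_core_components(m_bar, n):
    """Uniform-density estimate: m_bar times the core's share of the
    unit volume, (1 - 2<r>)**n. An assumption of this sketch."""
    edge = 1.0 - 2.0 * mean_nn_distance(m_bar, n)
    return m_bar * edge ** n

for m in (100, 250, 500, 750):
    r = mean_nn_distance(m, 4)
    print(m, round(r, 5), round(expected_core_components(m, 4), 2))
```

For $\bar{m}$ = 1 and n = 4 the first function returns the coefficient of eq 15b. The core estimates can be compared with the average numbers of core components reported for the simulations.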

Figure 3. Probability $P_1$ of singlet formation vs $\alpha$ for a 4-D separation (central core). The curve is a graph of eq 10 with $\nu$ = 1. Each symbol represents the average result of 100 simulations, and the error bars represent one standard deviation. In each simulation, only core singlets and core component numbers were counted. Symbols for different numbers of components are identified in the figure.

The vertical error bars represent one standard deviation. Lest confusion ensue, one observes that these m's were the number of components in the entire 4-D space, not the central core; the average numbers of core components were 14.62 (for m = 100), 58.12 (for m = 250), 152.92 (for m = 500), and 260.95 (for m = 750). The number of core components for m = 100 is so small that theory cannot provide an accurate estimate of $P_1$; hence, the results are not graphed. For the other m's, one observes the extremely close agreement between simulation and theory, even for large $\alpha$, when only the central core region is examined. This agreement lends credence to the hypothesis that edge effects are very influential in the 4-D space.

One may inquire why doublets and triplets were not similarly analyzed. If the nearest neighbor to a core component were to lie outside the core but were to overlap the core component, then the resultant multiplet (more than one overlap might exist) would not reside entirely in the central core. Hence, one could not guarantee that the environment of all its components was isotropic. Because core doublets and triplets were not analyzed in this manner, one strictly cannot argue that edge effects are responsible for the small discrepancies between simulation and theory for these multiplets.

In spite of these discrepancies, the estimates of $p/\bar{m}$ are quite good for n = 3 and n = 4, when $\bar{m}$ is large. Since these dimensionalities of space are not likely to be used unless m is very large, one need not be very concerned by the errors observed for small $\bar{m}$. Hence, the most important theoretical parameter, the number of spots, can be estimated accurately for both 3-D and 4-D separations. The accuracy of the individual multiplets varies with both n and the multiplet type. In general, one can conclude that the theory works well for n = 3 but only modestly well for n = 4. Additional results, which are not shown here, show that theory and simulation for s differ even more than shown here, when n = 5. Hence, edge effects may impose a practical upper limit on n, although this possibility is not established firmly by this study.

Optimization of $P_{n,\nu}/\bar{m}$, $p/n_c$, and $P_{n,\nu}/n_c$. One now shifts one's focus to the variation of the dimensionless expressions, $P_{n,\nu}/\bar{m}$, $p/n_c$, and $P_{n,\nu}/n_c$, with saturation. A careful examination of the theoretical ratios, $d/\bar{m}$ and $t/\bar{m}$, in Figures 1 and 2 suggests that their maximum values vary with the multiplet type but not with the dimensionality of space. For example, the maximum value of $d/\bar{m}$ is 0.125 in both 3-D and 4-D spaces. In fact, a simple analysis shows that these fractions are completely independent of n, which is somewhat surprising. By dividing eq 10 by $\bar{m}$, differentiating the result with respect to $\alpha$, and equating the derivative to zero, one determines that the maximum value of $P_{n,\nu}/\bar{m}$ is obtained when $\alpha = \ln \nu / 2^n$. Substitution of this result into eq 10 gives

$$(P_{n,\nu}/\bar{m})_{\max} = (\nu - 1)^{\nu-1} / \nu^{\nu+1} \tag{16}$$
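The independence of this maximum from n can be confirmed numerically. The Python sketch below is an illustration only, with hypothetical function names. It evaluates eq 10 divided by $\bar{m}$ at $\alpha = \ln \nu / 2^n$ and compares the value with the closed form $(\nu - 1)^{\nu-1}/\nu^{\nu+1}$ for 3-D and 4-D spaces.

```python
import math

def multiplet_fraction(alpha, n, nu):
    """Eq 10 divided by m_bar: q (1 - q)**(nu - 1) / nu, q = exp(-2**n alpha)."""
    q = math.exp(-2 ** n * alpha)
    return q * (1.0 - q) ** (nu - 1) / nu

def max_fraction(nu):
    """Closed-form maximum, (nu - 1)**(nu - 1) / nu**(nu + 1)."""
    return (nu - 1) ** (nu - 1) / nu ** (nu + 1)

for n in (3, 4):
    for nu in (2, 3):
        alpha_star = math.log(nu) / 2 ** n   # saturation at the maximum
        print(n, nu, alpha_star,
              multiplet_fraction(alpha_star, n, nu), max_fraction(nu))
```

Only the saturation at which the maximum occurs changes with n; it halves for each added dimension, while the height of the maximum stays fixed.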

which depends on the number u of components in the multiplet but not the number n of dimensions of the space. Therefore, no motivation exists to select the dimension n of a separation space to reduce the maximum fraction of a particular type of multiplet; that maximum fraction is the same for all n. The maximum number of spots, m, always will be resolved when a = 0. However, the maximum number of spots resolved per unit of spot capacity, pln,, varies with both a and n. Dividing both sides of eq 11by n, and substituting the identity, a = mln,, one obtains 2"a2 exp(-a"a) (17) 1- exp(-2"a) The maximum value of this ratio is obtained by differentiating eq 17 with respect to a and equating the result to zero, which leads to the expression pln, =

1- exp(-2"a) = o Equation 18a is equivalent to

(18a)

$$1 - \frac{x}{2} - \exp(-x) = 0; \qquad x = 2^n\alpha \qquad (18b)$$

which has a numerical solution, determined by the bisection method, equal to

$$x = 1.59363 \qquad (18c)$$

Since this solution is common to all n, an increase by 1 in the number of dimensions reduces the $\alpha$ at which $p/n_c$ maximizes by a factor of 2. By substituting the expressions for x in eqs 18b and 18c into eq 17, one concludes that

$$\left(\frac{p}{n_c}\right)_{\max} = \frac{\alpha x e^{-x}}{1 - e^{-x}} = \frac{x^2 e^{-x}}{2^n(1 - e^{-x})} = \frac{0.647611}{2^n} \qquad (19)$$

Thus, the maximum fraction of capacity utilized in resolving spots decreases geometrically with the number of dimensions. This finding agrees with previous observations that the spot capacity of 2-D separations is utilized less effectively than the peak capacity of 1-D separations.$^{1,2}$

In a similar manner, the maximum fraction of each multiplet $P_{n,v}$ per unit capacity $n_c$ can be calculated. By dividing eq 10 by $n_c$ and substituting the identity $\alpha = \bar{m}/n_c$, one obtains

$$\frac{P_{n,v}}{n_c} = \frac{\alpha\exp(-2^n\alpha)\,[1 - \exp(-2^n\alpha)]^{v-1}}{v} \qquad (20a)$$

$$\frac{P_{n,v}}{n_c} = \frac{x}{2^n}\,\frac{\exp(-x)\,[1 - \exp(-x)]^{v-1}}{v} \qquad (20b)$$

where x is defined by eq 18b. Equation 20b reaches a maximum when

$$1 - e^{-x} - x(1 - v e^{-x}) = 0 \qquad (21)$$

Because the solution to eq 21 depends on v but is independent of n, the maximum value of $P_{n,v}/n_c$ varies with v but always is inversely proportional to $2^n$, as indicated by eq 20b. For v = 1, the solution can be determined analytically and is x = 1. By substituting this result into eq 20b, one obtains

$$(P_{n,1}/n_c)_{\max} = (s/n_c)_{\max} = \frac{e^{-1}}{2^n} \qquad (22)$$

The solution to eq 21 for other values of v requires numerical methods. Table I reports the values of x determined by the bisection method and the maximum fractions of capacity used to resolve multiplets corresponding to these v's. These fractions were calculated from eq 20b. Consequently, the maximum values of the ratios $p/n_c$ and $P_{n,v}/n_c$ both decrease geometrically with the number of



Table I. Solutions to Eq 21 for Various Values of v (a)

v    $x = 2^n\alpha$    $2^n (P_{n,v}/n_c)_{\max}$
2    1.4466             0.1302
3    1.7406             0.0692
4    1.9626             0.0438
5    2.1408             0.0305

(a) The last column is obtained by substituting x into eq 20b.
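The bisection results quoted in eqs 18c and 19 and in Table I can be checked with a short Python sketch. It is offered only as an illustration, not as the program used in this study; the helper names `bisect` and `multiplet_max` and the bracketing intervals are assumptions chosen for the example. The brackets deliberately exclude the trivial root x = 0 shared by eqs 18b and 21.

```python
import math

def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

# Eq 18b: the x = 2^n * alpha at which p/n_c is largest.
x_p = bisect(lambda x: 1 - x / 2 - math.exp(-x), 1.0, 3.0)
# Eq 19: the value of 2^n * (p/n_c)_max at that x.
p_max = x_p**2 * math.exp(-x_p) / (1 - math.exp(-x_p))

def multiplet_max(v):
    """Root of eq 21 and the value of 2^n * (P_{n,v}/n_c)_max from eq 20b."""
    x = bisect(lambda t: 1 - math.exp(-t) - t * (1 - v * math.exp(-t)), 0.5, 5.0)
    return x, x * math.exp(-x) * (1 - math.exp(-x)) ** (v - 1) / v

print("eq 18c:", round(x_p, 5), " eq 19 prefactor:", round(p_max, 6))
for v in range(1, 6):
    x, frac = multiplet_max(v)
    print(v, round(x, 4), round(frac, 4))
```

For v = 1 the sketch should return x = 1 and $e^{-1}$, in keeping with eq 22. For the other v's, agreement with Table I to roughly the third decimal place is what one would expect; the last digit can depend on the tolerance used in the bisection.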

dimensions. Because of this decrease, the utilization by components of capacity $n_c$ becomes increasingly ineffective with increasing n, and for any $\alpha$, the extent of overlap increases with increasing n. These findings somewhat temper one's expectation of n-dimensional spaces. Yet, this resolving power is still formidable, as some calculations below will show.

Figure 4 is a graph of $p/n_c$, $s/n_c$, $d/n_c$, and $t/n_c$ vs $\alpha$ for 3-D and 4-D spaces that illustrates the above behaviors. One notes that the $\alpha$ at which each function maximizes in a 4-D space is half of its 3-D counterpart. Furthermore, the maximum value of any function in a 4-D space is half of its 3-D counterpart.

Expression of $\alpha$ in Terms of 1-D Peak Capacities. When the n-dimensional space of volume $V_n$ is an orthotope (e.g., a rectangle in a 2-D separation, a parallelepiped in a 3-D separation, etc.), then $V_n$ can be expressed as the product of the lengths $X_k$ of each dimension of separation

$$V_n = \prod_{k=1}^{n} X_k \qquad (23)$$

By combining eq 23 with eq 9b, one can show that

$$\alpha = \frac{\bar{m}\,\pi^{n/2}(\beta d_0)^n}{2^n\,\Gamma(n/2+1)\prod_{k=1}^{n} X_k} = \frac{\alpha_{1D,j}\,\pi^{n/2}}{2^n\,\Gamma(n/2+1)\prod_{k\neq j} n^*_{c,k}} = \frac{\bar{m}\,\pi^{n/2}}{2^n\,\Gamma(n/2+1)\prod_{k=1}^{n} n^*_{c,k}} \qquad (24)$$

where $\alpha_{1D,j}$ is the 1-D saturation along the jth axis of separation and $n^*_{c,k} = X_k/\beta d_0$ is the 1-D peak capacity of the kth axis of separation ($k \neq j$). Equation 24 relates the overall saturation of an n-dimensional separation to the saturation $\alpha_{1D}$ of any one dimension and the peak capacities of the other dimensions, or to the number $\bar{m}$ of components and the 1-D capacities of the n orthogonal axes. The usefulness of eq 24 arises from the ease with which one thinks about capacities in a single dimension, as opposed to higher dimensions. By comparing eqs 8 and 24, one concludes

$$n_c = \frac{2^n\,\Gamma(n/2+1)}{\pi^{n/2}}\prod_{k=1}^{n} n^*_{c,k} \qquad (25)$$

Thus, the overall spot capacity is not the product of the n 1-D peak capacities, as one commonly assumes, but is related closely to this product by a geometrical factor characteristic of the n-dimensional space. This factor is a number larger than 1; e.g., $4/\pi$ for n = 2, $6/\pi$ for n = 3, and $32/\pi^2$ for n = 4. The distinction simply arises from how capacity is defined. The volume of space that cannot be occupied by components in a closest packed configuration is excluded from the classical definition of capacity, whereas it is included in the statistical definition of capacity.$^{1}$

Figure 5 is a graph of the ratio $p/\bar{m}$ vs log $\bar{m}$ for various capacities $n^*_{c,k}$ in 1-D, 2-D, 3-D, and 4-D separations. The ratio $p/\bar{m}$ for a 1-D separation was calculated from the expression$^{11}$ $p/\bar{m} = \exp(-\alpha_{1D}) = \exp(-\bar{m}/n^*_{c,1})$. The saturations $\alpha$ of 2-D, 3-D, and 4-D spaces were calculated from eq 24, and the corresponding ratios $p/\bar{m}$ were calculated from eq 11. In Figure 5a, the capacity $n^*_{c,k}$ equals 100 in all four dimensions. If one arbitrarily requires that 90% or more of components be resolved as spots under these conditions, then 1-D, 2-D, and 3-D separations would be capable of resolving up to 10, 650, and 46 500 components, respectively (the estimate for 1-D separations is only approximate, as few components are involved). Of these components, approximately 81-82% or more would be singlets. Under the same conditions, 4-D separations would be capable of resolving more than one million components, as indicated by the figure.

One will not necessarily have the same capacity in every dimension. In comprehensive 2-D separations, for example, the time of separation is much shorter in the second dimension than in the first, and the capacity of the second dimension typically is less than that of the first. Presumably, similar constraints will exist on n-dimensional separations, when implemented in a comprehensive mode. Figure 5b is a graph analogous to Figure 5a, except that the capacity is reduced by a factor of 2 in each succeeding dimension (the choice of 2 is arbitrary). In other words, $n^*_{c,1}$ = 100, $n^*_{c,2}$ = 50, $n^*_{c,3}$ = 25, and $n^*_{c,4}$ = 12.5. These reductions correspond to a 2-, 8-, and 64-fold loss in total capacity, respectively, compared to the case where $n^*_{c,k}$ = 100 in each dimension. If one again required that 90% or more of components be resolved as spots, then 2-D, 3-D, and 4-D separations would be capable of resolving up to 310, 6050, and 61 700 components, respectively (the 1-D separation does not change). Of these, approximately 81-82% or more would be singlets, as before.
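The estimates discussed for Figure 5 can be explored with a minimal Python sketch. It assumes the form of eq 24 in which the saturation is the number of components times the volume of an n-dimensional sphere of unit diameter, divided by the product of the 1-D capacities; it also assumes that eq 11 gives $p/\bar{m} = x/(e^x - 1)$ with $x = 2^n\alpha$, which is the form implied by eqs 17 and 30b. The function names and the bisection search are illustrative choices, not the procedure used to draw the figure.

```python
import math

def saturation(m, capacities):
    """Saturation of an n-D space from m components and the 1-D capacities."""
    n = len(capacities)
    # Volume of an n-dimensional sphere of unit diameter.
    unit_ball = math.pi ** (n / 2) / (2 ** n * math.gamma(n / 2 + 1))
    return m * unit_ball / math.prod(capacities)

def spot_fraction(m, capacities):
    """p/m: exp(-alpha) in one dimension, x/(e^x - 1) with x = 2^n alpha otherwise."""
    n = len(capacities)
    alpha = saturation(m, capacities)
    if n == 1:
        return math.exp(-alpha)
    x = 2 ** n * alpha
    return x / math.expm1(x) if x > 0 else 1.0

def max_components(capacities, target=0.9):
    """Largest m for which p/m is still at least `target` (found by bisection)."""
    lo, hi = 0.0, 1.0
    while spot_fraction(hi, capacities) >= target:
        hi *= 2                      # expand until the target is violated
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if spot_fraction(mid, capacities) >= target:
            lo = mid
        else:
            hi = mid
    return lo

for caps in ([100], [100, 100], [100, 100, 100], [100, 50, 25, 12.5]):
    print(len(caps), "dimensions:", round(max_components(caps)))
```

The results should be of the same magnitude as the component numbers quoted above; exact agreement with those rounded values is not to be expected. Because the saturation is inversely proportional to the product of the capacities, halving any one capacity halves the number of components that can be resolved at a given $p/\bar{m}$.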
Yet one sees clearly the dramatic reduction in the number of resolvable components, relative to the case where $n^*_{c,k}$ = 100 in each dimension. Therefore, the need exists to have the capacity of all dimensions as large as possible.

Comparison of Separations in n and n + 1 Dimensions. Rather than relate $\alpha$ to $\alpha_{1D}$ and one or more 1-D peak capacities, one may wish to relate the saturation of an n-dimensional space to the saturation of an (n + 1)-dimensional space and the capacity $n^*_{c,n+1}$ of the (n + 1)th dimension of this space. Such a relationship would be useful if an n-dimensional separation were inadequate for one's needs and one desired an estimate of the capacity $n^*_{c,n+1}$ necessary to achieve separation in an (n + 1)-dimensional space. Lest this seem like a fanciful issue, an increase by 1 in the number of dimensions of the separation space could be accomplished rather easily, if comprehensive separations were used. From eq 9b, one can calculate that

$$\frac{\alpha_{n+1}}{\alpha_n} = \frac{\pi^{1/2}\,\Gamma(n/2+1)}{2\,\Gamma[(n+3)/2]\,n^*_{c,n+1}} = z \qquad (26)$$

where z symbolizes this ratio and subscripts have been introduced to distinguish between the dimensions of space. Here, n is associated with the space of lower dimension. The ratio of the number of spots in an (n + 1)-dimensional separation to that in an n-dimensional separation can be calculated from eqs 11 and 26 and is

$$\frac{p_{n+1}}{p_n} = \frac{2^{n+1}\alpha_{n+1}\exp(-2^{n+1}\alpha_{n+1})}{1 - \exp(-2^{n+1}\alpha_{n+1})}\cdot\frac{1 - \exp(-2^n\alpha_n)}{2^n\alpha_n\exp(-2^n\alpha_n)} \qquad (27a)$$

$$\frac{p_{n+1}}{p_n} = 2z\,\exp[-2^n\alpha_n(2z - 1)]\,\frac{1 - \exp(-2^n\alpha_n)}{1 - \exp(-2^{n+1}z\alpha_n)} \qquad (27b)$$
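Equations 26 and 27b are simple to evaluate numerically. The following Python sketch assumes the gamma-function form of eq 26 given above, with `cap_next` standing for $n^*_{c,n+1}$; the function names are illustrative and are not taken from the study.

```python
import math

def z_ratio(n, cap_next):
    """Eq 26: alpha_(n+1)/alpha_n for capacity cap_next in the (n+1)th dimension."""
    return (math.sqrt(math.pi) * math.gamma(n / 2 + 1)
            / (2 * math.gamma((n + 3) / 2) * cap_next))

def spot_ratio(n, alpha_n, cap_next):
    """Eq 27b: p_(n+1)/p_n at n-dimensional saturation alpha_n."""
    z = z_ratio(n, cap_next)
    x = 2 ** n * alpha_n
    # expm1(-t) = -(1 - exp(-t)); the two minus signs cancel in the quotient.
    return 2 * z * math.exp(-x * (2 * z - 1)) * math.expm1(-x) / math.expm1(-2 * x * z)

# A 2-D separation at alpha_2 = 0.5, extended by a third dimension.
for cap in (2, 5, 10, 20, 40):
    print(cap, round(spot_ratio(2, 0.5, cap), 3))
```

As the capacity of the added dimension grows, the ratio should level off at $\bar{m}/p_n$; for $\alpha_2 = 0.5$, where $p_2/\bar{m}$ = 0.313, this plateau is about 3.19.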

Equation 27b provides a useful guideline for estimating the capacity $n^*_{c,n+1}$ needed in the (n + 1)th dimension, when the saturation of the n-dimensional space is $\alpha_n$. The limit of this equation, as z approaches zero or, equivalently, as $n^*_{c,n+1}$ approaches infinity, is the constant $\bar{m}/p_n$. Hence, $p_{n+1}/p_n$ approaches $\bar{m}/p_n$, and $p_{n+1}$ approaches $\bar{m}$. This limit is unsurprising, because the saturation $\alpha_{n+1}$ approaches zero as $n^*_{c,n+1}$ approaches infinity. Because $p_{n+1}/p_n$ approaches a constant for sufficiently large $n^*_{c,n+1}$, plateaus in the graph of $p_{n+1}/p_n$ vs $n^*_{c,n+1}$ for various $\alpha_n$'s correspond to nearly complete separations in an (n + 1)-dimensional space. Such graphs are particularly useful, because they visually indicate the regions beyond which increasing capacity has little effect.

Figure 4. Graphs of $p/n_c$, $s/n_c$, $d/n_c$, and $t/n_c$ vs $\alpha$ for 3-D and 4-D separations. The curves for $p/n_c$ were calculated from eq 17; the curves for $s/n_c$, $d/n_c$, and $t/n_c$ were calculated from eq 20a with v = 1, 2, and 3, respectively.

Figure 5. Graphs of $p/\bar{m}$ vs log $\bar{m}$ for various capacities $n^*_{c,k}$ in 1-D, 2-D, 3-D, and 4-D separations. The ratio $p/\bar{m}$ for a 1-D separation was calculated from the expression $p/\bar{m} = \exp(-\alpha_{1D}) = \exp(-\bar{m}/n^*_{c,1})$. The saturations $\alpha$ of 2-D, 3-D, and 4-D spaces were calculated from eq 24; the corresponding ratios $p/\bar{m}$ were calculated from eq 11. In (a), all $n^*_{c,k}$ equal 100. In (b), $n^*_{c,1}$ = 100, $n^*_{c,2}$ = 50, $n^*_{c,3}$ = 25, and $n^*_{c,4}$ = 12.5.

Figure 6 is a graph of $p_{n+1}/p_n$ for n = 2 (upper figure) and n = 3 (lower figure) vs the capacity $n^*_{c,n+1}$ in the (n + 1)th dimension of an (n + 1)-dimensional separation. A similar graph for n = 1 is reported elsewhere.$^{8}$ The curves were calculated from eq 27b for the various $\alpha_n$'s in the figure. As observed above, the plateaus correspond to nearly complete separation in an (n + 1)-dimensional space. For example, as shown in Figure 6a, a mixture partially resolved by a 2-D separation having a saturation of $\alpha_2$ = 0.5 (for which $p_2/\bar{m}$ would be only 0.313) would be almost resolved by a 3-D separation if $n^*_{c,3}$ were greater than 10 or so. In contrast, $n^*_{c,3}$ would have to be 40 or more if the 2-D saturation were 1.0 (for which $p_2/\bar{m}$ would be only 0.075). Similar trends are observed in Figure 6b; a mixture partially resolved by a 3-D separation with a saturation of $\alpha_3$ = 0.375 (for which $p_3/\bar{m}$ would be only 0.157) would be almost resolved by a 4-D separation if $n^*_{c,4}$ were greater than 20 or so.

Figure 6. Graphs of $p_{n+1}/p_n$ vs the capacity $n^*_{c,n+1}$ in the (n + 1)th dimension of an (n + 1)-dimensional separation for various n-dimensional saturations $\alpha_n$. Curves were calculated from eq 27b. In (a), n = 2; in (b), n = 3.

Internal Consistency. Equation 26 can be exploited to demonstrate an internal consistency in the theoretical approach adopted here. The ratio of the number of singlets in an (n + 1)-dimensional space to that in an n-dimensional space is

$$\frac{s_{n+1}}{s_n} = \frac{\exp(-2^{n+1}\alpha_{n+1})}{\exp(-2^n\alpha_n)} = \exp[-2^n\alpha_n(2z - 1)] \qquad (28)$$

This ratio must be greater than or equal to 1 on physical grounds, because one cannot have fewer singlets in an (n + 1)-dimensional space than in an n-dimensional one if the separative displacements truly are orthogonal. A similar argument has been presented elsewhere for 1-D and 2-D separations.$^{3}$ The ratio is greater than 1 when z is less than 1/2 or when

$$n^*_{c,n+1} > \frac{\pi^{1/2}\,\Gamma(n/2+1)}{\Gamma[(n+3)/2]} \qquad (29)$$

where the inequality has been evaluated from eq 26. This number is quite small. For example, the number of singlets in a 3-D separation will exceed that in a 2-D separation if $n^*_{c,3} > 4/3$, and the number of singlets in a 4-D separation will exceed that in a 3-D separation if $n^*_{c,4} > 3\pi/8$. A similar treatment reported elsewhere$^{3}$ shows that the number of singlets in a 2-D space will exceed the number of singlets in a 1-D space if $n^*_{c,2} > \pi/2$. In all cases, the additional capacity needed is no larger than $\pi/2$ units and becomes progressively smaller with increasing n. The progressive decrease is easy to rationalize; a partial separation in the previous n dimensions reduces the need for capacity in the (n + 1)th dimension. This reduction of need is the principal reason that nonintegral values of capacity less than 2 units are sufficient for separation.

Similar trends are found for the ratio $p_{n+1}/p_n$ in eq 27b, which also exceeds the value 1 when z < 1/2 or when eq 29 holds. A similar treatment reported elsewhere$^{3}$ shows that the number of spots in a 2-D space will exceed the number of peaks in a 1-D space if $n^*_{c,2}$ is greater than a value between $\pi/2$ and $\pi$; the actual value depends on the saturation of the 1-D space. In all cases, the additional capacity needed is no larger than $\pi$ (and typically much less) and decreases with increasing n.

These results provide a satisfying check for the statistical theory proposed here. One intuitively does not expect that much capacity should be needed in the (n + 1)th dimension of an (n + 1)-dimensional separation to realize more singlets and spots than in an n-dimensional separation. The actual values of capacity must be determined by theory but are always $\pi$ or less, a small value. If these numbers were large, one would have reason to question the theory.

Comparison of Theory to That for 1-D Separations. The expected number of peaks in a 1-D separation is$^{11}$

$$p = \bar{m}e^{-\alpha} \qquad (30a)$$

which differs substantially from eq 11 when n = 1,

$$p = \frac{2\alpha\bar{m}\,e^{-2\alpha}}{1 - e^{-2\alpha}} \qquad (30b)$$
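The two expressions are easily compared numerically. The short Python sketch below evaluates $p/\bar{m}$ from eqs 30a and 30b at a few saturations; the function names and the particular saturations are assumptions made for illustration.

```python
import math

def peaks_1d(alpha):
    """Eq 30a: p/m for a 1-D separation."""
    return math.exp(-alpha)

def spots_eq11_n1(alpha):
    """Eq 30b: p/m from eq 11 with n = 1, written as x/(e^x - 1) with x = 2*alpha."""
    x = 2 * alpha
    return x / math.expm1(x)

for alpha in (0.1, 0.25, 0.5, 1.0, 2.0):
    a, b = peaks_1d(alpha), spots_eq11_n1(alpha)
    print(f"alpha={alpha:4.2f}  eq30a={a:.4f}  eq30b={b:.4f}  rel.diff={100 * (b - a) / a:+.1f}%")
```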

Surprisingly, however, eq 30b differs from eq 30a by at most about -4.0%, as long as $\alpha$ < 0.5 or so, the threshold value above which peak amplitudes can no longer be neglected in 1-D separations. At higher saturations, eq 30b substantially underestimates eq 30a.

Similar discrepancies between the theory proposed here and that for 1-D separations are found when functions other than p are examined. For n = 1, eq 22 predicts the correct value of $(s/n_c)_{\max}$ for a 1-D separation, but the equations for $(P_{n,v}/\bar{m})_{\max}$ (eq 16), $(p/n_c)_{\max}$ (eq 19), and $(P_{n,v}/n_c)_{\max}$ (eq 20b, with x values reported in Table I) do not agree with their 1-D counterparts. It is not surprising that eq 22 reduces to the correct result for n = 1, since only the exclusion of nearest neighbors from overlap is considered in a theory of singlets.

The reason that eq 30b does not apply to 1-D separations requires a brief review of the 2-D theory proposed by Roach.$^{16}$ One first envisions the arbitrary selection of any point in a 2-D plane containing randomly distributed points. One then finds the nearest neighbor to that point, then the nearest neighbor to either of these two points, and then the nearest neighbor to any of these three points, etc. As long as these nearest-neighbor distances are less than $\beta d_0$, then the points belong to the same cluster, which one interprets as a spot. The first sequential nearest-neighbor distance that is greater than $\beta d_0$ spatially isolates the cluster from the other points in the plane. In Roach's theory, any of the points in a cluster has the same likelihood of being the point that is closest to the next nearest neighbor. This outcome is possible, because neither the points in the cluster nor the nearest neighbor is spatially ordered in any particular manner. Indeed, this lack of spatial order enabled the author to extend this fundamental idea to higher dimensions of space, as discussed in this paper. This lack of order clearly is absent, however, in the 1-D equivalent to this problem. There, points are ordered along a straight line, and only the first and final points in a cluster can be closest to neighbors not in the cluster. Equation 11 consequently cannot correctly model overlap when n = 1. This is somewhat regrettable from an esthetic perspective; one would prefer a unified theory of overlap.
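The clustering picture reviewed above lends itself to a small Monte Carlo sketch in Python. It is not the simulation code used in this study: the unit-cube geometry, the function names, the seed, and the use of a union-find structure to group points are all assumptions made for illustration, and no correction for edge effects is attempted. Points closer together than the spot diameter are merged into one cluster, which reproduces the sequential nearest-neighbor construction, and the cluster count is compared with the prediction taken here as eq 11, $p = \bar{m}x/(e^x - 1)$ with $x = 2^n\alpha$.

```python
import math
import random

def count_spots(points, threshold):
    """Number of clusters whose members are chained by distances below threshold."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < threshold:
                parent[find(i)] = find(j)   # merge the two clusters
    return len({find(i) for i in range(len(points))})

def predicted_spots(m, n, alpha):
    """Expected number of spots, p = m x / (e^x - 1), x = 2^n alpha."""
    x = 2 ** n * alpha
    return m * x / math.expm1(x)

def simulate(m, n, alpha, seed=0):
    """Scatter m random points in a unit n-cube and count spots at saturation alpha."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(n)) for _ in range(m)]
    # Spot diameter that gives saturation alpha in a space of unit volume.
    unit_ball = math.pi ** (n / 2) / (2 ** n * math.gamma(n / 2 + 1))
    diameter = (alpha / (m * unit_ball)) ** (1 / n)
    return count_spots(points, diameter)

m, n, alpha = 400, 2, 0.05
print("simulated:", simulate(m, n, alpha, seed=1),
      " predicted:", round(predicted_spots(m, n, alpha), 1))
```

A single run is subject to statistical scatter, so only approximate agreement with the prediction should be expected; averaging over many seeds and larger $\bar{m}$ would give a fairer comparison. The pairwise loop scales as the square of the number of points, which is adequate for a few thousand points but not for the large component numbers discussed earlier.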

CONCLUSIONS The utility of this theory awaits the test of time. Unlike its 1-D and 2-D predecessors, it is not a theory for which an immediate application exists. Indeed, its impact will probably depend on the reader's perception of overlap in chemical separations. Some readers will believe the author has engaged in a mathematical flight of fancy by analyzing the prowess of separation systems that, for all practical purposes, do not exist today. Other readers (and the author hopes they outnumber the former) will better appreciate the phenomenal capabilities of high-dimensional separations. Furthermore, the author believes it would be criminal not to share this work with the scientific community. The theory of Roach can be generalized to higher dimensions too easily to ignore the generalization.

The theory derived here quantifies the incredible power of 3-D and 4-D separations. Literally tens or hundreds of thousands or even millions of components can be resolved with a high probability under the appropriate conditions, even when statistical limitations apply. Under these conditions, the need for hyphenated/chemometric approaches to identification and quantification (the principal objectives of the analytical chemist) could be substantially reduced in separation science and might, for all practical purposes, approach nil. One could begin to rely on the separation alone to provide the signals necessary for analysis.

The author (some might say, conveniently) has not addressed several technological and scientific obstacles to the implementation of n-dimensional separations. A minor technological concern is the large storage capacities and high-speed computers required to store and process signals in an n-dimensional space. Presumably, this concern can be addressed. A more serious technological obstacle to implementing separations in a 3-D porous or gel-like cube is the detection of components. Comprehensive n-dimensional separations seem more practical; however, a scientific obstacle to implementing them is the wide range of separation times expected among the different dimensions. Such a wide range might be incompatible with high efficiency in each dimension. These concerns and others are immensely important and ultimately may limit what can be achieved. But without the proper incentive, as provided by theory, one might never seek to overcome these hurdles. The author hopes this study provides a justification to pursue high-dimensional separations. Theory, even in the simple form developed here, indicates that rewards definitely await those who choose to pursue them.

ACKNOWLEDGMENT This work was supported by the National Science Foundation (CHE-9215908).

Note Added in Proof. Since acceptance of this article for publication the author learned that an expression equivalent to s/m for a 3-D space was reported previously by Michel Martin (Martin, M. Proc. Congr. Mesucora 91 1991, 1, 3).

RECEIVED for review December 28, 1992. Accepted May 12, 1993.
