Anal. Chem. 2004, 76, 3055-3068

Decoding Two-Dimensional Complex Multicomponent Separations by Autocovariance Function

Nicola Marchetti,† Attila Felinger,‡ Luisa Pasti,† Maria Chiara Pietrogrande,† and Francesco Dondi*,†

Department of Chemistry, University of Ferrara, via L. Borsari, 46, I-44100 Ferrara, Italy, and Department of Analytical Chemistry, University of Veszprém, P.O. Box 158, H-8201 Veszprém, Hungary

A new method for decoding two-dimensional (2D) multicomponent separations based on the 2D autocovariance function (2D-ACVF) has been developed. Theoretical models of single-component (SC) spot distributions in 2D separations, both random and structured, are developed as the basis for a nonlinear estimation of both sample and separation system parameters from experimental 2D separations. The number of SCs, the average spot size, the spot capacity, and the saturation factor can be evaluated in the case of random SC spot patterns. The procedure was validated by extensive numerical simulation under conditions close to those usually found in GC × GC or 2D polyacrylamide gel electrophoresis of proteins. The worst precision was no greater than 10%, in the case of maximum spot density. This imprecision was fully accounted for, and it seems acceptable owing to the intrinsic statistical character of the estimation method. Structured multicomponent 2D separations, where SCs are linked by linear relationships, give rise to specific structured patterns in 2D-ACVF plots from which the parameters (phase and frequency) of the structured SC sequences can be evaluated: the study of 2D-ACVF makes it possible to decode a multicomponent 2D separation, that is, to determine the number, relative abundance, and structural similarities of the single components. Pertinent expressions of the theoretical 2D-ACVF were derived for simple cases, and a procedure for decoding cases of structured 2D separations was developed and applied. It was shown that 2D separations containing both random and structured patterns of SC spots give rise to an experimental 2D-ACVF (2D-EACVF) that is the superimposition of the two component parts. This feature allows one, in principle, to decode the two components. The relevance of these results for Giddings' sample dimensionality and separation dimensionality and their effective experimental evaluation is discussed.
† University of Ferrara.
‡ University of Veszprém.

10.1021/ac035312+ CCC: $27.50 Published on Web 04/28/2004

© 2004 American Chemical Society

Today the analysis of complex multicomponent mixtures, containing as many as 5000 single components (SCs) (see the Glossary), is a challenging task for frontier research fields, such as proteomics or metabolomics, as well as for traditional research
fields such as polymer, natural product, and food chemistry. However, mixtures are often so complex, in terms of both the number of SCs and the similarity between them, that the separation power offered by a single-dimension (1D) separation technique is not enough: separation science has therefore entered the era of multidimensional separations (ref 1), even going beyond such 2D approaches as LC/LC, GC/GC, etc. (ref 2). Moreover, to fully assign the chemical structure of the separated components, hyphenation between separation and spectroscopic techniques such as NMR (ref 3), or spectrometric techniques such as MS and tandem MS (MS$^n$), has been established. To fully exploit these multidimensional hyphenated techniques, there is an evident need for appropriate management of the tremendous amount of analytical chemical information produced.

There are two main concerns when designing and performing the analysis of a multicomponent mixture. The first is the a priori determination of the appropriate means to achieve a given degree of separation/identification. The second is how best to evaluate a posteriori, on the basis of analytical results, all hidden attributes of the multicomponent mixture: the number of SCs (m), their abundance, identity, and structure. From the separation point of view, these questions and their answers fall within the science of multicomponent separations, a field of separation science that substantially started with the fundamental papers by Davis and Giddings (refs 4, 5). These authors approached these problems with statistical modeling, since the number of single components is large and their appearance in the separation space is often random.
The so-called statistical model of overlap (SMO) firmly established that in a chromatogram where the SC retention is completely random (i.e., Poissonian), the overlapping pattern (i.e., the fractions of singlet SC peaks, or pure peaks, and of doublet, triplet, and n-plet SC peaks) is severe, and the peak capacity (see the following sections) needed to ensure full mixture resolution can be as high as 10 times m. These conclusions had a deep impact on, and a seminal value in, separation science, emphasizing from a fundamental point of view the need for a multidimensional approach to complex mixture separation.

(1) Giddings, J. C. In Multidimensional Chromatography; Cortes, H. J., Ed.; Marcel Dekker: New York, 1990; pp 1-28.
(2) Schoenmakers, P.; Marriott, P.; Beens, J. LC-GC Eur. 2003, 16, 335-339.
(3) On-line LC NMR and Related Techniques; Albert, K., Ed.; Wiley: Chichester, 2002.
(4) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418-423.
(5) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2168-2177.

Analytical Chemistry, Vol. 76, No. 11, June 1, 2004 3055

In certain cases, assuming that the retention pattern of a multicomponent separation is completely random may be inappropriate. In fact, real mixtures, however complex they are, very often contain recursive structural relationships among the SCs, and thus, the retention pattern should contain some hidden correlations between SC retention positions. This can have a great impact on the degree of separation; neglecting it, one loses important information that is precious for characterizing the mixture. From the very outset, the Fourier method of representing multicomponent chromatograms (refs 6-12) has accounted for the existence of correlations in retention positions and has shown how to determine them a posteriori. A simplified version of it is based on the autocovariance function (ACVF). In this approach, peak shape and the specific SC abundance distribution are considered, and a method for determining m, the structure-retention correlations, and the peak shape features hidden in a complex multicomponent chromatogram was developed (ref 6). In this way, the complex structure of a multicomponent chromatogram is interpreted in terms of two basic components: a random one and a structured one. They are described by their fundamental properties: the number of components and the specific recursive retention correlations. We refer to these results, in short, as “decoding” of the complex multicomponent separation. The decoding properties of the ACVF method are further enhanced when coupled with the selectivity features of MS detection, as shown in the case of GC/MS chromatograms of polychlorobiphenyls (ref 7). There is, thus, more than one reason to exploit the Fourier/ACVF method in multidimensional separation. Correlation in retention is also deep-rooted in the concepts of sample dimensionality developed by J. C.
Giddings in an effort to establish the proper relationship between sample complexity and the most suitable dimensionality of the separation system (ref 13). Consequently, the development of suitable data handling that takes into account the structure-retention correlation in multidimensional/multicomponent separation appears not only suitable but also very promising for multidimensional separation method development and, ultimately, for multidimensional identification technology, for example, in proteomics (ref 14).

This paper investigates the role of the retention-structure correlation in 2D multicomponent separation and develops the 2D-ACVF method. The exact expressions of 2D-ACVF for the cases of both random (i.e., Poissonian) and structured retention patterns are derived. A method for a posteriori estimation of the SC number, the peak capacity, and the parameters of the sample dimensionality is developed.

(6) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1990, 62, 1846-1853.
(7) Pietrogrande, M. C.; Pasti, L.; Dondi, F.; Bollain Rodriguez, M. H.; Carro Diaz, M. A. J. High Resolut. Chromatogr. 1994, 17, 839-849.
(8) Felinger, A.; Pasti, L.; Reschiglian, P.; Dondi, F. Anal. Chem. 1990, 62, 1854-1860.
(9) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1992, 64, 2164-2174.
(10) Dondi, F.; Pietrogrande, M. C.; Felinger, A. Chromatographia 1997, 45, 435-440.
(11) Dondi, F.; Betti, A.; Pasti, L.; Pietrogrande, M. C.; Felinger, A. Anal. Chem. 1992, 65, 2209-2222.
(12) Pietrogrande, M. C.; Dondi, F.; Felinger, A. J. High Resolut. Chromatogr. 1996, 19, 328-332.
(13) Giddings, J. C. J. Chromatogr., A 1995, 703, 3-15.
(14) Righetti, P. G.; Stoyanov, A. V.; Zhukov, M. Y. The Proteome Revisited; Elsevier: Amsterdam, 2001.


THEORY

1D Structure-Retention Correlation and 1D-Autocovariance Function. A multicomponent mixture can be considered an ensemble of objects (the different solutes) at different abundance levels. Two main limiting structures can be conceived in this multicomponent mixture ensemble: one made up of those components exhibiting structural similarities, which produce correlated or recursive retention positions (structured component), and one made up of components exhibiting a random structural pattern, reflected in random retention positions (random component). In what follows, both these limiting cases are considered together with the retention patterns determined in 1D or 2D separation. In the structured component of the ensemble mixture, some retention-structure relationships are assumed to exist between the members of the ensemble, that is, the solutes, and this is reflected in their partition free energy. For example, let us consider a 1D separation system and a homologous series, generally indicated as the i-series, characterized by a given structural increment, for example, a CH2 group. The partition free energy of the nth component of the series is

$$\Delta\mu_i(n) = a_i + b_i\,n \tag{1}$$

where $\Delta\mu_i(n)$ represents the partition free energy of the nth term of this i-homologous series, and $a_i$ and $b_i$ are the phase and the frequency indicators (called simply phase and frequency) of the i-series, respectively (ref 13). In a 1D separation, for example, under good elution programming conditions, the homologous series characterized by eq 1 produces a retention sequence given by the same eq 1, but scaled by a constant factor, since retention is proportional to partition free energy. In eq 1, $b_i$ can represent, for example, the effect of a CH2 addition, whereas $a_i$ represents the contribution of the i functional group to the overall partition free energy. Note that eq 1 is written in terms of the partition free energy, but its role there and in the following can just as well be taken by any other property proportional to retention (e.g., electrophoretic mobility, isoelectric point, etc.). A given ensemble mixture can contain several homologous series ($i = 1, \ldots, r$ in eq 1). The resulting chromatogram of this mixture is characterized by peak overlapping, which depends on the performance of the separation and on the mixture features and, in particular, on the SC peak width, the number r of homologous series, the ($a_i$, $b_i$) values, and the way in which they combine. As the number r of homologous series in the ensemble mixture increases, the 1D separation becomes increasingly crowded and, thus, unable to decode the mixture. Not only does peak overlapping increase, but the recursive features $a_i$ and $b_i$ are also progressively lost (ref 13). When the number of sequences r is great enough, it can be proven that the overall sequence becomes Poissonian, that is, the interdistances b between subsequent SC positions are exponentially distributed,

$$f(b) = \frac{1}{\bar b}\exp\left(-\frac{b}{\bar b}\right) \tag{2}$$

with average value $\bar b$ given by (refs 6, 16)

$$\frac{1}{\bar b} = \sum_{i=1}^{r}\frac{1}{b_i} \tag{3}$$
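As a rough numerical illustration of eq 3, the sketch below merges a few ordered sequences of the form of eq 1 and compares the mean interdistance of the merged sequence with the value predicted by eq 3. The phases, frequencies, axis length, and function names are hypothetical and chosen only for illustration; they are not taken from this work.

```python
def mean_spacing_eq3(freqs):
    """Average interdistance predicted by eq 3: 1/b_mean = sum_i (1/b_i)."""
    return 1.0 / sum(1.0 / b for b in freqs)


def merged_positions(phases, freqs, x_max):
    """Sorted positions a_i + b_i*n (eq 1) of all series that fall in [0, x_max]."""
    positions = []
    for a, b in zip(phases, freqs):
        n = 0
        while a + b * n <= x_max:
            positions.append(a + b * n)
            n += 1
    return sorted(positions)


# Hypothetical phases a_i and frequencies b_i (arbitrary retention units).
phases = [0.13, 0.41, 0.77, 1.02]
freqs = [1.7, 2.3, 3.1, 4.9]
x_max = 5000.0

pos = merged_positions(phases, freqs, x_max)
observed = (pos[-1] - pos[0]) / (len(pos) - 1)  # mean spacing of the merged sequence
predicted = mean_spacing_eq3(freqs)
print(f"observed mean interdistance: {observed:.4f}")
print(f"predicted by eq 3:           {predicted:.4f}")
```

With only four series the merged sequence is still far from Poissonian; the sketch checks the mean interdistance only, not the exponential shape of eq 2.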

The Poissonian character of a multicomponent chromatogram, assumed by Giddings and Davis in building up their SMO, is thus the limit condition to which the superimposition of many ordered sequences converges. Under these limit conditions, the ordered structure is completely lost. The Poissonian pattern is thus not only the case of a random component of a mixture ensemble; it is also the limit random component, that is, the most disordered one, with the maximum distribution entropy value (ref 17). Between the limit case of an ordered mixture expressed by eq 1 and the limit random Poisson component (eqs 2 and 3) lies the entire wealth of different multicomponent mixtures we can find or produce. It is for this reason that we have decided to base the discussion on the two assumed components: the structured and the Poissonian-random components. In the present paper, we will exploit the ability of the autocovariance function, ACVF, computed over multicomponent/multidimensional separations, to single out the two components, structured and random, in practice (i.e., a posteriori). 1D-ACVF will be briefly presented first in order to properly introduce 2D-ACVF. From the digitized 1D chromatogram $f_i$, $i = 1, 2, 3, \ldots, N$, with $N - 1 = X/\tau$ and $\tau$ the time interval between subsequent digitized positions, the experimental ACVF (EACVF) in 1D, here named 1D-EACVF, is computed as (ref 8)

$$c_p = \frac{1}{M}\sum_{i=1}^{N-p}\left[f_i - \bar f\right]\left[f_{i+p} - \bar f\right], \qquad p = 0, 1, 2, 3, \ldots, M-1 \tag{4}$$

where p and M - 1 are the interdistance extension and its maximum value over which 1D-EACVF is computed, respectively. Note that only positive interdistances p are considered, since the ACVF is symmetrical with respect to p = 0. $\bar f$ is the average value of the chromatographic signal.

$$\bar f = \frac{\sum_{i=1}^{N} f_i}{N} \tag{5}$$

Let us consider the time interdistance, $\Delta x$.

$$\Delta x = p\tau \tag{6}$$

It is possible to obtain $c(\Delta x)$, which is 1D-EACVF as a function of time interdistance, by passing through the equation

$$c(\Delta x) \times \tau = c_p \times 1 \tag{7}$$

which establishes the equality between the areas over single time step increments and the 1D-EACVF coordinate. In the case of completely random SC positions (Poissonian pattern, interdistances distributed according to eq 2) and Gaussian-shaped SC peaks,

$$f(x) = h\exp\left[-\frac{(x - \bar x)^2}{2\sigma_x^2}\right] \tag{8}$$

where h, $\bar x$, and $\sigma_x$ are the height, mean location, and standard deviation of a SC peak, the theoretical expression of 1D-ACVF (1D-TACVF) is (ref 6)

$$c(\Delta x) = \frac{A_T^2}{2\sigma_x\sqrt{\pi}\,X m}\left(\frac{\sigma_h^2}{a_h^2} + 1\right) e^{-(\Delta x)^2/4\sigma_x^2} \tag{9}$$

where $a_h$ is the mean SC peak height value, and $\sigma_h^2$ is the variance of the SC peak heights,

$$a_h = \frac{\sum_i h_i}{m} \tag{10}$$

$$\sigma_h = \sqrt{\frac{\sum_i (h_i - a_h)^2}{m - 1}} \tag{11}$$

and $A_T$ is the total area of the chromatogram. When there is a correlation between retention positions, as expressed by eq 1, 1D-TACVF (ref 11) can predict deterministic peaks located at repeated position values, $bk$.

$$c(\Delta x) = \frac{A_T^2}{2\sigma_x\sqrt{\pi}\,X m}\left(\frac{\sigma_h^2}{a_h^2} + 1\right)\sum_{k=0}^{\infty} e^{-(\Delta x - bk)^2/4\sigma_x^2} \tag{12}$$

Equation 12 expresses the “deterministic” part of ACVF, resulting from nonrandom repeated positions, for example, the ACVF of the chromatogram of a homologous series of compounds. Equation 12 refers to an infinitely long sequence of equally spaced SC peaks. In Appendix I reported in the Supporting Information (SI), the pertinent expression (eq I-41) for the “deterministic” part of 1D-ACVF in the case of an ordered sequence of SC peaks made of $n_{max} + 1$ elements expressed by eq 1 is derived.

$$C(\Delta x) = \sum_{k=0}^{n_{max}} \frac{A_T^2}{2\sigma_x\sqrt{\pi}\,X(n_{max} - k + 1)}\left(\frac{\sigma_h^2}{a_h^2} + 1\right) e^{-[(\Delta x - bk)^2/4\sigma_x^2]} \tag{13}$$

(15) Giddings, J. C. Unified Separation Science; Wiley: New York, 1991.
(16) Feller, W. An Introduction to Probability Theory and Its Applications, 2nd ed.; Wiley: New York, 1971; Vol. II, pp 370, 476, 625.
(17) Dondi, F.; Bassi, A.; Cavazzini, A.; Pietrogrande, M. C. Anal. Chem. 1998, 70, 766-773.
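To make the role of the parameters in eq 9 concrete, the following minimal sketch evaluates the 1D-TACVF and inverts it at zero interdistance to recover m. It assumes a hypothetical chromatogram of 200 unit-height Gaussian peaks; the function names and numerical values are illustrative only. In this work the parameters are obtained by nonlinear least-squares fitting, so the closed-form inversion is a simplification used here only to show the dependence on m.

```python
import math


def tacvf_1d(dx, a_total, m, sigma_x, x_len, disp_ratio):
    """Eq 9: 1D-TACVF for Poissonian retention and Gaussian SC peaks.

    disp_ratio is sigma_h^2 / a_h^2, the SC abundance dispersion ratio.
    """
    prefactor = a_total ** 2 * (disp_ratio + 1.0) / (
        2.0 * sigma_x * math.sqrt(math.pi) * x_len * m
    )
    return prefactor * math.exp(-dx ** 2 / (4.0 * sigma_x ** 2))


def m_from_origin(c0, a_total, sigma_x, x_len, disp_ratio):
    """Solve eq 9 at dx = 0 for the number of SCs, m."""
    return a_total ** 2 * (disp_ratio + 1.0) / (
        2.0 * sigma_x * math.sqrt(math.pi) * x_len * c0
    )


# Hypothetical chromatogram: 200 SCs of unit height (disp_ratio = 0).
m_true, sigma_x, x_len = 200, 1.0, 1000.0
peak_area = 1.0 * sigma_x * math.sqrt(2.0 * math.pi)  # area of one Gaussian peak (eq 8)
a_total = m_true * peak_area

c0 = tacvf_1d(0.0, a_total, m_true, sigma_x, x_len, 0.0)
m_back = m_from_origin(c0, a_total, sigma_x, x_len, 0.0)
half_width = 2.0 * sigma_x * math.sqrt(math.log(2.0))  # dx at which eq 9 falls to c0 / 2
ratio = tacvf_1d(half_width, a_total, m_true, sigma_x, x_len, 0.0) / c0
print(m_back, ratio)
```

The half-width relation shows why the region near the origin carries the information on $\sigma_x$, while the value at the origin carries the information on m.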

Equations 9 and 12 and, thus, eq 13 were derived from the power spectrum of the multicomponent chromatogram (refs 6, 11). This procedure is complex and cannot be immediately applied to 2D separations. In Appendix I of the Supporting Information, a general procedure (ref 18) for rederiving eq 9 is presented, which has the advantage that it can be simply extended to the 2D or even the nD case. Moreover, a discussion concerning eq 12 and its extension to 2D under selected cases is reported in the same Appendix I of the Supporting Information.

2D Structure-Retention Correlation and 2D-Autocovariance Function. If a mixture is analyzed in two dimensions, each SC will be represented in a two-dimensional space. Let us first consider the 2D retention pattern determined by the structured part of the mixture ensemble. Each series i is now characterized by two independent equations

$$\Delta\mu_{X,i}(n) = a_{X,i} + b_{X,i}\,n \tag{14}$$

and

$$\Delta\mu_{Y,i}(n) = a_{Y,i} + b_{Y,i}\,n \tag{15}$$

where $\Delta\mu_{X,i}(n)$ and $\Delta\mu_{Y,i}(n)$ are the partition free energies of the nth member of the i-series along the X and Y separation axes, respectively. The separation space is equal to

$$A = XY \tag{16}$$

The first approximation introduced is that all separated zones, corresponding to pure components of the sample, have the same effective area $A_0$. The maximum number of components we can isolate is

$$n_{c,2D} = \frac{A}{A_0} \tag{17}$$

This parameter, $n_{c,2D}$, is named “spot capacity”, and it is analogous to the peak capacity for 1D separation (ref 19). Its definition is general, and it can be applied to both circular and elliptical spots. The bivariate Gaussian distribution,

$$f(x, y) = h\exp\left[-\frac{(x - \bar x)^2}{2\sigma_x^2} - \frac{(y - \bar y)^2}{2\sigma_y^2}\right] \tag{18}$$

where $\bar x$ and $\bar y$ are the coordinates locating the spot center; h is the spot amplitude; and $\sigma_x$ and $\sigma_y$ are the standard deviations along the two separation axes, assumed constant for all SCs, can describe a circular spot ($\sigma_x = \sigma_y$) as well as an elliptical one ($\sigma_x \neq \sigma_y$) with different ratios between major and minor semiaxes. This last condition corresponds to 2D separations in which the two separation axes have different efficiencies. In this case, the area $A_0$ is defined as

$$A_0 = 4\pi\sigma_x\sigma_y \tag{19}$$

In this way, the spot capacity is a well-defined parameter used to evaluate the degree to which the separative medium is suitable for the mixture complexity. The 2D saturation factor (ref 19), similar to the 1D analogue, is defined as

$$\alpha_{2D} = \frac{m}{n_{c,2D}} \tag{20}$$

Both of these parameters ($n_{c,2D}$ and $\alpha_{2D}$) are independent of the spatial distribution of spots.

(18) Solnes, J. Stochastic Processes and Random Vibration: Theory and Practice; Wiley: Chichester, 1997.
(19) Davis, J. M. Anal. Chem. 1991, 63, 2141-2152.

As in 1D separation, a digitized map is assumed, consisting of a gridded surface, $N_x \times N_y$, where all the nodes are equally spaced. $f_{i,j}$ represents the map intensity at the point (i, j), and $\bar f$ is the average intensity calculated over all of the sampled points. The experimental 2D-ACVF (2D-EACVF) is computed as

$$C_{p,q} = \frac{1}{N_x N_y}\sum_{i=1}^{N_x - p}\sum_{j=1}^{N_y - q}\left(f_{i,j} - \bar f\right)\left(f_{i+p,j+q} - \bar f\right) \tag{21}$$

$$p = -M_x, \ldots, -1, 0, 1, \ldots, M_x \tag{22}$$

$$q = -M_y, \ldots, -1, 0, 1, \ldots, M_y \tag{23}$$

where $\pm M_x$ and $\pm M_y$ are, respectively, the maximum p and q lags over which 2D-EACVF is calculated. Equation 21 is the natural extension of eq 4 to the 2D case, but here, the full space is considered. The symmetry properties of 2D-EACVF will be handled in the Discussion section. The theoretical model expression for 2D-ACVF (2D-TACVF) of a 2D separation whose SC spots are represented by a bivariate Gaussian distribution (see eq 18) and whose SC positions are Poissonian is (see Appendix I in SI for the derivation)

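$$C(\Delta x, \Delta y) = \frac{V_T^2\left(\sigma_h^2/a_h^2 + 1\right)}{4\pi m\,\sigma_x\sigma_y XY}\exp\left[-\frac{(\Delta x)^2}{4\sigma_x^2} - \frac{(\Delta y)^2}{4\sigma_y^2}\right] \tag{24}$$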
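The shape described by eq 24 can be explored with a short sketch such as the one below, which evaluates the 2D-TACVF and the half-width at half-height along each axis. The geometry is borrowed from the elliptical-spot sets of Table 1; the total volume of 1 and the function names are arbitrary assumptions made only for illustration.

```python
import math


def tacvf_2d(dx, dy, v_total, m, sigma_x, sigma_y, x_len, y_len, disp_ratio):
    """Eq 24: 2D-TACVF for Poissonian spot positions and bivariate Gaussian spots."""
    prefactor = v_total ** 2 * (disp_ratio + 1.0) / (
        4.0 * math.pi * m * sigma_x * sigma_y * x_len * y_len
    )
    return prefactor * math.exp(
        -dx ** 2 / (4.0 * sigma_x ** 2) - dy ** 2 / (4.0 * sigma_y ** 2)
    )


def half_width_at_half_height(sigma):
    """Lag at which the Gaussian factor exp(-d^2 / 4 sigma^2) equals 1/2."""
    return 2.0 * sigma * math.sqrt(math.log(2.0))


# Elliptical-spot geometry of Table 1 (sets 10-12); v_total = 1 is arbitrary.
params = dict(v_total=1.0, m=750, sigma_x=1.0, sigma_y=0.75,
              x_len=180.0, y_len=200.0, disp_ratio=0.0)
c00 = tacvf_2d(0.0, 0.0, **params)
hw_x = half_width_at_half_height(params["sigma_x"])
hw_y = half_width_at_half_height(params["sigma_y"])
print(f"C(0,0) = {c00:.3e}, half widths = {hw_x:.3f}, {hw_y:.3f}")
```

At lags of four standard deviations along both axes the exponential factor of eq 24 is already below 0.1% of its value at the origin.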

The structured part of the ensemble mixture will produce a structured 2D spot distribution if there is a proper linear relationship between the two retention axes and the two partition free energy components (eqs 14 and 15). In this case, the 2D-EACVF will exhibit deterministic peaks located at repeated interdistances $b_X$, $b_Y$, in a manner similar to what is discussed for the 1D case (see eq 12). The corresponding 2D-TACVF for a single finite sequence of $n_{max}$ SC ordered spots is (see Appendix I of SI)

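$$C(\Delta x, \Delta y) = \sum_{k=0}^{n_{max}} \frac{V_T^2}{4\sigma_x\sigma_y\pi XY(n_{max} - k + 1)}\left(\frac{\sigma_h^2}{a_h^2} + 1\right) e^{-[(\Delta x - b_X k)^2/4\sigma_x^2] - [(\Delta y - b_Y k)^2/4\sigma_y^2]} \tag{25}$$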

The case of two parallel sequences, 1 and 2, of SC ordered spots (see Appendix I, SI) gives rise to an additional spot train in 2D-ACVF of the form

$$C(\Delta x, \Delta y) = \sum_{k=0}^{n_{max}} \frac{V_T^2}{4\sigma_x\sigma_y\pi XY(n_{max} - k + 1)}\left(\frac{\sigma_h^2}{a_h^2} + 1\right) e^{-[(\Delta x - \Delta a_X - b_X k)^2/4\sigma_x^2] - [(\Delta y - \Delta a_Y - b_Y k)^2/4\sigma_y^2]} \tag{26}$$

where

$$\Delta a_X = a_{2,X} - a_{1,X} \tag{27}$$

$$\Delta a_Y = a_{2,Y} - a_{1,Y} \tag{28}$$

Equation 26 expresses only one branch of the 2D-ACVF (see Appendix I in the Supporting Information and the Discussion section). The correspondence between digitized and continuous 2D-ACVF can be obtained by employing the conditions

$$C(\Delta x, \Delta y)\,\tau^2 = C_{p,q} \tag{29}$$

and, in addition to eq 6 referred to $\Delta x$, the following equation

$$\Delta y = q\tau \tag{30}$$

Equation 29 is the 2D analogue of eq 7.

Decoding 1D and 2D Separations by Using 1D-TACVF and 2D-TACVF. Equations 9, 12, 13, and 24-26 provide the basis for determining relevant parameters of complex multicomponent 1D and 2D separations, respectively, by nonlinear least-squares fitting of 1D- or 2D-TACVF to EACVF. However, we observe that because of SC peak overlapping, the relative dispersion of the SC abundance ($\sigma_h/a_h$) appearing in eqs 9, 12, 13, and 24-26 is not experimentally accessible and must be approximated by an estimate, for example, by the relative dispersion of the observed peak maxima ($\sigma_m/a_m$) (ref 8)

$$\frac{\sigma_h}{a_h} \approx \frac{\sigma_m}{a_m} \tag{31}$$

or

$$\frac{\sigma_h}{a_h} \approx \frac{\sigma_V}{a_V} \tag{32}$$

where $\sigma_V/a_V$ is the spot volume dispersion. One can estimate the error in the m estimate coming from a bias on $\sigma_h^2/a_h^2$. From eq 24, one can see that

$$m \propto \frac{\sigma_h^2}{a_h^2} + 1 \tag{33}$$

and by rearranging, one has

$$\frac{m_{est} - m}{m} = \frac{\left(\dfrac{\sigma_h^2}{a_h^2}\right)_{est} - \dfrac{\sigma_h^2}{a_h^2}}{\dfrac{\sigma_h^2}{a_h^2} + 1} \tag{34}$$

where $(\sigma_h^2/a_h^2)_{est}$ is one estimate obtained, for example, according to eq 31 or 32.
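The bias relation of eq 34 is simple enough to evaluate directly. The sketch below does so for a hypothetical case in which the true dispersion ratio of an exponential abundance distribution (1.0) is underestimated as 0.8; both the function name and the value 0.8 are illustrative assumptions, not results from this work.

```python
def relative_bias_in_m(disp_est, disp_true):
    """Eq 34: (m_est - m) / m caused by a biased dispersion ratio sigma_h^2 / a_h^2."""
    return (disp_est - disp_true) / (disp_true + 1.0)


# Hypothetical case: exponential abundances (true ratio 1.0) with the ratio
# underestimated as 0.8, for example because of spot overlap.
bias = relative_bias_in_m(0.8, 1.0)
print(f"relative bias in m: {bias:+.1%}")
```

A negative bias in the dispersion ratio therefore propagates into an underestimate of m, damped by the factor $\sigma_h^2/a_h^2 + 1$ in the denominator.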

COMPUTATION

All the programs are written in Fortran and run on a personal computer Pentium III 2-GHz (512-MB RAM) AMD Athlon. The first part of the program generates the map as follows. A random number generator with a uniform probability distribution, taken from the literature (ref 20), gives uniform deviates, that is, random numbers belonging to a given interval in which every number has the same probability density (uniform (U) distribution, rectangular pulse). The transformation method (ref 20) allows us to obtain other random series with specific probability distributions, starting from the U distribution. In particular, exponential (E) deviates, such as the interdistances between independent Poissonian events, have been generated (see eq 2). By this method, it is possible to randomly create the coordinates $x_i$, $y_i$ and the abundances $h_i$ of each SC spot, $i = 1, \ldots, m$. In this work, three different spot abundance distributions have been taken into account: constant (C), where all spots have the same abundance; uniform (U); and exponential (E). The separation map simulation is completed by generating spot concentration profiles over the entire separation space on the basis of eq 18. The separation space is covered with a grid, where the user can choose the number of points for each dimension according to the desired resolution. Then the intensity associated with every node in the grid is calculated as the sum of the contributions of each spot in the map to that node.

The second part of the program regards the calculation of some experimental parameters related to the simulated separation: A (see eq 16), $A_0$ (see eq 19), $n_{c,2D}$ (see eq 17), and $\alpha_{2D}$ (see eq 20). The total volume of the separation, $V_T$, was computed by numerical integration. The true average SC abundance, $a_h$, and its standard deviation, $\sigma_h$, were computed according to eqs 10 and 11, respectively, by using the true values of SC abundances.
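The map generation described above can be sketched in a few lines of Python (the original programs are in Fortran and are not reproduced here). This is a minimal sketch under stated assumptions: spot coordinates are drawn uniformly over the separation space, the U and E abundance distributions are scaled to unit mean height, the map is small (25 spots on a 121 × 121 grid), and all function names are hypothetical.

```python
import math
import random


def exponential_deviate(rng, mean):
    """Transformation method: turn a uniform deviate u into -mean * ln(1 - u)."""
    return -mean * math.log(1.0 - rng.random())


def draw_abundance(rng, kind):
    """Spot height for the C, U, or E abundance distribution (unit mean: assumption)."""
    if kind == "C":
        return 1.0
    if kind == "U":
        return 2.0 * rng.random()
    if kind == "E":
        return exponential_deviate(rng, 1.0)
    raise ValueError(f"unknown abundance distribution: {kind}")


def simulate_spots(rng, m, x_len, y_len, kind):
    """Random spot list [(x, y, h), ...]; uniform coordinates are an assumption."""
    return [(rng.random() * x_len, rng.random() * y_len, draw_abundance(rng, kind))
            for _ in range(m)]


def render_map(spots, sigma_x, sigma_y, x_len, y_len, nx, ny):
    """Sum the bivariate Gaussian profiles (eq 18) of all spots on an nx-by-ny grid."""
    step_x, step_y = x_len / (nx - 1), y_len / (ny - 1)
    grid = [[0.0] * ny for _ in range(nx)]
    for x0, y0, h in spots:
        gy = [math.exp(-((j * step_y - y0) ** 2) / (2.0 * sigma_y ** 2)) for j in range(ny)]
        for i in range(nx):
            gx = h * math.exp(-((i * step_x - x0) ** 2) / (2.0 * sigma_x ** 2))
            row = grid[i]
            for j in range(ny):
                row[j] += gx * gy[j]
    return grid, step_x, step_y


def dispersion_ratio(heights):
    """sigma_h^2 / a_h^2 from eqs 10 and 11."""
    m = len(heights)
    mean = sum(heights) / m
    var = sum((h - mean) ** 2 for h in heights) / (m - 1)
    return var / mean ** 2


rng = random.Random(2004)  # fixed seed so that the sketch is reproducible
spots = simulate_spots(rng, m=25, x_len=30.0, y_len=30.0, kind="E")
grid, step_x, step_y = render_map(spots, 0.75, 0.75, 30.0, 30.0, nx=121, ny=121)
v_total = sum(map(sum, grid)) * step_x * step_y  # numerical integration (rectangle rule)
v_ideal = sum(h for _, _, h in spots) * 2.0 * math.pi * 0.75 * 0.75
print(f"V_T = {v_total:.2f} (spots not truncated by the map edges would give {v_ideal:.2f})")
print(f"true dispersion ratio = {dispersion_ratio([h for _, _, h in spots]):.3f}")
```

The numerically integrated volume is slightly smaller than the untruncated value whenever spots fall close to the map edges.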
(20) Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. Numerical Recipes in Fortran, 2nd ed.; Cambridge University Press: Cambridge, U.K., 1992.

The true relative dispersion ratio, that is, $\sigma_h^2/a_h^2$, was then computed. As discussed in the previous section, the true value of the dispersion ratio is not accessible from the experimental map. To assess the relevance of this point for the present approach, three different methods were considered.

Method I: the true theoretical value, $\sigma_h^2/a_h^2$, computed as discussed above, is employed in eq 24 (see Table 1).

Method II: the $p_{max}$ peak maxima ($h_{m,j}$, $j = 1, \ldots, p_{max}$) of the map were detected by using an algorithm based on the comparison of seven successive points for each dimension. When the first three were increasing and the last three decreasing, a maximum in the center (fourth point) was detected. A star search design was thus used to explore the whole separation surface, first along the first dimension and then moving along the other. In this way, all the $h_{m,j}$ values were determined. The average peak maximum abundance

$$a_m = \frac{\sum_j h_{m,j}}{p_{max}} \tag{35}$$

and its standard deviation

$$\sigma_m = \sqrt{\frac{\sum_j (h_{m,j} - a_m)^2}{p_{max} - 1}} \tag{36}$$

were computed, and from these, the relative dispersion ratio of the maxima, $\sigma_m^2/a_m^2$, was obtained and substituted in eq 24 in place of the true theoretical value, $\sigma_h^2/a_h^2$.

Method III: dedicated software, Melanie II (refs 21, 22) (Geneva BioInformatics, GeneBio S.A.), was employed to measure the volumes $V_{m,j}$ of the detected $p_{max}$ spots and the relative dispersion ratio of spot volume, $\sigma_V^2/a_V^2$, by using expressions similar to eqs 35 and 36, obtained by replacing $h_{m,j}$ and $a_m$ with $V_{m,j}$ and $a_V$, respectively, and $\sigma_m$ with $\sigma_V$. This value is substituted in eq 24 in place of the true theoretical value, $\sigma_h^2/a_h^2$.

The third part of the program involves numerical calculation of the 2D-EACVF from the above digitized map, according to eq 21. This procedure was improved by using the cyclic calculation of ACVF (ref 23). In this way, each point of the 2D-EACVF is calculated using the same number of points (see ref 23 for the details), and thus, each point of the EACVF plot is computed with the same degree of precision.

The fourth part of the program obtains an estimate of the separation parameters m, $\sigma_x$, and $\sigma_y$ by nonlinear least-squares fitting of 2D-TACVF (eq 24) to 2D-EACVF. The Levenberg-Marquardt nonlinear least-squares algorithm (ref 23) was employed. Taking into account the experience gained with the analysis of 1D chromatographic separations by 1D-ACVF (refs 8, 9, 11), it is convenient to limit the fitted 2D-EACVF to the interval $\pm 4\sigma$. The reason for this condition lies in the fact that this region of 2D-EACVF is not affected by random deviation of the Poisson process (refs 6, 8-11, 23). For a 1500 × 1500-point grid, numerical computation of the 2D-EACVF map according to eq 21 requires ∼6 to 7 h of computation time on the PC referred to above.

(21) Appel, R. D.; Palagi, P. M.; Walther, D.; Vargas, J. R.; Sanchez, J. C.; Ravier, F.; Pasquali, C.; Hochstrasser, D. F. Electrophoresis 1997, 18, 2824-2834.
(22) Appel, R. D.; Palagi, P. M.; Walther, D.; Vargas, J. R.; Sanchez, J. C.; Ravier, F.; Pasquali, C.; Hochstrasser, D. F. Electrophoresis 1997, 18, 2835-2848.
(23) Felinger, A. Data Analysis and Signal Processing in Chromatography; Elsevier: Amsterdam, 1998.

RESULTS AND DISCUSSION

Random 2D Separations and Their 2D-EACVF. Equation 24 indicates that 2D-EACVF, computed on a random-Poissonian spot distribution, is expected to be related to m and to the size of a SC spot ($\sigma_x$, $\sigma_y$). To throw light on these features, two random 2D separations with different numbers of SCs have been simulated and are reported in Figure 1. The first map contains 25 SCs ($m_1 = 25$ SCs, Figure 1a), whereas the second was obtained by superimposing 75 more SCs on the former ($m_2 = 100$ SCs, Figure 1b). Figure 2a and b reports the respective 2D-EACVF plots. By looking at this figure, some observations can immediately be made: the plots look like a Mexican hat in the center, with a more-or-less flat region around it. The reason the largest 2D-EACVF values are at the origin can be understood by referring to eqs 21 and 24 and to the correlations existing among the f values within a SC spot area (see eq 18). The region around the 2D-EACVF maximum is expected to be highly significant and sensitive to the main parameters of the separation, such as m and the spot size ($\sigma_x$ and $\sigma_y$) (see eq 24). The most important characteristic of this region is that all the information it bears is related to short interdistance correlations. This justifies the shape of 2D-EACVF near the origin, which rapidly descends from the maximum to zero. For the former case ($m_1 = 25$), the maximum is C(0, 0) = 0.035, whereas for the latter ($m_2 = 100$), C(0, 0) = 0.14. Note that these values are just in the ratio of 4, equal to the ratio of the numbers of components. This finding comes from the fact that C(0, 0) is proportional to $V_T^2/m$ (see eq 24) and, in this example, the SCs were chosen to have constant abundance. Under these conditions, $V_T$ is proportional to m, so that C(0, 0) becomes proportional to m. Figure 3 reports the 2D-TACVF plot, calculated according to eq 24. One can see that this theoretical plot explains only the central part of the Mexican hat feature seen in the 2D-EACVF plots in Figure 2a and b. In contrast, the more-or-less wavy regions around the central cone of 2D-EACVF are only an effect of the limited number of spots over which the EACVF is computed.
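The 2D-EACVF computation referred to above can be sketched as follows. The sketch implements eq 21 with wrap-around indexing, which is one reading of the cyclic calculation mentioned under Computation (every lag then averages the same number of products); the details of the procedure in ref 23 may differ. The function name and the tiny test map are hypothetical, and the direct double loop is practical only for small grids.

```python
def eacvf_2d_cyclic(grid, max_lag_x, max_lag_y):
    """2D-EACVF of eq 21 with wrap-around (cyclic) indexing.

    Returns a dict {(p, q): C_pq} for p in [-max_lag_x, max_lag_x] and
    q in [-max_lag_y, max_lag_y]; every lag averages Nx * Ny products.
    """
    nx, ny = len(grid), len(grid[0])
    mean = sum(map(sum, grid)) / (nx * ny)
    dev = [[value - mean for value in row] for row in grid]
    acvf = {}
    for p in range(-max_lag_x, max_lag_x + 1):
        for q in range(-max_lag_y, max_lag_y + 1):
            total = 0.0
            for i in range(nx):
                row, shifted = dev[i], dev[(i + p) % nx]
                for j in range(ny):
                    total += row[j] * shifted[(j + q) % ny]
            acvf[(p, q)] = total / (nx * ny)
    return acvf


# Small hypothetical map: a single bright node on a flat background.
grid = [[0.0] * 5 for _ in range(6)]
grid[2][3] = 1.0
acvf = eacvf_2d_cyclic(grid, max_lag_x=2, max_lag_y=2)
print(acvf[(0, 0)], acvf[(1, 0)], acvf[(-1, 0)])
```

The value at the origin equals the variance of the map intensities, and the function is symmetric under the simultaneous sign change of p and q, which is the symmetry property of the 2D-EACVF.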
This effect was already observed in 1D-ACVF computation (refs 6-12). The consequence of these features is that we can, in principle, extract the separation parameters by limiting the fitting of 2D-TACVF to 2D-EACVF to the central region of $\pm 4\sigma_x$, $\pm 4\sigma_y$. This makes it possible to avoid the above-mentioned random effect of the wavy region. Moreover, as explained in the Theory section, the quantity $\sigma_h^2/a_h^2$ necessary to compute eq 24 is not accessible but must be approximated, for example, according to eq 31 or 32. Consequently, the entire procedure of parameter estimation by nonlinear fitting requires validation. As done in the past for the 1D-ACVF method (refs 8-11), the 2D version was carefully explored by simulation with respect to the number of components, spot capacity, saturation factor, etc.

(24) Dallüge, J.; Beens, J.; Brinkman, U. A. Th. J. Chromatogr. 2003, 1000, 69-108.

Figure 1. Two simulated disordered maps having different numbers of SCs: (a) SC = 25, (b) SC = 100.

Figure 2. 2D-EACVFs calculated from the random 2D separations: (a) 2D-EACVF of the Figure 1a map with 25 SCs; (b) 2D-EACVF of the Figure 1b map with 100 SCs.

Numerical Evaluation of Separation Parameters in Random 2D Separations. Tables 1 and 2 report the simulated 2D separation sets and the results of parameter determination, respectively. Several types of 2D separations with different attributes were simulated (see Table 1). The number of SCs (m = 250, 750, and 1500) and the SC spot size values were chosen close to the experimental values commonly observed in 2D-PAGE analysis of real samples of protein mixtures. The two simulated 2D-bed dimensions (X and Y) refer, for example, to standard 180 × 200 mm and midi 110 × 200 mm gels, but they can likewise be referred to GC/GC complex separations (ref 24). Simulations account for both circular ($\sigma_x = \sigma_y = 0.75$) and elliptical ($\sigma_x = 1$ and $\sigma_y = 0.75$) spot shapes. Three types of SC abundance distributions were considered, constant, uniform, and exponential, corresponding to $\sigma_h^2/a_h^2 = 0.0$, $0.\bar{3}$, and 1.0, respectively (see Table 1). Three nonlinear least-squares estimation methods were performed. Method I corresponds to the case in which $\sigma_h^2/a_h^2$ is assumed to be known. Methods II and III correspond to different estimates of $\sigma_h^2/a_h^2$, obtained by using $\sigma_m^2/a_m^2$ or $\sigma_V^2/a_V^2$, respectively, as discussed under Computation. The results presented in Table 2 correspond to 50 repeated simulations in the case of methods I and II and to
10 in the case of method III. The standard deviations of the estimated parameter values are reported. Method I, which makes use of the theoretical abundance dispersion ratio, provides a satisfactory parameter estimation in all of the considered cases, for both σx,σy, and m. As can be seen in Table 2, only in two cases (sets 7 and 13) did the mest range not cover the true value. This failure is, however, removed if a 95% probability range is considered by using as multiplying factor Analytical Chemistry, Vol. 76, No. 11, June 1, 2004

3061

Figure 3. Characteristic shape of 2D-TACVF (eq 24) close to the origin. The main parameters describing the model, the maximum C(0, 0) and the half width at half-height along separation axis as function of σx and σy are reported. Table 1. Values of Separation Parameters Employed as Attribute of Simulate 2D-Separation separation parameter set

m

σ

σy

X

Y

R

σh2/ah2

HDFa

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

250 250 250 250 250 250 750 750 750 750 750 750 1500 1500 1500

0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 1.00 1.00 1.00 0.75 0.75 0.75

0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75

180 180 180 110 110 110 180 180 180 180 180 180 180 180 180

200 200 200 200 200 200 200 200 200 200 200 200 200 200 200

0.049 0.049 0.049 0.080 0.080 0.080 0.147 0.147 0.147 0.196 0.196 0.196 0.295 0.295 0.295

0.000 0.333 1.000 0.000 0.333 1.000 0.000 0.333 1.000 0.000 0.333 1.000 0.000 0.333 1.000

C U E C U E C U E C U E C U E

a

Height distribution function.

the Student t value of t48,0.05 ) 2.0106. Method II always gives satisfactory SC spot dimension estimates (cf. σx,est vs σx and σy,est vs σy in Tables 1 and 2). In the case of mest, very good agreement is found for R2D values lower than 0.147 (see Table 1 and II). For R2D values greater than 0.147, mest is lower than the true value, and the discrepancy is more significant in the cases of U and E abundance distributions. The origin of this failure of method II can be ascribed to a bias in estimation of the SC abundance dispersion ratio (σh2/ah2), approximated by the maximum spot dispersion ratio (σm2/am2), as discussed in the Theory Section. The bias for such cases is negative, because the estimated values are lower than the theoretical ones. Equation 34 gives a correct explanation for this bias. For example, the observed bias on m is 8, 6, and 11% for sets 12, 14, and 15, respectively, for estimates of 6, 4, and 10%, respectively, obtained from eq 34. This finding is satisfactory if one considers that m estimates are affected by their own errors, as discussed above. When the C abundance distribution is considered, the bias concerning estimation of the spot height dispersion ratio is positive. Thus, the number of SCs is overestimated. In this instance, it seems that the error in estimating m partially compensates for the σm2/am2 bias error, and thus, the final mest value result appears to be more accurate than it should be. 3062

Analytical Chemistry, Vol. 76, No. 11, June 1, 2004
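The α values listed in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that the 2D saturation factor is the fraction of the separation area covered by m elliptical spots with semi-axes 2σx and 2σy, that is, α = mπ(2σx)(2σy)/(XY). This form is an assumption made here for illustration (eq 20 is not reproduced in this section); it was chosen because it reproduces the tabulated values to three decimals. The function name is hypothetical.

```python
import math

def saturation_factor_2d(m, sigma_x, sigma_y, X, Y):
    """Assumed form: m spots, each an ellipse with semi-axes 2*sigma_x and 2*sigma_y."""
    spot_area = math.pi * (2.0 * sigma_x) * (2.0 * sigma_y)
    return m * spot_area / (X * Y)

# (m, sigma_x, sigma_y, X, Y, alpha listed in Table 1), one row per group of sets
table1_rows = [
    (250, 0.75, 0.75, 180, 200, 0.049),   # sets 1-3
    (250, 0.75, 0.75, 110, 200, 0.080),   # sets 4-6
    (750, 0.75, 0.75, 180, 200, 0.147),   # sets 7-9
    (750, 1.00, 0.75, 180, 200, 0.196),   # sets 10-12
    (1500, 0.75, 0.75, 180, 200, 0.295),  # sets 13-15
]

computed = []
for m, sx, sy, X, Y, listed in table1_rows:
    alpha = saturation_factor_2d(m, sx, sy, X, Y)
    computed.append(alpha)
    print(f"m={m:5d} sigma_x={sx:.2f} X={X}: alpha={alpha:.3f} (Table 1: {listed:.3f})")
```

Under this assumed definition, halving the bed area or doubling m has the same effect on α, which is why sets 4-6 (smaller gel) sit between the m = 250 and m = 750 cases on the larger gel.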

Method III is based on the Melanie software, which detects the spots and evaluates their volumes and, consequently, σV²/aV². In this case, the abundance dispersion ratio is based on the approximation given by eq 32. The data sets processed by method III (see Table 2) refer only to the E and U distributions of SC abundances, with 10 replicates instead of 50, owing to the complexity of this type of simulation. Only in the case of E-distributed SC abundances (sets 3, 6, 9, 12, and 15 in Table 2) does Melanie give a more correct estimate of the abundance dispersion ratio than method II and, consequently, a more accurate estimate of m. In the case of U-distributed SC abundances, the abundance dispersion ratios evaluated as σV²/aV² give a less accurate estimate of σh²/ah² than the σm²/am² quantities of method II (see Table 2) and, thus, a greater error on m (see sets 2, 5, 8, 11, and 14, method III, Table 2). Again, these findings are in agreement with eq 34.

Why the Melanie estimate of the SC abundance dispersion ratio, σh²/ah², is worse for the U distribution of SC abundances but not for the E distribution requires explanation. These features can be ascribed to the SC spot overlapping process and to the different ways in which methods II and III detect the spots. In fact, the SC spot overlapping process creates multicomponent spots with an abundance distribution that differs from that of the SC spots. In the previous work on the quantitative theory of peak overlapping,17 it was pointed out that no matter what the inherent SC abundance distribution is (e.g., U, C, or E), the multicomponent peak abundance distribution converges to the E distribution when α is increased, that is, either by increasing m or by decreasing the separation efficiency (see eq 20). The driving force behind this process lies in the entropic nature of the overlapping process17 and is, thus, unavoidable.

Consequently, a mixture having E-distributed SC abundances will have σh²/ah² = 1, and this value will remain more or less constant, even for the distribution of the multicomponent peak abundances. Thus, the estimate of σh²/ah² by σV²/aV² will be unbiased. On the contrary, because of the finite separation efficiency, any other type of SC abundance distribution (e.g., U or C) will produce E-distributed multicomponent peaks, whose abundance distribution differs from that of the SC peaks. For example, a mixture having a U-type SC abundance distribution will have σh²/ah² = 0.333 but will give rise to a multicomponent peak distribution converging to the E distribution, for which the squared dispersion ratio converges to 1. The same must hold true in the 2D case. The fact that Melanie finds σV²/aV² values near 1 for E-distributed SC abundances (sets 3, 6, 9, 12, and 15 in Table 2) and values greater than 0.333 for U-distributed SC abundances (sets 2, 5, 8, 11, and 14 in Table 2) most likely indicates that Melanie provides a correct estimate of the multicomponent spot abundances. The corresponding abundance dispersion ratios evaluated by method II on the basis of spot maximums do not exhibit this feature. This point appears to be significant, even if its full exploitation lies beyond the aim of the present paper. We would only point to the need to better exploit the SC abundance distributions of complex mixtures, such as proteins.

It is worth emphasizing that method II, based on the 2D-TACVF of eq 24, gives good parameter estimates in the 2D separation of a multicomponent mixture with random 2D retention positions. Note that the cases explored here correspond to 2D-PAGE separation conditions commonly employed in proteomics, and in this instance, the maximum bias of m is 11%.
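As a quick check of the three reference values of σh²/ah² used above (0, 0.333, and 1 for the C, U, and E distributions), the following sketch draws synthetic SC heights and computes the squared dispersion ratio. It assumes that U means heights uniformly distributed between zero and a maximum and that E means exponentially distributed heights; the sample size, seed, and function name are arbitrary choices made here, not values from the paper.

```python
import numpy as np

def dispersion_ratio(h):
    """Squared dispersion ratio: sample variance divided by the squared mean."""
    h = np.asarray(h, dtype=float)
    return h.var(ddof=1) / h.mean() ** 2

rng = np.random.default_rng(0)
n = 200_000  # arbitrary number of simulated SC heights

heights = {
    "C": np.full(n, 1.0),            # constant heights
    "U": rng.uniform(0.0, 2.0, n),   # uniform on (0, 2): variance 1/3, mean 1
    "E": rng.exponential(1.0, n),    # exponential: variance 1, mean 1
}

ratios = {name: dispersion_ratio(h) for name, h in heights.items()}
for name, r in ratios.items():
    print(f"{name}: sigma_h^2 / a_h^2 = {r:.3f}")
```

The ratio is scale-free, so the particular maximum height or mean chosen for the simulated distributions does not affect the result.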

Table 2. Estimation Results of the Number of SCs (mest) and Spot Shape (σx,est, σy,est), as Compared through Three Different Methods(a)

Method I
set   HDF(b)   mest        σx,est        σy,est
1     C        248 ± 8     0.75 ± 0.01   0.75 ± 0.01
2     U        245 ± 8     0.76 ± 0.01   0.75 ± 0.01
3     E        247 ± 8     0.75 ± 0.01   0.75 ± 0.01
4     C        244 ± 10    0.75 ± 0.02   0.76 ± 0.02
5     U        246 ± 12    0.75 ± 0.02   0.75 ± 0.02
6     E        250 ± 12    0.75 ± 0.02   0.75 ± 0.02
7     C        724 ± 22    0.76 ± 0.01   0.76 ± 0.01
8     U        737 ± 23    0.75 ± 0.01   0.75 ± 0.01
9     E        739 ± 23    0.75 ± 0.01   0.75 ± 0.01
10    C        727 ± 24    1.01 ± 0.02   0.76 ± 0.01
11    U        730 ± 31    1.00 ± 0.02   0.76 ± 0.02
12    E        735 ± 27    1.01 ± 0.02   0.75 ± 0.01
13    C        1432 ± 45   0.76 ± 0.01   0.76 ± 0.01
14    U        1456 ± 44   0.76 ± 0.01   0.76 ± 0.01
15    E        1477 ± 44   0.75 ± 0.01   0.76 ± 0.01

Method II
set   HDF(b)   mest        σx,est        σy,est        σm²/am²
1     C        251 ± 8     0.75 ± 0.01   0.75 ± 0.01   0.06 ± 0.01
2     U        243 ± 7     0.76 ± 0.01   0.75 ± 0.01   0.33 ± 0.03
3     E        241 ± 6     0.75 ± 0.01   0.75 ± 0.01   0.986 ± 0.112
4     C        249 ± 10    0.75 ± 0.02   0.76 ± 0.02   0.02 ± 0.01
5     U        243 ± 10    0.75 ± 0.02   0.75 ± 0.02   0.32 ± 0.03
6     E        242 ± 9     0.75 ± 0.02   0.75 ± 0.02   0.925 ± 0.118
7     C        747 ± 21    0.76 ± 0.01   0.76 ± 0.01   0.031 ± 0.007
8     U        723 ± 23    0.75 ± 0.01   0.75 ± 0.01   0.31 ± 0.02
9     E        700 ± 20    0.75 ± 0.01   0.75 ± 0.01   0.90 ± 0.08
10    C        755 ± 22    1.01 ± 0.02   0.76 ± 0.01   0.038 ± 0.006
11    U        716 ± 24    1.00 ± 0.02   0.76 ± 0.02   0.30 ± 0.02
12    E        684 ± 22    1.01 ± 0.02   0.75 ± 0.01   0.87 ± 0.08
13    C        1509 ± 45   0.76 ± 0.01   0.76 ± 0.01   0.054 ± 0.004
14    U        1404 ± 40   0.76 ± 0.01   0.76 ± 0.01   0.28 ± 0.01
15    E        1329 ± 33   0.75 ± 0.01   0.76 ± 0.01   0.80 ± 0.04

Method III
set   HDF(b)   mest        σx,est        σy,est          σV²/aV²
2     U        259 ± 7     0.75 ± 0.01   0.75 ± 0.02     0.43 ± 0.06
3     E        250 ± 11    0.76 ± 0.02   0.76 ± 0.01     1.11 ± 0.14
5     U        257 ± 10    0.75 ± 0.01   0.77 ± 0.02     0.43 ± 0.04
6     E        249 ± 8     0.74 ± 0.02   0.76 ± 0.02     1.04 ± 0.16
8     U        809 ± 31    0.75 ± 0.01   0.76 ± 0.01     0.50 ± 0.04
9     E        759 ± 18    0.75 ± 0.01   0.752 ± 0.004   1.04 ± 0.09
11    U        835 ± 28    1.00 ± 0.02   0.76 ± 0.01     0.55 ± 0.05
12    E        757 ± 37    1.01 ± 0.02   0.75 ± 0.01     1.08 ± 0.13
14    U        1751 ± 62   0.75 ± 0.01   0.76 ± 0.02     0.59 ± 0.07
15    E        1478 ± 54   0.75 ± 0.01   0.76 ± 0.01     1.03 ± 0.07

(a) See main text. (b) HDF, height distribution function.
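The coverage statement made for method I (only sets 7 and 13 miss the true m at one standard deviation, and none do at the 95% level) can be verified directly from the method I column of Table 2. The sketch below simply compares |mest − m| with s and with t·s, using the t value quoted in the text (2.0106); the variable and function names are illustrative.

```python
T_VALUE = 2.0106  # Student t factor quoted in the text

# set number -> (true m, method I m_est, standard deviation), from Tables 1 and 2
method_i = {
    1: (250, 248, 8), 2: (250, 245, 8), 3: (250, 247, 8),
    4: (250, 244, 10), 5: (250, 246, 12), 6: (250, 250, 12),
    7: (750, 724, 22), 8: (750, 737, 23), 9: (750, 739, 23),
    10: (750, 727, 24), 11: (750, 730, 31), 12: (750, 735, 27),
    13: (1500, 1432, 45), 14: (1500, 1456, 44), 15: (1500, 1477, 44),
}

def uncovered(rows, factor):
    """Sets whose interval m_est +/- factor*s does not contain the true m."""
    return [k for k, (m, est, s) in rows.items() if abs(est - m) > factor * s]

missed_1s = uncovered(method_i, 1.0)
missed_95 = uncovered(method_i, T_VALUE)
print("not covered at 1 s:", missed_1s)
print("not covered at 95%:", missed_95)
```

Set 14 sits exactly on the boundary at one standard deviation (1456 + 44 = 1500) and is counted here as covered.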

Structured 2D Separations and Their Decoding by 2D-EACVF. It is well-known that the great advantage of introducing a second dimension is that separation power is increased and that what produces an apparently random pattern in 1D is naturally decoded in 2D.24 We can expect that the ability to identify a structure in a 2D pattern will be enhanced with the assistance of the 2D-ACVF tool. Achieving this requires a learning step. In Figure 4, the 2D approach is compared to the 1D approach in elementary cases. Figure 4a and b reports 1D chromatograms containing one and two ordered sequences of SC peaks, respectively. Note that the two overlapped sequences of Figure 4b have the same frequency, equal to that of Figure 4a, but different phases. The corresponding 1D-EACVFs are reported in Figure 4a′ and b′, respectively. One can see that the 1D-EACVF follows the 1D-TACVF of eq 13, which predicts a decreasing trend in the 1D-EACVF peaks when k, that is, the span between the nth positions in the SC peak sequence (see eq 1), is increased. In particular, the frequency value b (= 0.1) is recovered from the 1D-EACVF plot. When two sequences are present in the 1D chromatogram (Figure 4b), the corresponding 1D-EACVF plot is more complex: one can identify the main peaks at positions Δx = 0.1, 0.2, 0.3, etc., corresponding to those in Figure 4a′, which come from the common frequency value b = 0.1 of the two series. Moreover, other minor peaks are present in the 1D-EACVF plot (Figure 4b′), overlapping the main structure. Figure 4c and d reports the same cases as Figure 4a and b, but in 2D. Note that, for the sake of completeness, full-plane 2D-EACVFs are mapped in Figure 4c′ and d′, whereas in Figure 4a′ and b′, half-plane 1D-EACVFs are drawn because of the previously discussed symmetry of the 1D-EACVF around 0.
We observe first that the 2D-EACVF exhibits C2 symmetry (see Figure 4c′ and d′), since the correlations at positions (Δx, Δy) and (Δx, −Δy) are equal to those at (−Δx, −Δy) and (−Δx, Δy), respectively (see eq 24). Then we observe that there is a simple correspondence between the 1D-EACVF and the 2D-EACVF for a single-sequence chromatogram (cf. Figure 4a,a′ and c,c′). In fact, if one considers only the parts of these functions in the positive X domain, the 1D case is simply the projection of the 2D case over the X axis. This property also holds true for the case of two overlapped ordered sequences (Figure 4b,b′ and d,d′). In this 2D case, there is an apparent advantage over the 1D condition, since the added second dimension decodes the overlapped 1D-EACVF pattern into a nonoverlapped sequence of spots in the 2D-EACVF (cf. Figure 4b′ and d′). Under these conditions, recovering information about the structure of the original pattern becomes easier and more precise in 2D than in 1D.

Obviously, we need a key to read the information contained in the 2D-EACVF. The point is handled in detail in Appendix I of the Supporting Information, where the exact shape function of the spot trains of Figure 4d′ is derived and reported. Two useful expressions are eq 25 (referring to a single ordered spot train similar to the one reported in Figure 4c) and eq 26 (referring to the 2D-TACVF of two parallel ordered sequences similar to those in Figure 4d). Despite the complexity of these formulas, their application in recovering structure parameters from the 2D-EACVF map is straightforward. Let us consider Figure 4c′, which reports the 2D-EACVF of the single ordered sequence of Figure 4c. Its 2D-TACVF is given by eq 25. One can see that inserting k = 0 gives the expression of the central spot of the 2D-EACVF, from which the number of SC spots, nmax + 1, of the sequence and the SC spot dimensions σx, σy can be derived in a way similar to that used in the case of random SC separation maps. Moreover, by inserting k = 1 and Δx = 0, Δy = 0 in eq 25, one can obtain the bx and by values (corresponding to the spot position closest to the origin). Let us now consider Figure 4d′, which reports the 2D-EACVF of a 2D separation containing two parallel spot trains (reported in Figure 4d). From the central spot train crossing the (0, 0) point in the 2D-EACVF plot of Figure 4d′, one can derive the values of nmax + 1, σx, σy, and bx = 1.0 and by = 0.75 in a way similar to what was done for the case in Figure 4c,c′ (single ordered sequence in the 2D separation).
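The reading rule for a single ordered spot train (the spot closest to the origin in the 2D-EACVF gives bx and by) can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions: the 2D-EACVF is computed as a standard biased, zero-padded autocovariance via FFT, which may differ in normalization from eq 21; the grid step, spot widths, phase, and number of spots are hypothetical; and bx = 4, by = 2 are borrowed from the Figure 5 example only for convenience.

```python
import numpy as np

def gaussian_map(centers, sigma_x, sigma_y, nx, ny, step):
    """Sum of unit-height bivariate Gaussian spots sampled on an nx-by-ny grid."""
    x = np.arange(nx) * step
    y = np.arange(ny) * step
    xx, yy = np.meshgrid(x, y, indexing="ij")
    z = np.zeros((nx, ny))
    for cx, cy in centers:
        z += np.exp(-0.5 * (((xx - cx) / sigma_x) ** 2 + ((yy - cy) / sigma_y) ** 2))
    return z

def autocovariance_2d(z):
    """Biased, zero-padded 2D autocovariance of the mean-centred map (via FFT).

    The returned array is indexed by lag, with zero lag at index (nx, ny).
    """
    d = z - z.mean()
    nx, ny = d.shape
    f = np.fft.rfft2(d, s=(2 * nx, 2 * ny))
    acv = np.fft.irfft2(f * np.conj(f), s=(2 * nx, 2 * ny)) / d.size
    return np.fft.fftshift(acv)

# Hypothetical ordered train: 10 spots with phase (5, 5) and frequency (4, 2)
step = 0.1
sigma_x = sigma_y = 0.3
bx, by = 4.0, 2.0
centers = [(5.0 + n * bx, 5.0 + n * by) for n in range(10)]
z = gaussian_map(centers, sigma_x, sigma_y, nx=500, ny=300, step=step)
acv = autocovariance_2d(z)

# Lag axes expressed in the same units as the map coordinates
lag_x = (np.arange(acv.shape[0]) - z.shape[0]) * step
lag_y = (np.arange(acv.shape[1]) - z.shape[1]) * step
LX, LY = np.meshgrid(lag_x, lag_y, indexing="ij")

# Skip the central spot (within 4 sigma) and, by C2 symmetry, the half-plane LX <= 0
outside = ((LX / (4 * sigma_x)) ** 2 + (LY / (4 * sigma_y)) ** 2 > 1.0) & (LX > 0)
masked = np.where(outside, acv, -np.inf)
i, j = np.unravel_index(np.argmax(masked), masked.shape)
bx_est, by_est = lag_x[i], lag_y[j]
print(f"estimated frequency: bx = {bx_est:.2f}, by = {by_est:.2f}")
```

Restricting the search to one half-plane is a design choice that relies on the C2 symmetry noted above; without it, the mirror-image spot at (−bx, −by) is an equally valid maximum.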
Moreover, one can also derive the Δax, Δay values from the lateral spot trains in the 2D-EACVF plot in Figure 4d′, with reference to eqs 26-28. In fact, by putting k = 0 and Δx = 0, Δy = 0 in eq 26, one obtains the Δax, Δay values corresponding to the most intense spot location (marked by an arrow in Figure 4d′).

Figure 4. Elementary cases of 1D and 2D separations with their 1D-EACVFs and 2D-EACVFs. a-a′: 1D-chromatogram with a single ordered sequence of peaks. b-b′: 1D-chromatogram with two ordered sequences of peaks of the same frequency value (b = 0.1) but with different phases (a1 = 0.05, a2 = 0.08). c-c′: 2D separation with a single ordered sequence of peaks, similar to the case of a-a′. d-d′: 2D separation with two ordered sequences of peaks, similar to the case of c-c′.

The above rules, based on the 2D-TACVF of a single ordered spot train or of two parallel ordered spot trains having a common frequency and different phase values (eqs 25 and 26), furnish a basis for decoding 2D separation patterns with structures more complex than the two cases discussed above (Figure 4c,c′ and d,d′). This is the case of Figure 5, which reports an ordered 2D separation map (Figure 5a) with its 2D-EACVF (Figure 5b). The case differs slightly from the one in Figure 4d,d′, since bx (= 4) is different from by (= 2), and there are seven series (see Figure 5a) with different ax and ay values (see the legend of Figure 5). Following the rules above, one first identifies the central spot train passing through the origin (white line in Figure 5b) and, on it, the spot closest to the origin, which furnishes the bx (= 4) and by (= 2) values. Then one identifies the second- and the third-most-intense spots lying outside the central spot train. These are located at Δx = 0, Δy = 2 and at Δx = 4, Δy = 0, respectively, which explains the vertical and the horizontal differences between two series (see Figure 5a), thus corresponding to the Δax, Δay values. Obviously, the order in the 2D separation map of Figure 5a and in its 2D-EACVF, corresponding to a linear combination of the above parameters, can be read in different ways.

Figure 5. 2D-ordered separation (a) and its 2D-EACVF (b). All the ordered sequences (dotted lines) have common frequency values (bx = 4; by = 2) and phases equal to 1 (1, 9), 2 (1, 7), 3 (1, 5), 4 (1, 3), 5 (1, 1), 6 (5, 1), and 7 (9, 1).

Figure 6 reports a second case of a structured 2D-separation map (part a) with its 2D-EACVF (part b). Inspection of this map (Figure 6a) could give the inexperienced observer the impression of a random structure, but this is not the case. In fact, there are 10 different ordered SC sequences, all of which have the same phase value but different, randomly chosen frequencies (the values are reported in the caption to Figure 6). In Figure 6b, only the positive quadrant of the 2D-EACVF is reported, where the frequency value sets (bx,j, by,j) (j = 1, ..., 10) should be identified by following the procedure described above. It is relatively easy to find three frequency value sets ((0.0332, 0.1474), (0.1135, 0.0843), (0.1932, 0.0542)) in the 2D-EACVF plot by identifying the spot locations closest to the origin (0, 0) and by searching for their repetitions working away from the origin (see the three dotted line directions in Figure 6b). The other frequency values hidden in the map in Figure 6a and corresponding to other spot positions in the 2D-EACVF map are marked by a cross only (see Figure 6b). It is true that the attribution of these frequency values seems somewhat arbitrary, since their repetition is not easily verified. This, however, comes from the low contrast of the 2D-EACVF map in Figure 6b. In Appendix II of the Supporting Information, a 3D plot (see Figure II,1b) of this 2D-EACVF is reported, showing that the maximums are in reality well-identified and unambiguously detected. Likewise, we did not prove that the ordered sequences of Figure 6a have the same phase values. This can be seen in Figure II,1a in Appendix II of the Supporting Information, which reports the full 2D-EACVF of the map in Figure 6a, where the characteristic parallel trains close to the main spot train, typical of phase difference effects (as in Figure 4d′ or in Figure 5b), are totally absent. The topic is not handled further here, since a systematic quantitative decoding of structured 2D separations is not the specific aim of the present paper. A third case, in which the sequences have a constant frequency value and random phase values, is discussed in Appendix II of the Supporting Information.

Figure 6. (a) Density plot of 10 2D sequences (see eqs 14 and 15). This structured map was simulated by assuming constant phase values (aX = 0.05, aY = 0.05) and by changing the frequency values (bX, bY)i to the following: 1 (0.1395, 0.2001), 2 (0.1932, 0.0542), 3 (0.1135, 0.0843), 4 (0.3568, 0.0549), 5 (0.0123, 0.2319), 6 (0.2873, 0.0102), 7 (0.0332, 0.1474), 8 (0.1512, 0.2576), 9 (0.3501, 0.1504), and 10 (0.2212, 0.2543). (b) 2D-EACVF from the map in part a. +, frequency position (bX,i, bY,i; see part a) corresponding to correlation at n = 1 (see eqs 13 and 14); △, correlations at n = 2; ◇, correlations at n = 3; - -, correlations belonging to the same series.

In general, we can say that the 2D-EACVF is valuable in decoding structured 2D separations. Obviously, its success depends on the particular structure. This achievement is relevant and deserves some comment. In fact, the phase and frequency values, and their number, which fully characterize a sample mixture, are related to the Giddings sample dimensionality concept.13 We have demonstrated that these parameters are located in specific positions of the 2D-EACVF plots and that it is, thus, possible to fully identify both their number and their values. The 2D approach can be a valuable tool in investigating the relationship between sample dimensionality and separation dimensionality, as advanced by J. C. Giddings.13

The spot located at the origin (0, 0) of the 2D-EACVF plots (Figures 5b and 6b) also deserves some comment. We have demonstrated that this part of the 2D-EACVF of a structured map has the same meaning as the corresponding part of the 2D-EACVF of a random map. (In fact, if k = 0 is entered in eq 25, the 2D-TACVF of a structured map, one obtains eq 24, the 2D-TACVF of a random map.) Consequently, this part of the 2D-EACVF plot of a structured map will give estimates of m = nmax + 1 and of σx, σy, in a manner similar to the one discussed above for random 2D separations. Likewise, if one has several sequences, determination of the number of components belonging to the different sequences is, in principle, possible by using expressions similar to eqs 25 and 26.

Figure 7. (a) 2D-separation pattern resulting from the superposition of a random pattern over an ordered one. (b) Its 2D-EACVF surface plot.

To this point, only the two limit cases of fully ordered or totally random sequences have been considered. In certain cases, one can have mixtures containing both types of sequences (see, e.g., the case of naphtha in ref 11). The 2D-ACVF makes it possible to single out an ordered structure embedded in a random one, as shown in Figure 7. The reported 2D-separation map (Figure 7a) looks random, but it is, instead, a superimposition of a random structure over an ordered structure. The ordered structure can be identified through the arrows reported over the Δµx and Δµy axes of Figure 7a. One can verify that there are 25 ordered spots and 25 randomly located spots as follows. The 2D-EACVF of the total 2D separation, which is reported in Figure 7b, retains both the original structured 2D-EACVF pattern (similar to that in Figure 5b) and the contribution of the random pattern. In fact, one can see that the spot at the origin is significantly greater than the neighboring ones, and their sequence does not decrease as 1/(nmax − k + 1) on going from k = 0 (spot at the origin) to k = 1 (first neighbor) (see eq 25). Both the 25 randomly located spots (from eq 24) and the 25 ordered spots (from eq 25) contribute to the 2D-EACVF spot at the origin, and it is thus more prominent than the first neighbor, which reflects only the contribution of the ordered 25-SC structure. This capability of the 2D-EACVF plot to single out an ordered structure comes from its ability to intrinsically amplify the expression of a correlation that is, instead, distributed among several SC peak positions in the 2D-separation map. This, then, demonstrates the potential of the 2D-ACVF approach in decoding random 2D separations, ordered (structured) 2D separations, and mixtures thereof.
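The effect described for Figure 7 (the origin spot gains from all SCs, the first neighbor only from the ordered ones) can be reproduced with a small simulation. The sketch below assumes equal-height circular Gaussian spots and uses a standard biased autocovariance evaluated directly at two lags, which may differ in normalization from eq 21. The 25 + 25 spot counts follow the text; the field size, spot width, frequency, phase, and random seed are hypothetical.

```python
import numpy as np

def spot_map(centers, sigma, n, step):
    """Unit-height circular Gaussian spots sampled on an n-by-n grid."""
    g = np.arange(n) * step
    xx, yy = np.meshgrid(g, g, indexing="ij")
    z = np.zeros((n, n))
    for cx, cy in centers:
        z += np.exp(-0.5 * ((xx - cx) ** 2 + (yy - cy) ** 2) / sigma ** 2)
    return z

def acvf_at(z, p, q):
    """Biased autocovariance estimate at the non-negative integer lag (p, q)."""
    d = z - z.mean()
    nx, ny = d.shape
    return np.sum(d[: nx - p, : ny - q] * d[p:, q:]) / d.size

step, sigma, n = 0.1, 0.3, 600            # 60 x 60 field (hypothetical units)
b_lag = 20                                 # frequency bx = by = 2.0 in grid steps

# 25 ordered spots along one train, plus 25 randomly placed spots
ordered = np.array([(5.0 + 2.0 * k, 5.0 + 2.0 * k) for k in range(25)])
rng = np.random.default_rng(1)
random_spots = rng.uniform(3.0, 57.0, size=(25, 2))

z_ordered = spot_map(ordered, sigma, n, step)
z_mixed = spot_map(np.vstack([ordered, random_spots]), sigma, n, step)

# Height of the first-neighbor spot relative to the spot at the origin
ratio_ordered = acvf_at(z_ordered, b_lag, b_lag) / acvf_at(z_ordered, 0, 0)
ratio_mixed = acvf_at(z_mixed, b_lag, b_lag) / acvf_at(z_mixed, 0, 0)
print(f"first neighbor / origin, ordered only: {ratio_ordered:.2f}")
print(f"first neighbor / origin, mixed map:    {ratio_mixed:.2f}")
```

For the ordered train alone, 24 of the 25 spots find a partner at the first-neighbor lag, so the ratio stays close to 1; adding the random spots raises the origin value without a matching rise at the first neighbor, and the ratio drops markedly.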

CONCLUSIONS

Theoretical models of the 2D-ACVF of random and structured 2D separations were derived, and their applicability in decoding random and structured 2D separations was validated under realistic separation conditions, close, for example, to the case of 2D-PAGE analysis of proteins relevant to proteomics. Some aspects connected to the evaluation of the correct abundance dispersion ratio can affect accuracy, but the severity of the problem was brought into focus and estimated. At the same time, the need to focus on the role played by the SC abundance distribution in multicomponent n-D separations was underlined. This point can be dealt with by following the same approach employed to account for the quantitative aspects of peak overlapping in multicomponent separations.17 These aspects are, indeed, relevant in proteomics when low-abundance proteins are searched for. A possible drawback of the method could be linked to the hypothesis that all SCs of the 2D separation have the same spot form (e.g., the bivariate Gaussian case was exploited here). Instead, one often observes that the shapes are not constant but exhibit different types of drift according to the particular separation case.26 In principle, this does not constitute a serious limitation, since in general, ACVF methods are robust with respect to moderate variations in SC spot shape. In the case of significant drift, convenient transformations can be applied to achieve constant shape conditions over the whole map. This procedure has already been successfully applied in the 1D case.25 The 2D-ACVF approach is a valuable tool for exploiting Giddings sample dimensionality and its relationship with separation dimensionality, since in principle, it allows one to determine the number and the parameters of the sample. One weak point of the 2D-EACVF in decoding 2D separations could be the long computation time (6-7 h), which is, however, not uncommon in image processing. This aspect could be improved by a proper choice of computational resources and pertinent optimization of the computation algorithm and programming.

(25) Pietrogrande, M. C.; Tellini, I.; Pasti, L.; Dondi, F.; Szopa, C.; Sternberg, R.; Vidal-Madjar, C. J. Chromatogr. 2003, 1002, 179-192.
(26) Synovec, R. E.; Prazen, B. J.; Johnson, K. J.; Fraga, C. G.; Bruckner, C. A. Adv. Chromatogr. 2003, 42, 1-42.

ACKNOWLEDGMENT

This work has been supported by the Italian University and Scientific Research Ministry (Grants nos. 2001033797_001 and 2003039537_005), by the University of Ferrara, Italy, and by the NATO Linkage Grant PST.CLG.979081. Azzurra Tosi is gratefully acknowledged for map simulation and processing.

SUPPORTING INFORMATION AVAILABLE

Appendix I (derivation of theoretical model expressions) and Appendix II (additional figures, comments, and example) (13 pages) are available as Supporting Information. This material is available free of charge via the Internet at http://pubs.acs.org.

GLOSSARY

1D          single dimension
1D-ACVF     monodimensional autocovariance function
1D-EACVF    experimental autocovariance function
1D-TACVF    theoretical autocovariance function
2D          two dimension
2D-ACVF     two-dimensional autocovariance function
2D-EACVF    experimental autocovariance function, eq 21
2D-TACVF    theoretical autocovariance function, eq 24

A           area of 2D separation space
A0          area of a single component spot
AT          total area of the multicomponent 1D-separation
ACVF        autocovariance function
ai          phase indicator of the 1D i-homologous series
aX,i        X component phase indicator of the 2D i-homologous series
aY,i        Y component phase indicator of the 2D i-homologous series
ah          mean single-component spot height
am          mean spot maximums
aV          mean spot volumes
bi          frequency indicator
bX,i        X component frequency indicator of the 2D i-homologous series
bY,i        Y component frequency indicator of the 2D i-homologous series
bh          average interdistance between subsequent SC positions
C           constant (abundance distribution)
Cp,q        2D-EACVF value at space span p,q
cp          1D-EACVF value at span p
E           exponential (abundance distribution)
E[]         expectation
f           function value
h           single-component spot height
HDF         height distribution function
k           interdistance order between SC positions in an ordered sequence
M           maximum p interdistance extension, over which 1D-EACVF is computed
Mx          maximum p interdistance extension, over which 2D-EACVF is computed
My          maximum q interdistance extension, over which 2D-EACVF is computed
m           number of SCs
Nx          number of points on first dimension used to generate the grid for 2D-EACVF calculation
Ny          number of points on second dimension used to generate the grid for 2D-EACVF calculation
nc,1D       peak capacity
nc,2D       spot capacity
nmax        maximum number of the term in a given homologous series
pmax        number of detected multicomponent peak maximums or multicomponent spots
r           number of homologous series
Rs          resolution of separation
SC          single component
SMO         statistical model of overlapping
Tx          mean interdistance between subsequent single-component spots along x direction
Ty          mean interdistance between subsequent single-component spots along y direction
U           uniform (abundance distribution)
u           normalized peak shape function
v           normalized integration coordinate
Vi          volume of a single component spot
VT          total volume of multicomponent 2D-separation
x           retention coordinate (first dimension)
X           length of separation on first dimension
y           retention coordinate (second dimension)
Y           length of separation on second dimension

Greek Symbols
α1D         saturation factor in 1D-separation systems
α2D         saturation factor in 2D-separation systems
Δ           difference
Δµi(n)      partition free energy of the nth term of the i-homologous 1D series
ΔµX,i(n)    partition free energy of the nth term of the i-homologous 2D series along the X separation axis
ΔµY,i(n)    partition free energy of the nth term of the i-homologous 2D series along the Y separation axis
γ           separation extent
λ           interdistance frequency of an exponential distribution
σh          standard deviation of SC spot height distribution
σm          standard deviation of spot maximum distribution
σV          standard deviation of spot volumes
σx          standard deviation of SC spot shape along first dimension
σy          standard deviation of SC spot shape along second dimension
τ           time interval between two subsequent digitized positions

Suffixes (if not specified)
i           general index
est         estimated quantity
p           maximum span in the EACVF (first dimension)
q           maximum span in the EACVF (second dimension)

Mathematical Symbols
∀           for all

Received for review November 7, 2003. Accepted March 1, 2004.

AC035312+