Analysis of Blends of Mixtures Using Multivariate Statistics

S. C. Elliott, N. A. Hartmann, and S. J. Hawkes
Oregon State University, Corvallis, Ore. 97331

When complex mixtures such as essential oils, petroleum products, or other natural extracts are blended together, the original mixtures may be identified and determined by a multivariate analysis of a sufficient quantity of analytical data, such as is produced by chromatographic analyses. A mathematical method is described which relies solely on library computer routines for all calculations of any complexity and assumes only that the original mixtures blend linearly. When applied to hypothetical blends of peppermint oils and hop oils computed from published data, it was successful in identifying and determining the origins of the original oils in blends of up to four oils.

An analyst working with complex mixtures must frequently determine the concentration of components that are themselves mixtures of variable composition. This is true of flavor or perfumery products blended from several essential oils, each of which is to be identified and determined, and of pollutant oil slicks which may come from several sources. Previous approaches to this problem have usually wasted most of the information in chromatograms by focusing attention on a few dramatic differences. Such approaches fail for closely similar mixtures and leave unnecessary ambiguity even in relatively simple analyses. The problem is especially acute when the individual differences between the component mixtures are comparable to or less than the natural variation between samples of each mixture. This is true, for example, of the gas chromatograms of Sicilian and Floridian lemon oils. Such blends can be treated only by a statistical analysis of a large number of such differences.

The method described below may be applied without modification to determine the degree of malfunction of a chemical process from its products, or the degree of disease or of crossbreeding in a biological organism from analysis of an extract from it, provided the malfunction, etc., can be assumed to be a linear function of composition. Since such "degrees" are often numerically undefinable, the linear assumption would equally often impose little limitation on the interpretation even when untrue. The problem can be solved by applying a number of statistical and mathematical routines which are available as library programs in most large computer installations but are seldom used by physical scientists.

STATISTICAL AND MATHEMATICAL APPROACH

Mathematically, the problem may be stated as follows:

$$\min_{\alpha_1,\ldots,\alpha_k}\;\Big(X-\sum_{i=1}^{k}\alpha_i\mu_i\Big)^{\mathsf T}\,\Sigma^{-1}\,\Big(X-\sum_{i=1}^{k}\alpha_i\mu_i\Big) \tag{1}$$

subject to $\sum_{i=1}^{k}\alpha_i \le 1$ and $\alpha_i \ge 0$ for all $i$. Here $X$ is a vector of the measurements taken on an unknown oil, $\alpha_i$ is the proportion of the $i$th pure oil, $\mu_i$ is a mean vector of the $i$th pure oil, and $\Sigma$ is the population variance-covariance matrix of the pure oils. Since we do not know the $\mu_i$ and $\Sigma$, we use estimates of them obtained from a discriminant analysis of the pure oils. This analysis assumes that the pure oils follow a multivariate normal distribution. Substituting these estimates in Equation 1, we obtain

$$\min_{\alpha_1,\ldots,\alpha_k}\;\Big(X-\sum_{i=1}^{k}\alpha_i\bar{x}_i\Big)^{\mathsf T}\,S^{-1}\,\Big(X-\sum_{i=1}^{k}\alpha_i\bar{x}_i\Big) \tag{2}$$

subject to $\sum_{i=1}^{k}\alpha_i \le 1$ and $\alpha_i \ge 0$ for all $i$, where $\bar{x}_i$ is an estimate of $\mu_i$ and $S$ is an estimate of $\Sigma$.

For each component mixture there must be a body of analytical data, such as peak areas, for several representative samples. Thus, suppose the problem is to identify the geographical origin or origins of an essential oil that may be a blend of two or more origins. Then the peak areas from chromatograms of several oils from each origin are measured and normalized. From this collection of data we can obtain the $\bar{x}_i$ and $S$ needed for Equation 2.

ANALYSIS OF THE UNKNOWN

A chromatogram of the sample for analysis is taken, and the peak areas are measured and normalized. If the proportion of origin $i$ in the sample is $\alpha_i$, then it may be shown (1) that the best estimate of the proportions of the various origins is the one giving the minimum value to Equation 2. This expression is minimized using a nonlinear programming algorithm of the kind now available at an increasing number of computer installations. The algorithm we used was a modified sequential unconstrained minimization technique (MSUMT) devised by Powell (2) and further modified by Ryan (3).

A difficulty arises from the very large number of peaks generated by a high-efficiency chromatograph. There is no theoretical difficulty in handling them all, but the calculations demand more computer memory than most installations have available. Accordingly, the available library routines have dimensions that limit the computation to a smaller number of data points. However, some peaks will always have no relevance to the analysis, and others will have equivalent relevance and may be combined. The data are therefore reduced by a factor analysis with adjustments, a principal component analysis, or both, again using library computer routines. This reduces the volume of data to the minimum number of factors or components (peaks) necessary to account for any specified proportion of the variation between origins (we specified 95%). The results of the principal component analysis show the analyst which peak areas are important and should be used in Equation 2, and the factor analysis shows how they should be combined. The principal component analysis is thus sufficient in itself if the number of data points remaining after the irrelevant ones are eliminated is small enough for the computer to handle. Even so, computer time may be saved by also combining the data with a factor analysis.

(1) J. Bracken and G. P. McCormick, "Selected Applications of Non-linear Programming," John Wiley, New York, N.Y., 1968.
(2) M. J. D. Powell, At. Energy Res. Estab. Tech. Rept., T.P. 310, 1967.
(3) D. M. Ryan, Computer Centre, Australian National University, Canberra, Australia, personal communication, Sept 1970.
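The sketch below illustrates Equation 2 and the constrained minimization described above, using Python and NumPy. It is not the MSUMT routine the authors used. It substitutes a plain projected-gradient method, which is adequate here because Equation 2 is a convex quadratic in the $\alpha_i$. The function names are hypothetical.

The sketch makes two assumptions:

- $S$ is the pooled within-origin covariance.
- $S$ is invertible. Peak areas normalized to a constant total give a singular $S$, so in practice one peak would be dropped or the data reduced first.

```python
import numpy as np


def origin_means_and_pooled_cov(groups):
    """Estimate origin means (x-bar_i) and a pooled within-origin covariance (S).

    groups: list of (n_i x p) arrays, one per pure origin; rows are samples,
    columns are peak areas (or reduced factors). Hypothetical helper.
    """
    means = np.array([g.mean(axis=0) for g in groups])
    # Sum of within-origin scatter matrices, divided by (total samples - origins).
    scatter = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
    dof = sum(len(g) for g in groups) - len(groups)
    return means, scatter / dof


def project_capped_simplex(v):
    """Euclidean projection onto {a : a_i >= 0 for all i, sum(a) <= 1}."""
    clipped = np.clip(v, 0.0, None)
    if clipped.sum() <= 1.0:
        return clipped
    # The sum constraint is active: project onto the simplex sum(a) = 1.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ranks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / ranks > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.clip(v - theta, 0.0, None)


def estimate_proportions(x, means, cov, iters=20000):
    """Minimize (x - M a)' S^-1 (x - M a) subject to a >= 0, sum(a) <= 1.

    Projected gradient descent with a fixed 1/L step (a stand-in, not MSUMT).
    """
    M = means.T                      # p x k; column i is the mean of origin i
    W = np.linalg.inv(cov)           # assumes S is nonsingular
    H = M.T @ W @ M                  # objective = a'Ha - 2b'a + constant
    b = M.T @ W @ x
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H).max())  # 1 / Lipschitz constant
    a = np.full(M.shape[1], 1.0 / M.shape[1])         # start from an equal blend
    for _ in range(iters):
        grad = 2.0 * (H @ a - b)
        a = project_capped_simplex(a - step * grad)
    return a
```

Suppose the unknown is an exact linear blend of the estimated origin means. Then the objective is zero at the true proportions, so a synthetic, noise-free blend should be recovered almost exactly. Real chromatograms will not behave this cleanly.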

ANALYTICAL CHEMISTRY, VOL. 43, NO. 14, DECEMBER 1971

A factor analysis is almost always sufficient in itself, but again computer time can be saved by eliminating unnecessary data with a principal component analysis.

DATA USED

The method was tried using published gas chromatographic data on two sets of oils:

- 14 hop oils (4) from five different species (early cluster, late cluster, brewers gold, bullion, and fuggle).
- 63 peppermint (Mentha piperita) and mint (Mentha arvensis) oils (5) from ten geographical origins (U.S. Mid-West, U.S. Oregon, U.S. Yakima Valley, Japan, Italy, England, Bulgaria, Brazil, Formosa, and China).

Much commercial mint oil is partially dementholated before sale. To prevent an artificial variation from this cause, the compositions of the totally dementholated oils were calculated and used instead of the raw data. Some other oils reported in (5), which were deterpenated, heavily rectified, or badly oxidized, were excluded from the analysis. The data for blends of randomly chosen oils of various origins were calculated assuming that they blended linearly.

RESULTS

Hop Oils. An "early cluster" and a "fuggle" oil were blended in proportions ranging from 100% "early cluster" down to 10% "early cluster" and 90% "fuggle", in steps of 10%. In each case the two species were correctly identified, with no more than 0.5% of any blend assigned to the three incorrect species. The proportions were correct within 4% absolute. The 100% "early cluster" was so classified, with no other species reported at a concentration above 0.01%.

Mint and Peppermint Oils. It is notorious that piperita oils are more difficult to classify than arvensis oils, and the following analysis confirms this. Two pure mint (arvensis) oils from Brazil and Japan were correctly assigned. No other origin was reported for the Brazilian oil, and only 1.3% of the Japanese oil was calculated to be Formosan. Two pure peppermint (piperita) oils from Italy and England were also correctly assigned, with 7% of the Italian oil calculated to be Bulgarian and 6% of the English oil calculated to be U.S. Mid-Western.
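The trial blends were computed, not physically mixed, on the assumption that oils blend linearly. The sketch below shows that calculation for a two-oil series like the hop-oil trial, running from 100% down to 10% of the first oil in 10% steps. The function names are hypothetical. The peak-area vectors in the usage lines are made-up placeholders, not the published hop-oil data.

```python
import numpy as np


def blend(compositions, proportions):
    """Linear blend: proportion-weighted sum of the component oils' peak-area vectors."""
    return np.asarray(proportions, dtype=float) @ np.asarray(compositions, dtype=float)


def two_oil_series(oil_a, oil_b, step_percent=10):
    """Blends from 100% oil_a down to step_percent% oil_a, the remainder oil_b."""
    series = []
    for pct in range(100, 0, -step_percent):
        f = pct / 100.0
        series.append((pct, blend([oil_a, oil_b], [f, 1.0 - f])))
    return series


# Hypothetical three-peak vectors, for illustration only.
for pct, mix in two_oil_series([0.6, 0.3, 0.1], [0.2, 0.3, 0.5]):
    print(pct, np.round(mix, 3))
```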

(4) R. G. Buttery and L. C. Ling, J. Agr. Food Chem., 15, 2564 (1967).
(5) M. D. Smith and L. Levi, ibid., 9, 238 (1961).

The Japanese and Brazilian oils were tested in two kinds of blends:

- Two-component blends with each other and with U.S. Mid-Western, Italian, and U.S. Yakima oils, in proportions from 5 to 70%.
- A three-component blend of 5% U.S. Mid-Western, 15% U.S. Yakima, and 80% Brazilian.

In these blends the concentration of the two arvensis oils was consistently correct within 2% absolute. In a four-component blend of 25% each of U.S. Yakima, English, Japanese, and Brazilian oils, the error increased to 4% absolute. So far as we are aware, this is the first successful analysis of three- and four-component blends.

The piperita oils were less dramatically successful. In the above blends:

- The Mid-Western oils were correct within 5% absolute.
- The Yakima oils in proportions below 20% were calculated to be almost solely Italian, and at 30% were reported as half Yakima and half English.
- The Italian oil in proportions from 5 to 55% was reported as a mixture of Italian and English, varying from wholly English to wholly Italian over that range.

Since Italian peppermint is transplanted English and the American oils are transplanted Italian, some confusion is to be expected. In blends of U.S. Mid-Western, U.S. Yakima, Italian, and English piperita oils over a wide range of proportions, the only confusion was among the U.S. oils. If these are lumped together as a single origin, no error was greater than 3%.
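The last piperita comparison scores the results after merging the U.S. origins into one. The sketch below shows that bookkeeping in plain Python: sum the estimated and true proportions within each merged group, then take the largest absolute difference. The function names, origin labels, and numbers are hypothetical, chosen only to show how lumping can shrink the worst-case error.

```python
def lump(proportions, groups):
    """Merge origins: `groups` maps a new label to the origins it absorbs;
    origins not mentioned in any group keep their own label."""
    merged = {}
    for origin, frac in proportions.items():
        label = next((g for g, members in groups.items() if origin in members), origin)
        merged[label] = merged.get(label, 0.0) + frac
    return merged


def max_abs_error(estimated, actual):
    """Largest absolute difference in proportion over all labels."""
    labels = set(estimated) | set(actual)
    return max(abs(estimated.get(k, 0.0) - actual.get(k, 0.0)) for k in labels)


US = {"U.S.": ["U.S. Mid-West", "U.S. Yakima"]}
actual = {"U.S. Mid-West": 0.30, "U.S. Yakima": 0.05, "Italy": 0.35, "England": 0.30}
est = {"U.S. Mid-West": 0.20, "U.S. Yakima": 0.15, "Italy": 0.33, "England": 0.32}
print(max_abs_error(est, actual))                        # about 0.10
print(max_abs_error(lump(est, US), lump(actual, US)))    # about 0.02
```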

DISCUSSION

The hop oil analyses involved 76 peaks per sample, which were reduced to nine factors. With this body of data, excellent results were obtained. With the mint and peppermint oils, only 12 peaks were available, but the results were still good even for 3- and 4-component blends, except for the closely related piperita oils. It may reasonably be assumed that more detailed chemical analyses would eliminate the ambiguities of the piperita oils. Such ambiguities can clearly be eliminated: a previous method (6), which was specific for 2-component blends and assumed that only two components were present, identified and determined them precisely.
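For the hop oils, 76 peaks were reduced to nine factors. That kind of reduction starts by deciding how many components are needed to account for a chosen share of the variation; the authors specified 95%. The sketch below covers only the principal-component step, in NumPy. The function name is hypothetical. It uses the total variance across all samples as a simplifying stand-in for "variation between origins", and it omits the factor-analysis rotation the authors also applied.

```python
import numpy as np


def components_for_variance(data, fraction=0.95):
    """Smallest number of principal components whose cumulative share of the
    total variance reaches `fraction`, plus their loading vectors (rows).

    data: (n_samples x n_peaks) array of peak areas.
    """
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to the
    # variances along the principal axes, in decreasing order.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    share = np.cumsum(s**2) / np.sum(s**2)
    n = int(np.searchsorted(share, fraction)) + 1
    n = min(n, len(s))  # guard against round-off when fraction is 1.0
    return n, vt[:n]
```

An unknown sample would then be reduced by subtracting the training mean and multiplying by the transposed loadings, so that it lives in the same space as the reduced origin means.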

RECEIVED for review July 14, 1971. Accepted September 2, 1971.

(6) N. Hartman and S. J. Hawkes, J. Chromatogr. Sci., 8, 610 (1970).
