ANALYTICAL CHEMISTRY, VOL. 51, NO. 8, JULY 1979
Figure 7. Product distribution curves for sample no. 1 of Table III. The points are experimental. The upper curve is calculated for the mixture; the lower curves show the contributions of the individual components. Kinetically determined individual concentrations: mean 99.2%, standard deviation 1.5%.

DISCUSSION

Strengths and Weaknesses. The principal disadvantage of this new method is that the error in Δc_z is always greater than the error in c_z. Its greatest advantage is that the controlling function, Equation 3, generates a maximum, has a finite area, and is relatively localized along an axis related to time. One consequence is that it is always possible to alter the rank order of coefficients in sets of simultaneous equations; for example, in Equations 18-20 the orders are, for r = 3, A > B > C; for r = 10, B > C > A; for r = 18, C > B > A. In the conventional method of proportional equations (pseudo first-order reactions with reagent in excess) these inversions are impossible; the order would be A > B > C at all times. (When the reactants are in excess it may be possible to control conditions, such as solvent polarity, so as to generate inversions in the method of proportional equations (4).) Another result is that for an n-component mixture it may be possible to work with fewer than n² coefficients, as in Equations 18-20; with fewer terms, solution of the equations is faster. If curve-fitting approaches are adopted, the Δc_z(r) vs. r display should be more sensitive than the conventional c_z(t) vs. t curve. In fact, because the Δc_z(r) vs. r curve has some of the attributes of an absorption spectrum, with experience it becomes possible to make very approximate estimates of mixture composition by visual inspection of the experimental curve.

A Two-Phase Version. Imagine a series of tubes, numbered 0, 1, 2, 3, …, r, in which reaction will take place. Reaction is initiated in tube 0. After reaction has occurred for a time interval Δt_0, the reacting solution is transferred to (hitherto empty) tube 1. After interval Δt_1 the solution is transferred to tube 2, and so on. Now introduce the feature that the reaction product is fixed or immobilized in the tube in which it was produced, only the reactants being transferred. After reaction is complete, the amount of product is measured in each tube, and a plot is made of product per tube against tube number. Evidently this experiment is mathematically equivalent to the homogeneous systems treated earlier in this paper. In particular, Equation 3 describes the distribution. The technique appears to be capable of effecting a partial spatial resolution based solely upon differences in chemical reaction rates, even if a common product is generated. It should not be difficult to devise an experiment to carry out such a separation. A second phase is needed to trap the product. The entrapment could be based on a physical process, such as adsorption or extraction, or a chemical process, either covalent or noncovalent. The use of affinity phases (developed for affinity chromatography) seems possible. The arrangement could be discontinuous (in discrete tubes) or continuous (in a column); in the continuous reactor, the column contents could be extruded, and distance along the column could be related to Δt_r and r. Since the essential requirement in Equation 3 is that the term kt_r increase with time, two experimental approaches are possible. One, relied on thus far in this paper, is to keep k constant and to increase t_r. Alternatively, t_r could be held constant and k could be increased with time. In the present context, this could be done by holding successive tubes at higher temperatures, or by increasing the reagent concentration.
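The tube-train scheme above lends itself to a quick numerical check. The following sketch (Python) assumes a simple first-order rate law and an illustrative schedule of increasing transfer intervals; the rate constants and intervals are invented for illustration, and Equation 3 itself is not reproduced here.

```python
import math

# Simulate the two-phase experiment: a first-order reaction runs in each tube
# for an interval dt, the product formed there is immobilized, and only the
# unreacted material is transferred to the next tube. Intervals grow with tube
# number so that k*t_r increases and the product distribution shows a maximum.
def product_per_tube(k, intervals):
    """Return the fraction of initial reactant trapped as product in each tube."""
    remaining = 1.0
    profile = []
    for dt in intervals:
        left = remaining * math.exp(-k * dt)  # unreacted fraction leaving the tube
        profile.append(remaining - left)      # product immobilized in this tube
        remaining = left
    return profile

intervals = [r + 1 for r in range(12)]        # illustrative schedule: 1, 2, 3, ...
fast = product_per_tube(0.5, intervals)       # larger k deposits its peak earlier
slow = product_per_tube(0.1, intervals)
```

Under these assumptions a faster-reacting component deposits its product maximum in an earlier tube than a slower one, which is the basis of the partial spatial resolution described above.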
LITERATURE CITED
(1) Connors, K. A. Anal. Chem. 1975, 47, 2066.
(2) Garmon, R. G.; Reilley, C. N. Anal. Chem. 1962, 34, 600.
(3) Connors, K. A. Anal. Chem. 1977, 49, 1650.
(4) Mark, H. B., Jr.; Rechnitz, G. A. "Kinetics in Analytical Chemistry"; Wiley-Interscience: New York, 1968; pp 134-136.
RECEIVED for review February 13, 1979. Accepted April 2, 1979.
Derivation Methods of Stoichiometric Empirical Formulas from Elemental Analysis Results

Bruce W. Tattershall

Department of Inorganic Chemistry, The University, Newcastle upon Tyne NE1 7RU, England
The traditional method of obtaining formulas from elemental analyses is found to give illogical results in some cases. A mathematical description of the problem is developed, and new procedures, which are suitable for use in a computer program and which tolerate greater errors in the analytical results than does the traditional one, are described.

A method of calculation of stoichiometric formulas from elemental analysis data has appeared in textbooks of chemistry for about a hundred years (1), with little change, but is now given only cursory attention in introductory texts (2), and is usually omitted from advanced chemistry texts or textbooks of chemical analysis. It is probable that many preparative chemists quote elemental analyses in support of formulas without critical appreciation of the validity of doing so, and mainly because tradition so dictates. We contend that analyses support an empirical formula only if exactly that formula is derived by the best available process. Writing a computer program to perform the traditional calculation from very good analysis results is an elementary
exercise (3), but we have developed a program to deal with results which contain realistic errors, with the minimum of the traditional subjective involvement of the chemist. By any method, the presence of sufficient error in the analysis results will lead to the calculation of the wrong formula for the compound. Different methods will tolerate different amounts of error before producing the wrong formula. The traditional method, which appears to have been accepted unquestioned for so long, has a much lower tolerance of error than the ones which we present here. This lack of tolerance is connected with its dependence on a procedure selected for its calculational ease, rather than its production of the logically correct deduction.
METHODS OF CALCULATION

Each member of a set of weight percentages of analyzed elements, W_i, is divided by the corresponding atomic weight, A_i, to give a set of relative elemental abundances, P_i, in units of moles per 100 g. For a pure compound of stoichiometric composition, and completely error-free analytical methods, a positive non-integer multiplier, R, would exist such that the RP_i were integers N_i, the desired numbers of atoms of each element in the "empirical" formula. In the presence of errors, one member at most of the set RP_i is an integer, and the N_i are obtained by rounding RP_i. The problem amounts to finding the best value of R.

The Traditional Method. The set P_i is normalized by division by its member of lowest value, to give a new set Q_i in which the relative abundance of the least abundant element is the integer 1. Then an integer multiplier M will give a set MQ_i in which the element of lowest value is equal to the desired N_i, and the other elements approximate to integers. Because M is an integer typically in the range 1 to 20, it can be found by trial and error. In a computer program, for each value of M which is tried, the products MQ_i are rounded to the nearest integers N_i. Then the deviations D_i = MQ_i − N_i could be due only to errors, if the correct value of M is being tried, or due to use of the wrong value of M. For the analysis to be useful, deviations due to errors must be smaller than deviations due to a wrong multiplier; if the deviations are within some preset limit, it can be judged that they are due to errors, and that the correct value of M, and hence the correct formula, has been found. The size of errors expected will vary from one set of results to another, and the limit of deviations for the test should be calculated from an estimate of maximum expected error, provided by the chemist.
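The traditional procedure just described is easy to sketch in code. The following minimal Python version is an illustration, not the paper's program: the function name, the deviation limit of 0.1, and the trial range for M are illustrative choices.

```python
# Traditional method: divide weight percentages W_i by atomic weights A_i to
# get P_i, normalize by the smallest member to get Q_i, then try integer
# multipliers M until the rounded products MQ_i all lie within a preset
# deviation limit of integers.
def traditional_formula(weights, atomic_weights, max_dev=0.1, max_m=20):
    P = [w / a for w, a in zip(weights, atomic_weights)]
    Q = [p / min(P) for p in P]
    for M in range(1, max_m + 1):
        N = [round(M * q) for q in Q]
        if all(abs(M * q - n) <= max_dev for q, n in zip(Q, N)):
            return N
    return None

# C 79.89%, H 20.11% (ethane) -> empirical formula CH3, i.e. N = [1, 3]
print(traditional_formula([79.89, 20.11], [12.011, 1.008]))
```

The deviation limit plays the role of the chemist-supplied error estimate discussed above; too loose a limit accepts a wrong multiplier, too tight a limit rejects every M.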
A conceptually convenient way of expressing error is as the difference W_i − V_i, where V_i is the weight percentage of an element, calculated from the supposed formula. Since error often increases with the size of the value to which it pertains, a better expression is the fractional error, (W_i − V_i)/W_i. The denominator is W_i rather than V_i because W_i is likely to be a better approximation to the true value than is the value of V_i being tested. The supplied fractional error could be multiplied by MQ_i to give a limit to the allowed deviation D_i of MQ_i from N_i. This, while computationally simple, suffers the disadvantage that D_i for the least abundant element has been artificially set to zero, and the other D_i represent errors arising both in analysis for element i and in analysis for the least abundant element. These errors are mixed during the normalization of set P_i to set Q_i. Whereas this complication could be taken into account by the chemist in providing the maximum fractional error, it is better to avoid testing MQ_i, and instead to test N_i directly against W_i by calculating V_i. In cases of total analysis, i.e., where weight percentages are available for all elements in the compound, the formula weight is F = ΣN_iA_i, and V_i = 100 N_iA_i/F. More frequently in inorganic chemistry, only partial analysis is carried out, and a formula weight remains to be postulated for the unanalyzed residue, after the analyzed elements have been fitted. F is not then given by the above summation over the analyzed elements. However, an apparent value of the formula weight, F_i, can be obtained from each analysis result: F_i = 100 N_iA_i/W_i, and from this set an average apparent formula weight F_av = (ΣF_i)/n can be calculated, where n is the number of elements analyzed for. F_av can then be used instead of F in calculating V_i, and hence in testing the fit. F_av, and hence V_i, will still contain errors originating from all members of the set W_i, but these will be small compared with the errors in each W_i, because of the averaging process used to obtain F_av.

Doubling the values of N_i by doubling M would not affect the fractional error, but if an increase in M would make D_i exceed 0.5, then MQ_i will round to a different value of N_i, and a formula will be found which is a better fit to the experimental data. Thus, in a partial analysis, a specified limit of fractional error will always be satisfied if M is allowed to take a high enough value. If for a given set of data a high limit of fractional error is specified, a low value of M and a simple formula will be found; but if a sufficiently lower limit is specified, then a higher value of M and a more complicated formula will result. Rather than arbitrarily deciding upon a limit of fractional error and accepting the resulting formula, a chemist pursuing this method by computer program may interactively change the limit and produce several formulas, all of which represent valid reasoning from the experimental data, and from which he can objectively choose the most likely chemically.

Use of a Non-Integer Multiplier. The essence of the traditional method is the normalization of P_i to Q_i, which allows an integer multiplier M instead of a non-integer multiplier R to be sought.
The consequent mixing of the error in one of the analyses into the other values of Q_i has a more serious consequence than the difficulty of deciding a limit for deviations: error in analysis for the least abundant element can make MQ_i for one of the most abundant elements round to the wrong value of N_i, even though the analysis for element i is not seriously in error. For example, the calculated composition by weight of [(C4H9)3Si]2Hg is C, 48.09%; H, 9.08%; Si, 9.37%; Hg, 33.46%. Suppose a partial analysis for the organosilicon part of the compound has given exactly the above values for C and H, but a silicon result of 10.0%. The traditional method would then lead to an empirical formula C11H25Si for the analyzed elements, instead of C12H27Si. The two incorrect numbers of atoms in the formula correspond to the elements whose analyses were correct. The tolerance limit of the traditional method in this case lies between Si values of 9.54% and 9.55%, above which C12H26Si is the first wrong formula found. If M is not constrained to be an integer, error is not artificially transferred from Q_i for the least abundant element, where it would be least effective in producing the wrong formula. A value of M can be found which will optimize the fit between all values of MQ_i and their nearest integers N_i. This is equivalent to spreading the errors over all values of MQ_i, so that error in any one analysis is less likely to make any MQ_i round to the wrong value of N_i. Frequently, this best-fit method will tolerate twice as much error in the analysis for one of the elements, before producing the wrong formula, as will the traditional method. In the above example, a silicon analysis of 10.35% is required before C11H25Si is found instead of the correct result, C12H27Si. We have been unable to find an analytical method of dealing with the fitting problem, because of the difficulties caused by the process of rounding to integers.
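The worked example above can be checked directly. This short Python sketch (variable names mine; atomic weights are rounded values, and only the M = 1 rounding step of the traditional method is shown, without the deviation-limit test):

```python
# C and H percentages are exact for [(C4H9)3Si]2Hg, but Si is (erroneously)
# reported as 10.0% instead of 9.37%. The traditional method normalizes by the
# least abundant analyzed element (Si) and rounds; it returns C11H25Si even
# though only the Si analysis was wrong.
A = {"C": 12.011, "H": 1.008, "Si": 28.086}
W = {"C": 48.09, "H": 9.08, "Si": 10.0}        # erroneous Si result

P = {el: W[el] / A[el] for el in A}            # moles per 100 g
least = min(P, key=P.get)                      # least abundant element: Si
Q = {el: P[el] / P[least] for el in P}
N = {el: round(Q[el]) for el in Q}             # traditional method, M = 1
print(N)                                       # {'C': 11, 'H': 25, 'Si': 1}
```

Repeating the calculation with the error-free silicon value of 9.37% rounds to the correct C12H27Si.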
One successful method, which is practicable only by use of a computer, is to vary M in small steps and for each value of M to calculate a formula and a fit statistic. If P_i is normalized to Q_i as in the traditional method, the most likely value of M is still an integer, corresponding to zero error in the analysis for the least abundant element. As before, each integer is taken in turn, but M is now scanned through some range of non-integer values about the integer. The value of M giving the best fit statistic is selected, and then the limit of fractional error test is applied. If it fails, then M is scanned about the next higher integer, and so on. We have considered as possible fit statistics Σ|D_i|, ΣD_i² or its square root, and some function of the fractional error (W_i − V_i)/W_i. A plot of each of these statistics against M, for the analysis set of the previous example, is shown in Figure 1. The statistics involving D_i vary continuously with M, and minima occur at values of M which give the best fit to formulas. The best of these formulas can be taken as the one corresponding to the deepest of the minima in the fit statistic. The plot against M of the RMS fractional error is a series of horizontal steps, because the fractional error is constant for each formula. The length of the steps corresponds to the range of M over which each formula is found by rounding. For a highly repetitive method, the fractional error calculation is too time-consuming. The other two give the same results as each other in most cases we have examined, and as described below, we decided that ΣD_i² was the more logically valid. Because of the sharpness of the troughs in ΣD_i² with M, we have found it necessary to use M in steps of 0.001, in order to find fairly complicated formulas correctly. To account for any possible error in analysis of the least abundant element, M must be scanned from 0.5 below each integer, to 0.499 above, giving 1000 calculations about each integer. On a fast computer, this presents no problem, but using an interpretive BASIC compiler on our HP2000F computer, we found that, especially when no fit was found that would pass the fractional error test while M < 20.5, the calculation took up to 5 min, which we judged to be excessive for a tool to be routinely used by the laboratory chemist. The scanning multiplier method is very wasteful because, typically, 5000 calculations are done, where perhaps only 50 of the possible formulas are worth producing. By expressing the problem in vector algebra, we have found a method which is 5-10 times faster.

Figure 1. Plots of fit statistics, on arbitrary linear scales, against multiplier M, for the results C, 48.09%; H, 9.08%; Si, 10.0%. Minimum T corresponds to formula C11H25Si, found by the traditional method; minimum B corresponds to formula C12H27Si, found by the best-fit methods.

Vector Expression of the Problem. The products MQ_i, which derive from experiment, and the integer numbers of atoms N_i, which are to be found, are both coordinates X_i of points in the same n-dimensional space, where n is the number of elements analyzed for, because both are measures of the relative abundances of the elements in mole units. There is an infinite number of formulas in the space, since they are represented by all points for which all the coefficients X_i are integers between 1 and infinity. Writing the position vector X for the general point with coefficients X_i, the experimental results are represented by X = MQ, where M is a number which can take any value. This is the vector equation for a straight line passing through the origin. Figure 2 shows a projection of a 3-dimensional case onto 2 dimensions. In the absence of error, on going from the origin along the results line, one would come to a formula point which lay exactly on the line, with position vector N = M_1Q, and that would be the desired solution. Other formula points would lie further up the line at N_k = kM_1Q, where k is an integer. In the real case, no formula points would lie exactly on the line, but again there would be a series of points, each of which was locally the closest to the line. It is the first of these "closest points" that will satisfy the limit of fractional error test, which is required.

Figure 2. Projection of the results and solutions space for C, 48.09%; H, 9.08%; Si, 10.0% onto 2 dimensions. Key: (+) solutions found invalid by the D_i test; (O) valid solutions which failed the G_h test and were not further considered; (X) solutions which qualified for the ΣD_i² test; (T) solution found by the traditional method; (B) solution found by the best-fit methods; (S) start of walk.

We propose that a formula is the formula indicated as the most probable by the experimental results, of a group of formulas, if it is at the shortest perpendicular distance to the results line. The length of a perpendicular from N to the line at M_1Q is the length of the vector M_1Q − N, i.e., (Σ(M_1Q_i − N_i)²)^(1/2). In the scanning multiplier method, a large number of points MQ on the results line are tested, none of which lie exactly at the feet of perpendiculars from formula points. The rounding process finds the formula point N which is nearest to the tested point MQ, and as the fit statistic ΣD_i² is the square of the distance in the space of N from MQ, variation of M past M_1 yields, from the minimum in the fit statistic, an approximation to the perpendicular distance of the formula point from the results line. This is the justification for the use of the ΣD_i² statistic. Since only the perpendicular distance is required, a much more economical method would be to select the likely formula points and calculate the perpendicular distance directly, finding M_1 without resort to scanning. This can be done: the perpendicular distance is

|D| = ((ΣQ_i² ΣN_i² − (ΣN_iQ_i)²)/ΣQ_i²)^(1/2)

Here the arithmetic can be speedy, but suffers from computational error because of the differencing of two large numbers. It is better to calculate the multiplier for the foot of the perpendicular as M_1 = (ΣN_iQ_i)/(ΣQ_i²), and then compute ΣD_i² as before.

[Flow-chart residue from this page, retained in outline: calculate ΣN_iQ_i; calculate M_1; calculate G_h for h = 1 to n; store j = h for which G_h is greatest, excluding h corresponding to the least abundant element; test −0.5 < G_h < 0.5 for all h (the test may fail for any h); end walk.]

Walking through the Solutions. It remains to select a strategy for deciding which formula points to test. We considered varying each N_i in turn, in order of least or greatest elemental abundance, but all such methods were uneconomical when n ≥ 4. The following method is much more selective. First we calculate a starting formula, by the rounding method, from the point on the results line at which the multiplier scan would start in the scanning method. We then move to the next formula to be tested, by increasing one component N_j by 1. The component to be increased is selected for each move in the walk, as follows. Each walk movement is taken parallel to one of the unit vectors of the coordinate system: movement parallel to the X_h axis is along a line with the equation X = N + Lu, where u has components (u_h = 1, u_{i≠h} = 0). Although in 2-dimensional space such a movement, if in the right sense, will eventually pass through the results line, this is not so for higher dimensions: it will pass a point of closest approach, which is the foot on the walk line of the common perpendicular to X = MQ and X = N + Lu. The distance along the walk line from the formula point N to this point of closest approach is

G_h = (Q_h ΣN_iQ_i − N_h ΣQ_i²)/(ΣQ_i² − Q_h²)

For the current formula point, G_h is calculated for h = 1 to n, and the walk is made in the X_h direction for which G_h is most positive, because this is the direction which will produce the greatest decrease or the least increase in the perpendicular distance from the tried formula to the line. It is necessary always to make the increment positive, and to walk according to the most positive G_h, rather than the biggest absolute value; otherwise the walk goes into a cyclic path and may miss the best formulas. N_i for the least abundant element is not allowed to vary in the walk: it is increased only if the best formula found has failed the limit of fractional error test, and then a new walk is started. During the walk, the ΣD_i² fit statistic is calculated only for those formulas for which all G_h satisfy −0.5 ≤ G_h < 0.5. This is not rigorously justified theoretically, but experimentally this cut-off very rarely excludes a good solution. The calculation of all D_i for a formula does provide a logically justifiable selection: all D_i must satisfy −0.5
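The vector machinery above can be exercised numerically. This sketch (Python; function and variable names mine) computes M_1, the fit statistic ΣD_i², and the walk gradients G_h for candidate formulas, using Q values recomputed from the C, H, Si example with the erroneous 10.0% silicon figure.

```python
# For a candidate formula N and experimental direction Q:
#   M1 = sum(N_i*Q_i) / sum(Q_i^2)     -- multiplier at the foot of the perpendicular
#   sum(D_i^2), D_i = M1*Q_i - N_i     -- squared distance from N to the results line
#   G_h = (Q_h*sum(N_i*Q_i) - N_h*sum(Q_i^2)) / (sum(Q_i^2) - Q_h^2)
def fit_and_gradients(N, Q):
    sNQ = sum(n * q for n, q in zip(N, Q))
    sQQ = sum(q * q for q in Q)
    M1 = sNQ / sQQ
    D2 = sum((M1 * q - n) ** 2 for n, q in zip(N, Q))
    G = [(q * sNQ - n * sQQ) / (sQQ - q * q) for n, q in zip(N, Q)]
    return M1, D2, G

Q = [11.245, 25.300, 1.0]            # C, H, Si abundances normalized to Si
_, d2_trad, G = fit_and_gradients([11, 25, 1], Q)   # C11H25Si (traditional)
_, d2_best, _ = fit_and_gradients([12, 27, 1], Q)   # C12H27Si (best fit)
```

Under these numbers, C12H27Si lies closer to the results line than C11H25Si (smaller ΣD_i²), matching the deeper minimum B in Figure 1; and the most positive G_h at C11H25Si is the carbon component, so the walk moves along the carbon axis toward the better formula.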
Gh = (QhZNIQ,- N h Z Q I 2 ) / ( C Q l 2Q h 2 ) For the current formula point, G h is calculated for h = 1 to n , and the walk is made in the X h direction for which Gh is most positive, because this is the direction which will produce the greatest decrease or the least increase in the perpendicular distance from the tried formula to the line. It is necessary to always make the increment positive, and walk according to the most positive G h , rather than the biggest absolute value, otherwise the walk goes into a cyclic path and may miss the best formulas. N, for the least abundant element is not allowed to vary in the walk: it is increased only if the best formula found has failed the limit of fractional error test, and then a new walk is started. During the walk, the 20,' fit statistic is calculated only for those formulas for which all G h satisfy -0.5 IG h < 0.5. This is not rigorously justified theoretically, but experimentally this cut-off very rarely excludes a good solution. The calculation of all D , for a formula does provide a logically justifiable selection: all D, must satisfy -0.5