FEBRUARY, 1947

SOME APPLICATIONS OF PUNCHED-CARD METHODS IN RESEARCH PROBLEMS IN CHEMICAL PHYSICS¹

GILBERT W. KING
Arthur D. Little, Inc., Cambridge, Massachusetts

There is considerable interest in the use of punched-card machines in research institutions for classifying and extracting information, besides the normal uses for which the machines were designed. On account of the expense and efficiency of these machines, it is desirable for an institution investing in such an installation to have a wide field of problems in which the machines can be used. It is the purpose of this paper to indicate various fields of physical-chemical research in which they are valuable or open up new approaches. Numerically the machines are limited in capability to addition, subtraction, and multiplication; new machines are appearing which can divide. On the other hand, they can perform a vast number of such elementary operations, and therefore any mathematical operation consisting of a finite number of such steps can be done on machines. Perhaps the most fruitful feature in research will be the sorting, comparing, selecting, and collating features of the relay parts of the machines. These possibilities really belong in the field of symbolic logic and hence are so general as to be hard to define, but perhaps the illustrations given below will be indicative.
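The claim that any operation reducible to a finite number of additions, subtractions, and multiplications can be done on the machines can be made concrete with division, which the older machines lacked. The sketch below uses the standard Newton-Raphson reciprocal iteration; it is an illustration chosen here, not a procedure described in this paper, and the starting guess `x0` is assumed to come from a coarse table.

```python
def reciprocal(a: float, x0: float, steps: int = 6) -> float:
    """Approximate 1/a using only multiplication and subtraction.

    Newton-Raphson iteration for the root of 1/x - a = 0:
        x_{k+1} = x_k * (2 - a * x_k)
    The relative error squares at every step, provided 0 < x0 < 2/a.
    """
    x = x0
    for _ in range(steps):
        x = x * (2.0 - a * x)  # two multiplications and one subtraction
    return x


def divide(b: float, a: float, x0: float, steps: int = 6) -> float:
    """Form b / a as b * (1/a), so no division step is ever needed."""
    return b * reciprocal(a, x0, steps)


print(reciprocal(7.0, 0.1))   # close to 1/7
print(divide(3.0, 7.0, 0.1))  # close to 3/7
```

Because the error is squared on each pass, a starting value good to one figure reaches full precision in a handful of repetitions, which is exactly the kind of repeated elementary operation the machines handle well.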

REPETITIONS OF OPERATIONS

The simplest property of punched-card machines is their ability to carry out any given operation, such as $abc$ or $a + e^{x}$, a large number of times; a hundred thousand repetitions is not considered unusual. This particular property is not so valuable in physical and chemical research as in certain problems of applied mathematics. A start has been made in computing tables of mathematical functions for very small intervals in the argument. It is to be hoped that tables of interest to physicists and chemists will be considered. An example on a somewhat smaller scale is the calculation of thermodynamic quantities, such as free energies, from empirical equations, for example, the free energy

$$F = a + bRT\ln T + cT^{2} + dT^{3} + \cdots \qquad (1)$$

where $T$ is the absolute temperature and $a, b, c, d, \ldots$ are empirical constants. Tables of $RT\ln T$, $T^{2}$, $T^{3}$, and $1/T$ have been constructed for each degree Centigrade or Fahrenheit. The evaluation of $F$ for all temperatures at which the constants $a, b, c, d$ can be expected to hold can be done readily, and for a large number of compounds at the same time. Here we see the first advantage of machine methods: it is as easy to evaluate equations of this type for 10,000 values of $T$ as for 100, so tables requiring no interpolation become available. It is also practically as easy to evaluate these equations for all compounds of interest or likely to be of interest. In this way the physical chemist has more information tabulated than he needs. Such complete tables are of great assistance in constructing phase diagrams. Normally, triple points and the like have to be estimated, and several evaluations of equation (1) in their neighborhood have to be made. This tedium is avoided by routine machine methods.

¹ Presented before the Division of Chemical Education at the 110th meeting of the American Chemical Society in Chicago, September 9-13, 1946.
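The tabulation of equation (1) can be sketched as follows: the temperature tables are built once and then shared by every compound, so each entry costs only a few multiplications and additions. This is a minimal sketch under stated assumptions: the series is truncated after the $dT^{3}$ term, temperatures run in whole kelvins, `R` is taken in J/(mol K), and the compound name and constants are hypothetical placeholders, not data from the paper.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

R = 8.314  # gas constant in J/(mol K); units of a, b, c, d must match


@dataclass
class Compound:
    name: str
    a: float
    b: float
    c: float
    d: float


def build_tables(t_min: int, t_max: int) -> dict[int, tuple[float, int, int]]:
    """Master tables: RT ln T, T^2 and T^3 for every integer temperature."""
    return {T: (R * T * math.log(T), T**2, T**3) for T in range(t_min, t_max + 1)}


def free_energy_tables(compounds: list[Compound], tables) -> dict[str, dict[int, float]]:
    """Evaluate F = a + b*RT ln T + c*T^2 + d*T^3 (equation 1, truncated)
    for every compound at every tabulated temperature.

    Each entry needs only multiplications and additions, and the table
    lookups are shared by all compounds.
    """
    return {
        comp.name: {
            T: comp.a + comp.b * rt_ln_t + comp.c * t2 + comp.d * t3
            for T, (rt_ln_t, t2, t3) in tables.items()
        }
        for comp in compounds
    }


tables = build_tables(273, 1273)  # every kelvin from 273 K to 1273 K
compounds = [Compound("hypothetical-A", -1.0e5, 0.5, 1.0e-3, -2.0e-7)]
F = free_energy_tables(compounds, tables)
print(len(F["hypothetical-A"]))  # 1001 temperatures
```

Adding more compounds adds rows, not new tables, which mirrors the point that evaluating "all compounds of interest" costs little more than evaluating one.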

JOURNAL OF CHEMICAL EDUCATION

A problem involving a few more characteristics of punched-card methods is the evaluation of partition functions for compounds, from which all the thermodynamic functions can be found simultaneously. The partition function is

$$Q = \sum e^{-E/kT} \qquad (2)$$

where the sum is over all the energy levels $E$ of the molecule or system. When the $E$'s are the harmonic vibrational and rigid rotational levels of molecules, the summations in equation (2) can be simplified analytically. There remains the evaluation of $Q$ at temperatures of interest. This is the same problem as discussed for equation (1), but there is a new feature: finding the exponential of $-E/kT$, a tedious job with printed tables. One of the great contributions of the collator is a method of evaluating a function of any argument. A master table of cards bearing $x$ and $f(x)$ is key-punched from printed tables (or possibly derived by punched-card methods). The detail cards bearing the values of $x$ for which $f(x)$ is needed are sorted on $x$ and collated with the master table, so that each master card $x$ is followed by all the detail cards with that value of $x$. The value $f(x)$ is then "gang punched" from the master card onto all the following detail cards with that value of $x$. In actual cases the $E$'s, i.e., the vibrational and rotational levels, do not follow any simple law by which the summation can be carried out easily. In this event we merely evaluate $Q$ term by term, the number of terms hardly influencing the labor by machine methods at all. In this way we avoid the evaluation of complicated expressions and all forcing of data to fit formulas, and use the original data (term values). We thus obtain more reliable values of $Q$.
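The sort, collate, and gang-punch sequence amounts to a single-pass merge of two sorted decks. The sketch below imitates it and then sums equation (2) term by term from a list of levels. Assumptions made here, not in the paper: levels are given in cm⁻¹, the Boltzmann constant is taken as 0.695035 cm⁻¹/K, degenerate states are listed once each, and $E/kT$ is rounded to a master-table grid of step 0.01 (the paper does not say how arguments were matched to the table).

```python
import math

K_CM = 0.695035  # Boltzmann constant in cm^-1 per kelvin


def collate_and_gang_punch(master, details):
    """Copy f(x) from a master deck onto every detail card with the same x.

    master  : list of (x, f_x) tuples, one card per tabulated argument
    details : list of dicts, each carrying the argument under the key "x"
    Both decks are put in order on x and merged in one pass, as the
    collator does; each master value is then "gang punched" onto the
    detail cards that follow it.
    """
    master = sorted(master)
    details = sorted(details, key=lambda card: card["x"])
    i = 0
    for card in details:
        while i < len(master) and master[i][0] < card["x"]:
            i += 1  # skip master cards with no matching detail cards
        if i == len(master) or master[i][0] != card["x"]:
            raise KeyError(f"no master card for x = {card['x']}")
        card["f_x"] = master[i][1]
    return details


def partition_function(levels_cm, T, step=0.01):
    """Q = sum of exp(-E/kT), evaluated term by term from the observed levels.

    E/kT is rounded to the grid of the master table (multiples of `step`),
    and grid points are keyed by integers so that matching is exact.
    """
    kT = K_CM * T
    keys = [round(E / kT / step) for E in levels_cm]
    master = [(n, math.exp(-n * step)) for n in range(max(keys) + 1)]
    details = [{"x": n, "E": E} for n, E in zip(keys, levels_cm)]
    return sum(card["f_x"] for card in collate_and_gang_punch(master, details))
```

The number of levels only lengthens the detail deck; the merge does not care whether the levels follow any simple law, which is the point made about using the original term values directly.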

THE STOCHASTIC APPROACH

The reduction of manual labor brings the stochastic method in research very much into favor. The direct analysis of X-ray, electron-diffraction, and absorption spectra is impossible in most cases. The only approach is the trial-and-error procedure of assuming a structure and calculating what the observed spectrum should be. Machine methods were introduced by Pauling and associates in the X-ray and electron-diffraction determination of structures. They have also been used to analyze absorption spectra.

In the latter problem the method has its chief advantage in the analysis of spectra of molecules with three different moments of inertia. In this type the positions and intensities of lines in a spectrum do not follow any simple law, and only a few such spectra have been analyzed. First there is the calculation of the rotational energy levels (designated by the quantum numbers $J, K_{-1}, K_{+1}$) of the upper and lower vibrational states for the three assumed reciprocal moments of inertia $a, b, c$:

$$2E_{J,K_{-1},K_{+1}}(a, b, c) = (a - c)\,E_{J,K_{-1},K_{+1}}(\kappa) + (a + c)\,J(J + 1) \qquad (3)$$

where $E(\kappa)$ is a function tabulated on punched cards for each level and for various values of the parameter of asymmetry

$$\kappa = \frac{2b - a - c}{a - c}$$

Interpolation of this table is done by machine operation on a copy of the appropriate region of the table. The whole of the calculation of equation (3) is done very easily in the multiplier. The absorption coefficient of a line is given by

$$\alpha = g\lambda\,e^{-E/kT} \qquad (4)$$

Here $\lambda$ is the line strength, obtainable from a (punched-card) table of permitted transition probabilities, and $e^{-E/kT}$ is the Boltzmann factor, found from the energies of equation (3) by the methods described above. The weight factor $g$ applies only to states of certain symmetry, which can be characterized by the parities of the quantum numbers $J, K_{-1}, K_{+1}$. The appropriate factor $g$ can be applied to each level by automatic control, the relays of the machines themselves taking cognizance of the parities of the quantum numbers. The table of $\lambda$ represents all possible transitions between the upper and lower rotational levels. The set of possible transitions is reproduced, and the energies of the upper and lower states, $E'$ and $E$ respectively, are transferred by controlling on the appropriate quantum numbers. The line position is then calculated very easily by

$$\nu = \nu_0 + E' - E \qquad (5)$$

where $\nu_0$ is the band center. This is in general a quantity we wish to determine in the analysis, so some arbitrary value is used at first. At this stage a set of cards has been prepared, one for each line in the spectrum, giving its position and intensity.

The existence of these calculations on punched cards has very many advantages to the spectroscopist, aiding very much in the "book-keeping" problems of analysis, besides of course representing a tedious amount of calculation. In addition, the way is opened for further elaboration not previously considered in spectrum analysis. It is possible to sort the lines in order of increasing wave length and list them in the order in which they appear in the spectrum. The cards can be sorted on relative intensity to eliminate lines that would not show up under the experimental conditions; this eliminates many erroneous identifications which have crept into analyses.

One of the contributions of punched cards to spectrum analysis is the possibility of making the more complicated steps relating the calculated line position and intensity to the actual observation of per cent transmission. It is possible to calculate the expected transmission of light at each wave length, to be compared with the actual reading of the experimental detecting device; that is, in the stochastic method, the calculations are carried right through to the estimate of the actual experimental readings. No intermediate step, such as the assumption of Beer's Law, is necessary. Roughly, the problem is that the detector measures

Here $cd$ is the number of molecules in the light path and $\mu$ the induced dipole moment; $f(\lambda)$ describes the slit shape and can be replaced to any desired degree of accuracy by

$$f(\lambda) = \sum_i p_i\,\Delta\lambda \qquad (7)$$

$\Delta\lambda$ being the largest interval in wave length over which the absorption coefficient can be considered constant. Both $cd\mu^{2}$ and the $p_i$'s are in general unknown, but they can be obtained by successive approximations to the experimental curves. Once a spectrum has been analyzed, we therefore have a method of predicting what the observed experimental curves will look like at any slit shape, slit width, and gas concentration. Although infrared spectra are of great value in analysis and other problems of research, one grave difficulty is the compilation of all known spectra: the absorption curves have a different appearance on various spectrographs, or even on the same instrument with different slit widths and gas concentrations. The punched-card analysis of the effects of slit shape and gas concentration may make an index of infrared spectra more useful than hitherto. Finally, if the amount of gas in the light path is measured, the magnitude of the induced dipole can be estimated.

The spectrum of an asymmetric rotor is composed of several hundred lines, the calculation of whose positions and intensities is extremely laborious. Machine methods have shown that the stochastic method is applicable in making an analysis. A further step was possible: the analysis of incompletely resolved spectra, which hitherto had not yielded any reliable molecular constants. In these cases the efficiency of research is improved. The analysis of spectra is in many respects no more scientific than the solution of a puzzle. Machine methods free cerebration for more research and leave the analysis to a succession of numerical guesses, which are handled in a routine fashion with a minimum of supervision. The stochastic approach suffers for lack of proof that the solution is the only one. Because machine methods make it easy to try any conceivable structure or set of constants, they increase the probability of the uniqueness of the solution tremendously.

CONSTRUCTION OF STATISTICAL SAMPLES

One step beyond the repetitive and stochastic methods promises to be of greatest use in physics and chemistry: the feasibility of constructing ensembles. First of all, however, we shall indicate a use of punched-card methods in pure statistics. The basic principle is that the possible cases can either be exactly enumerated or else a satisfactorily large sample of them taken at random. This in turn is based on the fact that many problems in statistics involve such quantities as the number of ways of taking $r$ things at a time from $n$. Even when $n$ is a small integer the number of ways is large, and further calculation depends on devising analytical methods of integrating over such combinations. In practice one soon runs into severe strains on one's analytical powers. Straightforward enumeration or sampling is exact and simplifies the problem enormously, provided the labor is done on machines. This has been found to be a particular advantage in finding fiducial probabilities, from which the significance of the calculated value of an average or a probability can be estimated, a question of recurrent interest in physical chemistry. For example, let $p$ be an average value computed from $n$ sets of data. It is suspected that these $n$ sets can be divided into two (or more) groups, containing $r$ and $n - r$ sets. The average $p(r)$ is found to be different from $p(n-r)$, suggesting that the hypothesized distinction is correct. But what is the significance of this result? Since $r < n$, the number of samples on which the average $p(r)$ is based has been reduced, so that the error $e$ in $p(r)$ is greater. It may well be so large that $p(r) \pm e$ includes the over-all average $p$. There is then a suspicion that distinct values of $p(r)$ and $p(n-r)$ would have been found even if the hypothesis were incorrect, merely by chance variation. The problem is to find the fiducial probability that $p(r)$, or a value more distinct from $p$, would be found, i.e., to find how many of all the possible selections of $r$ sets from $n$ would give a value $p(r_i)$ less than $p(r)$. If the sum of all these probabilities is less than some predetermined value (0.1 or 0.01), called the confidence coefficient, the hypothesis can be assumed to have this degree of likelihood of being correct. Now, since $p(r_i)$ has to be computed after each selection, often by a difficult formula, this problem has not yet been solved analytically.
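The fiducial probability just described can be computed by brute force: go through the selections of $r$ sets from $n$, recompute $p(r_i)$ for each, and count how many fall at or below the value observed for the hypothesized group. This is a minimal sketch under assumptions made here: $p$ is a plain mean unless another function is passed, the test is one-sided (the hypothesis predicts a low $p(r)$), the observed selection counts itself, and the example data are hypothetical. When $\binom{n}{r}$ is too large, the same function draws random selections instead of enumerating them.

```python
import itertools
import math
import random
from statistics import mean


def fiducial_probability(data, chosen, statistic=mean, n_samples=None, seed=0):
    """Fraction of r-set selections whose statistic is <= the observed p(r).

    data      : the n per-set values (hypothetical example data)
    chosen    : indices of the r sets singled out by the hypothesis
    statistic : computes p(r_i) from a list of values; the mean by default,
                but any (possibly complicated) formula can be passed in
    n_samples : None enumerates all C(n, r) selections exactly; an integer
                draws that many selections at random instead
    The test is one-sided: it assumes the hypothesis predicts a low p(r).
    """
    n, r = len(data), len(chosen)
    observed = statistic([data[i] for i in chosen])
    if n_samples is None:
        selections = itertools.combinations(range(n), r)
        total = math.comb(n, r)
    else:
        rng = random.Random(seed)
        selections = (rng.sample(range(n), r) for _ in range(n_samples))
        total = n_samples
    hits = sum(
        1 for sel in selections if statistic([data[i] for i in sel]) <= observed
    )
    return hits / total


# Hypothetical data: 8 sets, the first 3 suspected of forming a distinct group.
values = [0.91, 0.88, 0.90, 1.02, 0.97, 1.05, 0.99, 1.01]
print(fiducial_probability(values, [0, 1, 2]))  # 1/56: only the chosen selection qualifies
print(fiducial_probability(values, [0, 1, 2], n_samples=1000))  # random sampling
```

The returned fraction is then compared with the confidence coefficient (0.1 or 0.01) in the way the text describes; swapping `statistic` for a more elaborate formula changes nothing else in the procedure.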
It is extremely easy to solve by punched-card methods. For, if $n$ is reasonably small, one can merely evaluate all possible selections, compute $p(r_i)$ for each one, draw a distribution curve, and find the chances of getting $p(r_i)$ less than the value found in the selection based on the hypothesis. If $n$ is so large that $\binom{n}{r}$ is too big to handle easily, it is legitimate at this point to take 100 or 1000 samples at random. Here is an example of the use of punched-card tables of random numbers, and it indicates a great extension of the possibilities of the very powerful random-number technique in statistics.

The method of enumerating a statistical population can obviously be applied to statistical mechanics, in particular to problems where the analytical expressions are too difficult to evaluate. A simple example arises in the study of high polymers. In these systems the entropy plays an important role, and it is, in rough approximation, determined by the number of possible configurations. The order of magnitude of statistical quantities of high-polymer systems can be calculated by representing a polymer molecule as a series of identical links joined by bonds with restricted or free rotation. If the segments are considered as mathematical lines, the number of possible configurations can be calculated readily. However, anyone who has handled models of high polymers realizes that the segments have bulk, and many of the configurations included in the simple theory are not possible because of the volume occupied by the segments. Attempts to allow for this in the analytical theory lead to extremely difficult formulas.

A polymer of degree of polymerization $N$ can be represented by a set of $N$ cards, each one bearing the coordinates of the ends of the segments (i.e., points of rotation). Clearly, a substantial number of such sets would constitute a statistical assembly from which many interesting properties can be computed by counting (sorting and collating) procedures. First of all, however, there is the problem of constructing a set of cards for one high polymer, allowing for the volume occupied by the molecule. The first approach is to build up a polymer by adding a segment in random orientation. This is done by adding an appropriate quantity to each of the coordinates of the last segment. For example, if 90° valence angles alone are considered (for simplicity in discussion), the end of the chain already formed can be represented by Cartesian coordinates $x, y, z$ of integral values. The last-but-one end has coordinates $x-1, y, z$, for example. The next segment can add in one of four ways, to give as its end $x, y\pm1, z$ or $x, y, z\pm1$. We have to choose between adding to $y$ or $z$, and choose the sign. This was done by the use of random numbers. Blank cards are first punched, each with two random numbers; the parity of one determines the sign, and the parity of the other determines whether $y$ or $z$ is changed in the above example.
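The random construction of one lattice "polymer" can be sketched as follows: with 90° valence angles, the axis of the segment just added is excluded, and two random digits pick the axis among the remaining two (by parity) and the sign of the step (by parity), following the card scheme in the text. The volume occupied by the segments is handled here by a simple rejection scheme assumed for illustration (a chain that steps onto an occupied lattice point is discarded and regrown); it is not one of the better procedures the paper alludes to. Function names, the fixed first segment along $+x$, and the seeds are all hypothetical choices.

```python
import random


def grow_chain(n_segments, rng, self_avoiding=False, max_tries=1000):
    """Build one lattice "polymer" of n_segments segments; return its end points.

    Each new segment is perpendicular to the previous one (90-degree valence
    angles), so the axis just used is excluded. Two random digits choose the
    axis among the two remaining ones (parity of the first digit) and the sign
    of the step (parity of the second). With self_avoiding=True, a chain that
    steps onto an occupied point is discarded and regrown from the start
    (simple rejection). Returns None if no chain succeeds in max_tries attempts.
    """
    for _attempt in range(max_tries):
        points = [(0, 0, 0), (1, 0, 0)]  # first segment laid along +x
        occupied = set(points)
        last_axis = 0
        rejected = False
        for _ in range(n_segments - 1):
            d1, d2 = rng.randrange(10), rng.randrange(10)  # two random digits
            axis = [ax for ax in range(3) if ax != last_axis][d1 % 2]
            step = 1 if d2 % 2 == 0 else -1
            end = list(points[-1])
            end[axis] += step
            end = tuple(end)
            if self_avoiding and end in occupied:
                rejected = True
                break
            points.append(end)
            occupied.add(end)
            last_axis = axis
        if not rejected:
            return points
    return None


def mean_square_end_to_end(chains):
    """Average squared end-to-end distance over a sample of chains."""
    total = 0
    for pts in chains:
        total += sum((e - s) ** 2 for s, e in zip(pts[0], pts[-1]))
    return total / len(chains)


rng = random.Random(1947)
sample = [grow_chain(20, rng, self_avoiding=True) for _ in range(200)]
sample = [chain for chain in sample if chain is not None]
print(len(sample), mean_square_end_to_end(sample))
```

Averaging over the sample plays the role of "sorting and counting" the card sets; comparing results with `self_avoiding=False` and `True` gives a rough feel for how excluded volume stretches the chain.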
An interesting problem arises in that at each step only two coordinates can be changed: a change in the coordinate last increased would place the new segment either back along the previously added segment or forward in the same direction, corresponding to a valence angle of 180°. This is where the symbolic-logic properties of the relays and selectors of the punched-card machines become extremely valuable; these conditions can be met. In practice we wish to have the valence angles at tetrahedral angles, or 120°. This makes the problem slightly more complicated, since tetralinear or tetrahedral coordinates have to be used, but the possible selections by means of relays can be worked out.

The next step is to allow for the volume of the molecule, i.e., a segment cannot add so that it will lie in a region already occupied. Such additions can be removed after each step by collation, or by comparing the $x, y, z$ coordinates of the last segment with all the others. Actually this is a wasteful procedure, and better ones have been worked out. The construction of such sets of cards can be done 1000 at a time, so that one soon builds up a statistical sample of "polymers." The average distributions of length, shape, volume, etc., can be obtained by sorting and counting. The variation of entropy with $N$ can be calculated. The number of times an end comes back onto a molecule can be determined, which is of importance in the theory of rates of reaction. Further advances could be made, e.g., calculations for a condensed phase in which different molecules cannot occupy the same space, and the resultant decrease in entropy on "condensation" calculated. Energetic considerations can be incorporated by sorting and counting segments lying parallel to each other. Polar groups can be introduced by suitable extension of the card sets. The effects of free rotation could be studied by introducing weight factors by suitably rigging the random numbers. It would seem that these techniques could be extended to the study of the statistics of other systems of interest to the physical chemist.

SUMMARY

The use of punched cards in research problems can be divided into three types: (1) large-scale repetition of simple operations, such as addition, subtraction, and multiplication, modified if necessary by elaborate classification and selection; (2) making feasible a stochastic or trial-and-error approach to the solution of problems; and (3) the construction of a representative sample of a population for statistical analysis. These principles are illustrated by their use in preparing tables of thermodynamic functions of compounds, in spectrum analysis, and in the calculation of the configuration entropy of high polymers.