analytical applications of monte carlo techniques - ACS Publications

galaxies, molecules, or quarks. Most of the chemist's world can be described by discrete random (stochastic) events be- tween atoms, molecules, and ph...
1 downloads 0 Views 12MB Size
32

8

9 5

ANALYTICAL APPLICATIONS OF MONTE CARLO TECHNIQUES Oscar A. Güell and James A. Holcombe Department of Chemistry University of Texas at Austin Austin, TX 78712

The physical sciences deal with discrete events. The properties of a system as a whole are determined by the average behavior of its components— galaxies, molecules, or quarks. Most of the chemist's world can be described by discrete random (stochastic) events between atoms, molecules, and photons. Sampling procedures, molecular diffusion, absorption/emission of radiation, and adsorption/desorption processes in a chromatographic column are just a few examples of everyday events that involve random processes (2), albeit with a known statistical probability that enables one to measure a related variable (e.g., intensity, gas flow). Monte Carlo techniques study chemical systems by looking at the random character inherent to many chemical and physical processes, and they often present a much more accurate picture of nature with a much simpler mathematical approach than is realized with more conventional theories. This REPORT deals with analytical applications of the theory of random processes, in particular with numerical solutions obtained by Monte Carlo 0003-2700/90/0362-529A/$02.50/0 © 1990 American Chemical Society

techniques. There is no unambiguous definition of "Monte Carlo methods" or "Monte Carlo simulations," but it is often assumed that the approach applies to any use of random numbers. This is very nearly correct, because Monte Carlo methods comprise that branch of experimental mathematics that is concerned with experiments on random numbers (2). Experimental mathematics has had a long history. The term Monte Carlo method was introduced in 1949 (3) and, as the name implies, refers to random numbers—roll of the dice, spin of the

REPORT wheel, and so forth. In a chemical context, Monte Carlo can be referred to as the study of an object of interest by matching random numbers, corresponding to a sample of its elements, with probabilities derived from its physical and chemical macroscopic properties. Table I contains a simplified classification of Monte Carlo techniques in analytical chemistry. We will attempt to recognize the different modes of Monte Carlo approaches presented in the literature and also to unify them. These examples and references have been selected (randomly in some cases) from the large number

available. They typify the many overlapping modes of Monte Carlo techniques found in the chemical literature. Monte Carlo approaches offer simple and straightforward applications and make much information available, but care must be taken to avoid correlations between sequences of pseudorandom numbers. The stochastic simulation is most useful for complex situations and probably not well suited to simple cases. In general, the techniques present poor precision but good accuracy for a number of samples obtained from different randomization seeds. The expected variance decreases in inverse proportion to the number of trials. In spite of their general applicability, Monte Carlo approaches have been criticized (relative to other numerical solutions) as having slow convergence and as being suitable only in cases where reduced accuracy is acceptable (4, 5). However, the increasing availability of faster microcomputers and supercomputer networks is providing the necessary means to neutralize these concerns and satisfy requirements for high accuracy. Computers have become an integral part of the analyst's toolbox, with options ranging from micro- and minicomputers to parallel processors and supercomputers. The latter, which are accessible by network from several National Science Foundation centers (6),

ANALYTICAL CHEMISTRY, VOL. 62, NO. 9, MAY 1, 1990 · 529 A

REPORT

Table 1.

Monte Carlo techniques in analytical chemistry Monte Carlo technique

Comments

Sampling (random choice) Real-signal emulation (noise generation)

Simple use of random numbers. Simple use of random numbers. Used also for statistical studies. Simple use of random numbers. Used also for signalshape simulation and explicit simulation. Simple use of random numbers. Used also for statistical studies and explicit simulation. Used for signal-shape simulation, explicit simulation, and optimization and experimental design. Gives the most information.

Integration

Ensemble (cluster)

Annealing (free-energy minimization)

Experiment (explicit simulation)

4.14

3.81

placing complex experiments, with cheaper and less time-consuming com­ puter simulations based on accurate models. The simulation of chemical and physical processes enables optimi­ zation of analytical methods and in­ strumentation and allows a better un­ derstanding of the mechanisms under­ lying the technique. Why should one bother with random numbers if one can use numerical solu­ tions? Are Monte Carlo techniques dif­ ficult to apply? Interestingly, the use of stochastic methods can be more accu­ rate and precise than other techniques. Many researchers who use Fourier transform algorithms or even just take an average without fear of the statisti­ cal errors underlying the approach dis­ regard the use of random determina­ tions because of their appearance of uncertainty. In general, Monte Carlo applications are computationally in­ tensive, but the computer code is sig­ nificantly simpler than that needed for other numerical approaches. (The computational systems used in this RE­ P O R T include a laboratory-based Mac­ intosh SE microcomputer and a Cray X-MP/24 supercomputer located at the University of Texas Center for High Performance Computing.) Determination of π

3.47

An example of the application of Mon­ te Carlo techniques is the determina­ tion of π by using a random target (e.g., a dart board) consisting of a circle in­ scribed in a square with unit area. The value of π is obtained choosing random points (x, y coordinates: RUx, Ru) on the board and using the ratio of points falling inside the circle (ATcircie) to those falling within the square UVsquare).

3.14

2.81

2.47

•^circle _ ^circle _ 7ΓΓ2 _ π ( 0 . 5 ) 2 Ν A r2 If) square -^square •'-'side

2.14 Ο

32

64

96

128

160

192

224

256 =— 4.0

No. of trials

Figure 1. Stochastic determination of π with a microcomputer. The mean and one standard deviation of 10 samples of 256 trials each is shown. The result is χ = 3.14 ± 0.07 (2% RSD). An arbitrary standard deviation (π ± 1.63/ΛΓ"2) is also presented as the pair of smooth symmetrical curves.

offer the highest speed, largest memo­ ry, and best precision. Currently, su­ percomputers are defined as comput­ ers with 20-1000 million floating-point operations per second (MFLOPS); de­ signs emphasizing extremely rapid pro­ cessing of vectors (arrays); exceptional­ ly large, rapidly accessible, active memory; and high cost. Although most stochastic applications can be accom­ modated by smaller computational

systems such as microcomputers (4, 7), certain simulations require the speed and architecture of a supercomputer. Because of the general iterative con­ struct of the computer code for Monte Carlo-based simulations, the arrayvectorizing ability of the supercom­ puter makes it "the machine of choice" in these cases. Several companies (e.g., Du Pont [6]) have saved millions of dollars by re­

530 A · ANALYTICAL CHEMISTRY, VOL. 62, NO. 9, MAY 1, 1990

, =4 . 0 ^

(i) (2)

square

where N circ i e represents all points with­ in the circular area Acjrcie, satisfying the inequality:

Rl +Rl Π^Γ

Ο)

where Ax = b — a. This integration technique is easier to apply than con­ ventional numerical integration meth­ ods, although it converges slowly (σ °c N~1/2). This method is not recommend­ ed for simple functions, but it is very efficient for the evaluation of multidi­ mensional integrals (7) because σ is nearly independent of the number of dimensions of the function. Many problems in chemistry and physics involve averaging over several variables (e.g., position, velocity, charge, free energy). For a function F of D dimensions, one chooses Ν random points (from N*D random numbers) and sums F over them using an appro­ priate Axi range for each random di­ mension. Equation 5 becomes: 1=

( Σ -Ff*!®' *2©, *3(»)> · · ·! J x

(

*

)

Hx,y,z)-\

(x2e-*)(y2e-y)x

k h h (z 2 Odxdydz

The accuracy of any Monte Carlo technique is drastically affected by the randomness of the random numbers employed. The generated random num­ bers (i.e., "random deviates") should present an uncorrected sequence fol­ lowing a given distribution (e.g., uniform, Gaussian, etc.). Most computers offer a system-supplied pseudorandom number generator that returns a "uniform" random number from a real sequence between 0 and 1 (rectangu­ lar distribution). Most built-in generators use the multiplicative (linear) congruential method, which is based in the recurrence: flu(i) = [e*R u (i-l)]mod,,

(8)

Applying Equation 6, the execution time is just three times that for one dimension (i.e., N*D and not ND, as would be expected with numerical inte­ gration techniques), and the computer code is very simple (about 15 lines). The results of several trials are pre­ sented in Table II. A very precise value would require many more trials, al­ though the convergence toward the true value is obvious. The method of importance sampling can be applied to improve the convergence of the algo-

(i)

where Ru(i) is the ith uniform random number; the constant ί has typical values near 75 (~104); η is generally 231 — 1 (~109) in microcomputers (38) and 248 (~1014) in supercomputers (39). This method is very fast but is not free of sequential correlation on successive calls. The randomness can be improved by using a "shuffle." An array is filled with some 256 values from Equation i and is used as a look-at table. To select a random number, a new number from Equation i is used to select an index between 1 and 256; the value in that location of the array is chosen. The value is immediately replaced by the next random deviate from Equation i (5,8). Several statistical tests can be applied to test the randomness of a generator, from simple frequency histograms to detailed analysis of the recurrence of digits in the sequence (4). Uniform random numbers distributed between any two values a and b can be obtained by: Rayb(i) = α + (6 - a)*Ru(ï)

(6)

For example, considering the func­ tion: F(z, y, z) - ( x V ^ i y V ) ( i V ) (7) one might be interested in the value of the integral: (•10 flO rlO

Random Number Generation

(iî)

"Normal" random numbers (i.e., a Gaussian distribution) can be generated using the central limit theorem over a sample of uniform random numbers (38):

β5(0 = ( | Χ ϋ · ) ) - γ

(iii)

where N~ 12 is sufficient. There are some additional methods for generating this distribution that are more efficient for microcomputers; they can be found in the literature (5,8). Random directions in 3D, useful in diffusional problems, can be obtained from sets of uniform deviates corresponding to each of the x,y,z coordinates:

*-·νβ,+#„"+"

(iv)

Nonuniform deviates following any desired distribution function over a range a