Ab initio computational chemistry - The Journal of Physical Chemistry

Oct 1, 1985 - Mark A. Lazaga , David T. Wickham , Deborah H. Parker , George N. Kastanas , and Bruce E. Koel. 1993,90-109. Abstract | PDF | PDF w/ Lin...
0 downloads 0 Views 2MB Size
4426

J. Phys. Chem. 1985, 89, 4426-4436

indicate barriers of ca. 4 and 6 kcal/mol for CD3 and CH2CH3 rotation, respectively. The CD3 rotation barrier appears to be normal for a methyl group rotating against what is essentially a tertiary alkyl group.l4 Very few rotation barrier for ethyl groups (14) The rotation barrier for (CH,),C+CH, is 4.7-5.2 kcal/mol; see: Low, J. P. Prog. Phys. Org. Chem. 1968,6,1-80. Pitzer, K. S. J . Chem. Phys. 1937, 5 , 413-419. Rush, J. J. J . Chem. Phys. 1967, 46, 2285-2291. For (CH3),CH+CH1 it is 3.9 kcal/mol; see: Lide, D. R.; Mann, D. E. J . Chem. Phys. 1958, 29, 914-920. Aston, J. G.; Kennedy, R. M.; Schumann, S.C. J . Am. Chem. SOC.1940, 62, 2059-2063. See also ref 15.

are known, but the EPR result with 2 is close to the ethyl barrier in CH3CH2-CH(CH3)CH2CH3which is 5.0 k ~ a l / m o l . * ~

Acknowledgment. K.U.I. and J.C.W. thank NATO for the award of a research grant without which the present work would not have been undertaken.

(15) Chen, J. A.; Petrauskas, A. A. J . Chem. Phys. 1959, 30, 304-307.

FEATURE ARTICLE Ab Initio Computational Chemistry Enrico Clementi IBM Corporation, Data Systems Division, Department 48B M S 428, Kingston, New York 12401 (Received: April 16, 1985; In Final Form: August 5, 1985)

We describe briefly a global approach to simulations of complex chemical systems, and we illustrate it with a few examples. We discuss (1) liquid water simulations with up to four-body and with vibrational corrections; (2) hydration networks in a crystal; (3) water and ion structures for DNA; (4) determination of three-dimensional structure of proteins; and ( 5 ) transport of ions through membranes. A parallel supercomputer-we have assembled-appears exceptionally well adapted for these computations and is therefore described summarily. We conclude that a facility like the one provided by our parallel supercomputersas well as the global approach-quantum chemistry, statistical mechanics (Monte Carlo and molecular dynamics), and fluid dynamics-is often necessary to simulate complex systems realistically.

I. Introduction More and more we realize that important aspects of chemical research can be simulated on digital computers, and we are becoming increasingly aware that computer simulations can be derived directly from theory, without the need of empirical parameterizations. In this work we shall address the task to realistically simulate complex chemical systems from the points of view of present theories, application softwares, and also system hardwares. To start with, we define “chemical complexity” and we summarize our computational approach. Then we shall show with examples that this approach is capable of simulating nontrivial systems. Finally we shall conc1ud.e by describing, briefly, a new supercomputer, named ICAF’, loosely coupled array of processors, which appears to be notably well adapted to the needs of computational chemistry. The complexity of a chemical system is proportional to the number of mutually dependent variables either characterizing or related to that aspect of the system we wish to model. When one deals with biological systems, this definition might not be sufficient and we might include conditions whereby structural forms can evolve from previous forms at minimal energy cost. In these evolving systems, boundary conditions and fluxes need to be added to the above definition of complexity. Equilibrium considerations are often insufficient. Depending on the complexity of the system, one uses as models either quantum mechanics or statistical mechanics; for continuous systems we turn to fluid mechanics (with thermodynamics). In biological systems an amino acid, a protein in solution, and a cell are examples of systems appropriate to the above three techniques. One can safely predict that important advances in the understanding of complex chemical systems will be obtained by at0022-3654/85/2089-4426$01.50/0

tempting to connect and overlap quantum mechanics with statistical mechanics and with fluid dynamics. Here we do not speak of a “formal” overlap, that essentially is available, but mainly of an “operational” connection, obtained, in our opinion, primarily by means of computational methods and “free” access to “supercomputers”, namely computers with the highest performance. Using contemporary computer sciences language, we propose to model complex chemica! systems with an “expert system”. Let us comment on this somewhat unconventional “expert system”. Rather clearly we must reject the use of one model only (say quantum chemistry) as “the tool” to describe complex chemical systems. Indeed our expert system is based on the assumption of a general method linking different submodels, which traditionally have been considered as somewhat “independent”. These submodels constitute and define an ordered set, and the elements of the set are linked in such a way that the output of submodel (i-1) constitutes the input of-submodel (i). This is a basic rule in our expert system; the laws of quantum, statistical, and classical mechanics are other “rules” for the expert system. Briefly, we deal with an a b initio expert system. In general, a simulation of a chemical system can be staged into few successive steps.’ At first, we start by computing the energetic and the structural characteristic of its separated molecules using as input the number of electrons and the number and type of the nuclei only; ab initio quantum chemistry is used for (1) See, for example, the monographic works by: (a) Clementi, E. “Determination of Liquid Water Structure, Coordination Numbers for ions and Solvation for Biological Molecules”; (b) Clementi, E. “Computational Aspects for Large Chemical Systems”; Springer-Verlag: New York, 1980; Lecture Notes in Chemistry, Vol. 19. (c) Clementi, E. IEM J.Res. Deu. 1981, 25, 4, 315.

0 1985 American Chemical Society

Feature Article the task. Then we turn our attention to the interactions between any two molecules, say A and B, one kept rigidly at a given pasition in space, the second placed at many positions and orientations, such as to reasonably sample the interaction potential hyperspace. We recall that the electronic energy of any nonrelativistic fermion system can be partitioned into Hartree-Fock, HF, and electronic correlation energies. Depending on the required accuracy for the intermolecular interaction energy, electronic correlation effects are included or, if very small, simply estimated or even neglected. As it is well-known, a main reason for the need of electronic correlation correction to the intermolecular forces is due to neglect of dispersion effects in the HF approximation. Today a variety of techniques are available to obtain reasonable estimates of the correlation correction.’ Concerning the H F energy, we recall that HF type basis sets are computationally expensive for molecules with more than 15-20 atoms; we often use a minimal basis set, but we must correct for the large basis set superposition error.’ Note, however, that new methods to deal with the large basis set problem are being e ~ p l o r e din ; ~addition if we use pseudopotential techniques for inner shell electrons, then an HF’s quality basis set becomes feasible even for molecules with up to about 50 atoms. In the second step, we use the computed intermolecular interaction energies to construct (by fitting techniques) two-body, atom-atom interaction potentials, one atom belonging to the molecule A and the second to B. The atom-atom potentials are generally expressed in some analytical form, which can be either very simple (for example, of the 12, 6 , 1 type) or rather complex, especially if we wish to obtain a particularly accurate fit. However, when more than two molecules interact, the total intermolecular interaction energy is rather poorly approximated by pairwise potentials, and three- and higher many-body corrections should be added. Indeed, it appears that two-body potentials restrict one to qualitative rather than quantitative modeling especially for nonpure chemical systems, like for example, solutions (where effective potentials might not be sufficiently trustworthy). It is also clear that quantitative simulations require removal of the rigidity constraint; indeed intramolecular interactions are as important as many-body corrections. In the third step, which marks the passage from molecular physics to chemistry, we use the above analytical potentials as input for statistical mechanical simulations, e.g., Monte Carlo (MC) and/or molecular dynamics (MD). In this step temperature, time, entropy, and other probabilistic considerations are finally introduced. A fourth step is emerging as more and more essential especially for systems with so many bodies that a discrete modeling would be no longer feasible. The main models in the fourth step are derived from fluid dynamics: concepts as steady state, nonequilibrium, transport, vorticity, dissipative systems are introduced to describe chemical complexity. Notice that our set of steps corresponds simply to ways to solve the equation of motion under different constraints and boundary conditions, the latter being introduced to tailor different ways to represent moving objects. Let us now apply these steps to a few examples which shall help to assess how feasible is the proposed global and multistep approach to modern simulations of chemical systems. The need of a satisfactory representation of the motions in a microsystem, which naturally evolves to a satisfactory representation for a macrosystem, is also crucial (and temperature must also be accounted for). 11. From Nuclei and Electrons to Liquid Water by

Simulations It is well-known that most biological molecules interact with each other in aqueous solutions at about room temperature. It follows immediately the importance to model liquid water realistically. If we think of this liquid as of a chemical system composed by many interacting water molecules, we must address two points: (1) how many molecules are needed for a proper de(2) Boys, S. F.; Bernardi, F. Mol. Phys. 1970, 17, 553. (3) Clementi, E.; Corongiu, G. Chem. Phys. Lett. 1982, 90, 359.

The Journal of Physical Chemistry, Vol. 89, No. 21 1985 4421 ~

-278

-4.30

-8.16

-580

-042

-7.7s

-292

-6.80

-9.90

-9.45

-0.50

-9.10

-581

1

I

Figure 1. Conformations for clusters of water: the binding energy per water molecule is in kcal/mol.

scription of the liquid, and (2) the choice of the interaction potentials between water molecules. Regarding the first point, it is clear that about 500 molecules should be about sufficient to statistically describe bulk water (the “range” of the interaction potential defines a volume, and therefore, a reasonable number for the water molecules). An answer to the second point is, however, of more debate; we have proposed4 to derive intermolecular potentials from ab initio computations. A main goal is to assess the reliability of predictions for “ab initio solvents and solutions” relative to “laboratory solvents and solutions”. However, since exact potentials are as unattainable as exact wave functions, our concomitant goal was to derive an ordered set of approximations leading to an increasingly realistic and trustworthy description of the “ab initio liquid”. A common alternative route is to use experimental data to derive values for the parameters in semiempirical intermolecular pot e n t i a l ~ . ~We recall that this latter avenue is useful especially when experimental data are available in sufficient number and consistent quality. For the case of water, the experimental wealth of data is enormous.6 This notwithstanding, “experimentally derived potentials” have been subjected to criticism;’ the task of obtaining a reliable intermolecular potential from experimental data is not a trivial one even when many and accurate laboratory data are available. Indeed the ability to discriminate among the different many-body corrections (e.g., three- vs. four-body) becomes very elusive and one is forced to retreat to ”effective potentials”, intrinsically rather questionable for nonpure systems. In considering liquid water it is important to recall that the need of a statistical description stems from the existence of many, nearly degenerate, energy minima, corresponding to many and different molecular configurations. Incidentally, the existence of many minima is of no surprise, if we recall that even solids can exhibit a many minima situation.’ If we look at water clusters (H,O),, the “precursorn of liquid water, it is known that there are several minima even for small values of n; this is illustrated in Figure 1 where the early work by Kistenmacher et aL9 has been reanalyzed.1° In the figure we have explicitly shown the hydrogen (4) (a) Clementi, E. Phys. Electron. At. Collisions, Invited Pap. Prog. Rep. Int. Conf., 7th, 1971 1972, 399. (b) Clementi, E.; Popkie, H. J. Chem. Phys. 1972, 57, 1077. (5) Shuster, P.; Jacubetz, W.; Marius, W.; Rice, S . A. “Structure of Liauids”: Surinaer-Verlaa: West Berlin, 1975. 16) Sfe,’forkxample,- the five volumes: “Water: A Comprehensive Treatise ; Franks, F., Ed.; Plenum Press: New York, 1973. (7) See, for example: Faraday Discuss. Chem. SOC.1978, No. 66. (8) Marcus, P.; Jona, F. Surf. Sci. 1982, 11, 20. (9) Kistenmacher, H.; Lie, G.; Popkie, H.; Clementi, E. J . Chem. Phys. 1974, 61, 546.

4428

The Journal of Physical Chemistry, Vol. 89, No. 21, 1985

Clementi

Liquid Water: X- Ray Scattering (T=298K, N=512)

Liquid Water:goo (T=298K, N=512) -exp . ......

MCY+VIB

I --t

0.6

0.2 0.2

X

E

4.6

s(Al)

I I

-1.0

,

I

I

1

1

1

t _. I 00'

' 1 ' 30

'

I

1

6.0

9.0

1 12.0

,

,

30

60

, 90

a -1.0

,

120

Figure 2. Comparison of pair correlation functions g ( O - 0 ) from X-ray diffraction data (by Narten) and simulations.

bonds. The binding energy per water molecule is approximated by using a many-body potential described below. Here we wish to stress that if even small clusters have so many local minima, then the modeling of a sample of liquid water will most certainly necessitate a statistical description. Notice in addition how easily one can visualize structure to structure evolution either within those structures for a given value of n, or those from n to n f 1. Let us now pass to Monte Carlo and molecular dynamics simulations. Reasonable accurate two-, three-, and four-body potentials have been derived from quantum mechanical computations; the interested reader can find details e l s e ~ h e r e ; l ~these -'~ potentials are known as the MCY,lZCC,13 and DC14 and correspond to two-, three-, and four-body potentials, respectively. Let us, however, note that the inclusion (in M C or MD simulations) of the four-body corrections become feasible only if supercomputers are available; indeed even by truncating the series at the three-body level, the computational task is already heavily taxing a standard computer and a few hundred hours are required, for example, on an IBM 308 1 type processor for a simulation of, say, 7 X lo5 MD steps on 500 water molecules with inclusion of three-body corrections. Let us now compare some of our recent results obtained by using the above-mentioned potentials in three different simulations, the first considering only the two-body potential, then adding the three-body, and finally also the four-body corrections. The three simulations were carried out with the Metropolis Monte Carlo methodI5 with an (N,V,r) ensemble for T = 298 K and N = 512 water molecules at the experimental density p = 0.998 g/cm3. Cubic periodic boundary conditions with a minimum image cutoff have been ~ s e d . ' ~ , ' In ~ Jthe ~ three simulations, the total number of configurations generated (after equilibration is reached) was about lo6. We present our results in Figures 2-4. In Figure 2, we compare the pair correlation function g(O-0) obtained from our simulations with those obtained by Narten" modeling X-ray (IO) Kim, K. S.;Clementi, E., to be published. (11) Dupuis, M.; King, H . F. J . Chem. Phys. 1978,68, 3998. ( I 2) Matsuoka, 0.; Clementi, E.; Yoshimine, M. J . Chem. Phys. 1976,64, 1351. (13)Clementi, E.;Corongiu, G. I n t . J . Quanfunt Chem. Symp. 1983,10, 31. (14)Detrich, J.; Corongiu, G.; Clementi, E . Chem. Phys. Lett. 1984,112, 426.

(IS)Metropolis, N.; Rosenbluth, A. W.; Teller, A. H.; Teller, E. J . Chem. Phys. 1953,21, 1078. (16) Lie, G. C.; Clementi, E.; Yoshimine, M. J . Chem. Phys. 1976,24, 2314. (17)Narten, A. H.; Levy, H. A. J . Chem. Phys. 1971.55.2263. Narten, A. H.J . Chem. Phys. 1972,56, 5681.

,

y

,

,

4.0

I

12.0

8.0

S(q-1) 10.0

-exp

L S(A-l)

8.0

4.0

12.0

10.0

Figure 3. Comparison of X-ray structure functions H(s) from experi-

ments (by Narten) and simulations.

Liquid Water: Neutron Scattering (T=298K, N=512) p

0.6

I8

0.2

v)

c

f -E z

-O.*

3 -0.6 I -1.0

.......] ....... exp

-MCY. cc

: I-, -Ox

$ I

,

I

,

I

,sy,

I -1.0

__ .} exp

... . . .. .. ,

t

__ MCY+CC+DC

I), ,

I

,

, , , ,, S (AI)

4.0 6.0 8.0 10.0 20 40 60 8 0 100 Figure 4. Comparison of neutron structure functions H ( s ) from experiments (by Narten) and simulations. 20

and neutron beam diffraction data. In Figures 3 and 4, we compare the X-ray and neutron beam diffraction data with our simulation and we report the structure functions H(s) in function of s (s is related to the scattering angle {and to the wavelength X by s = 4 a sin {/A (see ref 16). From Figure 2, it can be seen that progression from MCY, through (MCY + CC), to (MCY CC + DC) corresponds to a trend with more structure in the g(0-0). The amplitude and the height of the first peak also increases in this progression. As a consequence, in Figure 3, we notice that for large values of s the intervals between peaks increase, yielding a phase shift. Both effects result from the approximation of a "rigid water" molecule; this limitation appears to become more and more important when a refined potential is used. For small values of s, on the other hand, the enhancement in the structure leads to more realism; this is most clearly seen by noticing the initial slope and first split peak in H(s) for X-ray scattering. While the (MCY CC) potential gives a definite improvement over MCY, the (MCY + C C + DC) potential is better still. To our knowledge, the (MCY + C C DC) potential is the first proposed potential capable of correctly reaching the amplitude of both halves of the split peak. In Figure 4,we consider the structure functions H ( s ) from the neutron scattering intensities; because a precise knowledge of the dimensions of the water molecules is necessary in deriving the

+

+

+

Feature Article intermolecular scattering from the total density, two different geometries for the water molecule have been used to reproduce the experimental data, the one obtained from gas phaseL8and the one reported by Narten,” yielding in Figure 4 two curves. Again, progressive improvements are achieved from MCY to (MCY + CC) to (MCY + CC + DC). More quantities have been obtained from these simulations. The computed heat capacity, for the three simulations, lies between 19.5and 16.0 cal/mol.deg., compared with an experimental value of 17.9cal/mol.deg. The computed x . enthalpy has a value of -6.8 kcal/mol for MCY, -7.7 kcal/mol for (MCY CC), and -8.9 kcal/mol for (MCY C C DC). The experimental value is -8.1 kcal/mol, so we can estimate that effects neglected in our potentials, like the neglect of molecular vibrations, contribute about 0.8 kcal/mol (more precisely, including up to four-body and vibrations, the computed value is -8.04 f 0.2kcal/mol, the estimated error being mainly in the two-body). The top right insets of Figures 2-4 report the data obtained with the MCY potential extended to include the effect of the intramolecular vibrations of the water molecule^.'^ The extended potential, MCYL for short, has been used in a molecular dynamic simulationIg to study some equilibrium and dynamical properties of liquid water a t room temperature. We note that there is no empirical parameter (except atomic masses) in the MCYL potential. Among the preliminary results, we note that the average intramolecular OH bond distance in the simulated liquid state is found to be 0.017 A longer than that of the isolated water molecule. Shifts in the vibrational frequencies of the water molecule in going from the vapor state to the liquid state are found to be in quantitative agreement with experiments and more accurate than those up to now reported in the literature, even with “ad hoc” semiempirical potentials. Other properties, such as radial distribution functions, heat capacity, spectral densities, quantum corrections to the total energy diffusion coefficient, dielectric relaxation time, etc., are being investigated and from preliminary results, compare well with experimental data, as long as we include both many-body and vibrations. The simulated pressure remains too high; we have, however, determined that this deficiency in the MCY potential can be removed by improving the C I two-body potential in the repulsive region. Indeed, if we use the MCY potential, as given, from large distances to equilibrium, the pressure is computed in agreement with experiments; by adding the short-range contribution the pressure sharply increases. We conclude that starting from electrons and nuclei as our particles in an a b initio quantum mechanical framework, and proceeding to ab initio statistical mechanics, we have simulated a liquid with properties notably near to those of liquid water. Note that this nontrivial achievement was made possible by combining our general simulation approach with ample computer time on our supercomputer. In the following sections we shall discuss other examples where water molecules are present either in a small number (hydration, humidity) or in large numbers (solution). Presently, we are extending our simulations to condensed systems where one needs to deal not with hundreds but with several thousand molecules; from numerical experiments on this system we wish to learn more about the meaning of the concept of temperature for example when temperature gradients are imposed. In these MD simulations we assume that the liquid is contained within a box with a temperature difference at two opposite sides. Studies of this type provide a way to analyze problems at the boundaries between statistical and fluid mechanics.

+

+

~~

(18) Benedict, W. S.; Gaibar, N.; Plyler, E. K. J . Chem. Phys. 1976, 24,

A --s

x

Proflavlne

-

+

111. Complementing Experiments As previously mentioned, one of our main objectives is to simulate complex systems realistically, using first principles. In the previous example we “created” an “ab initio liquid” which compares well to the “laboratory liquid”. In the following example we wish to demonstrate that ab initio simulations can be used most

1139. (19) Lie, G. C.; Clementi, E., work in progress.

4

’:,:I Z(A)

c

15.2

T

T :7‘ -1.7

I

1

-33

-24 75

I

I

I

1

I

-16 5 - 8 25

1

I

1

I

0

825

I



I

1

16 5

1

I

2475

33

*X(A)

bX(A)

Figure 5. Projection of 16 asymmetric unit cells with the P222 space group symmetry onto the X - 2 (middle top) and X-Y (bottom) planes. The basic unit is given at the top.

effectively to complement experimental findings even in complex chemical systems. Whereas for a system composed of small molecules with today’s computers it is possible to obtain accurate two-, three-, and four-body ab initio potentials, this is presently more difficult for large molecules. We are forced, therefore, to limit ourselves to a two-body potential, being aware, however, that important higher many-body corrections are neglected. From our experience, and the example of water confirms it, a two-body potential is sufficient to give essentially acceptable answers to structural questions, but less so for a dynamical one. In this section we summarize a M C simulationz0on a network of water molecules in a crystal of deoxydinucleoside-proflavin complex, and we compare it with a X-ray diffraction analysis.21 We recall that X-ray diffraction experiments cannot determine the hydrogen atoms in water molecules; moreover, if water molecules are mobile and disordered, then even the positions of the oxygen atoms remain undetermined. Therefore, it is of interest to compare theoretical and experimental conclusions and to propose complementary interpretations if discrepancies are encountered. From the X-ray study, the two dinucleoside phosphate strands form self-complementary duplexes with Watson-Crick hydrogen bonds in the 2:2 complex of proflavin/deoxycytidylyl-3’,5’guanosine, (Pfd(CpG)). One proflavin ( P o + cation is asymmetrically intercalated between the base pairs (C-G), and the other is stacked above them, as shown in Figure 5 , top inset. A unit cell consists of four asymmetrical units which define major and minor grooves2’ as shown in Figure 5, middle and bottom ~

~

~~~

~~

~

(20) (a) Kim, K. S.; Corongiu, G.; Clementi, E. J . Biomol. Strucr. Dyn. 1983, 1 , 263. (b) Kim, K. S.; Clementi, E. J . Comput. Chem., in press. (c) Kim, K. S.; Clementi, E. J . Am. Chem. SOC.1985, 107, 227. (21) (a) Sheih, H. S.;Berman, H. M.; Dabrow, M.; Neidle, S. Nucleic Acids Res. 1980.8, 8 5 . (b) Neidle, S.; Berman, H. M.; Sheih, H. S. Nature 1981, 288, 129.

Clementi

4430 The Journal of Physical Chemistry, Vol. 89, No. 21, 1985 NW'50 IX-RAY1

X-RAY

I

NW=61

NW-57 d

d ~~

I

NW-72

NW=66

0

0

0

Figure 7. Hydrogen-bonding patterns ( X - 2 projections) for the simulations of NW = 50, 57, 61, 66, and 72. Figure 6. Top: X-ray hydrogen bonding for 50 water molecules. Inset a gives the X - 2 projection of the oxygen atoms from the X-ray; b is the networks of the water molecules projected into the X-Y plane; c and d are X-2 and X-Y projections, respectively. Bottom: Equivalent system where 50 water molecules are free to find the most probable positions and orientations; MC simulations.

insets. From the X-ray data, 50 water molecules per half-unit cell have been reported2' in the grooves. The experimental distribution of the water molecules within the major grooves (see below) was not too convincing however, and thus we decided to check the reported hydration pattern with Monte Carlo simulations. In our M C for the water-water interactions simulation, we consider a half-unit cell as our M C primitive repeated unit; it corresponds to the portion inside the rectangular box in Figure 5. Further, in our MC simulations, the d(CpG) and (Po+species are assumed rigid at the positions given from the X-ray data. Our boundary condition is not limited to a volume that includes only a single-unit cell but extends to three sets of the 16 asymmetric unit cells of Figure 5 , stacked in three layers along the Z axis.20 The X-ray proposed pattern for the network of water molecules in the crystal is given in Figure 6 (inset a, top) projected onto the X-Y plane; hydrogen bonding was assumed, if the distance between two oxygen atoms is less than 3.5 A. Notice that the assumed hydrogen-bond pattern (see, for example, the heptagons on the left and right in inset a, top, for N W = 50 (X-ray)) is constrained by the symmetry determined for the crystal. In our M C simulation, we started by assuming that the oxygen positions of the water molecules are those given by the X-ray data; keeping such positions fixed, the corresponding hydrogen atoms were allowed to rotate freely. Figure 6 (insets b, c, and d, top) shows the simulated result, which is denoted as N W = 50 (X-ray) to distinguish it from another simulation, denoted as N W = 50, where the oxygen atoms can also change position freely (see bottom four insets in Figure 6). The corresponding X - 2 and X-Y projections of the 50 water molecules per half-unit cell are shown in insets c and d, respectively. Few of the hydrogen bonds in the X - 2 projection (inset c, top) are shown to facilitate comparison with the X-ray hydrogen bond patterns. From Figure 6 (inset b, top), it it evident that the hydrogen orientations introduce asymmetry in the water network, especially in the vicinity of the symmetry axis; see, for example, the water molecules numbered 43 and 44 or those numbered 45 and 46. For the water molecules, numbered from 1 to 50, each odd number is related by symmetry t the next highest even number, except that 49 and 50 are unique and have no symmetrically related number.

Let us compare these results with the simulation where the oxygen atom positions are no longer frozen, but can move freely within the crystal (see bottom four insets of Figure 6). We notice somewhat similar patterns. Many of the water molecules, at the bottom insets of Figure 6, are almost at the same positions as the water molecules in the corresponding top insets. If we accept the determination drawn from the X-ray data as exact and complete, then our simulation would indicate a rather limited but "possibly acceptable" agreement. The most conspicuous difference, however, is that in the NW = 50 case, seueral water molecules migrated from the X-ray positions into notably different positions, filling in the central volume of the major groove, (compare left side of insets d, top, and d, bottom). Indeed the rather sizable empty region reported at the center of the major groovez1provided us with a motivation for this study. It must be argued that, if 50 is not the correct number of water molecules, but only a fraction of it, then weakly bound water molecules originally determined in the diffraction experiment will migrate toward energetically more favorable positions, if available, yielding a pattern different from the one proposed from the X-ray data. Indeed, in the M C study, several such water molecules migrated from the X-ray position toward the vacant region in the major groove (compare the two d insets of Figure 6). Therefore, we hypothesize that there must be a number of water molecules undetected by the X-ray experiment. With this in mind, we did carry out M C simulations from N W = 50 up to N W = 82; for each simulation we studied the energetics and compared the resulting patterns with the X-ray data.20 Here we summarize the results presenting in Figure 7 the hydrogen-bond patterns for N W = 50 (X-rays data), N W = 50 (unconstrained MC simulation), and N W = 57, 61, 66, and 72. Because the full patterns are somewhat complicated, we will display only the 50 water molecules that relate to the experimental data;21the darkened circles denote averaged positions resulting from double occupancies. For additional details, see ref 20. In the simulated oxygen-oxygen pattern for N W = 66 the match of the X-ray to the computed one is impressive; the average standard deviation (AR)from the X-ray oxygen positions to the nearest simulated oxygen positions is 0.60 A; considering that the maximum and average resolutions from X-ray were 0.83 and 0.62 A, respectively, the computed AR is within the range of experimental error. Thus we conclude that there it is reasonable to assume that there are about 63 f 3 water molecules at the positions and orientations given by the M C simulationz0rather than only the 50 as detected in the X-ray study.

Feature Article

The JourMi of Physical Chemistry. Vol. 89, No. 21, 1985 4431

This kind of simulation is especially intersting bsause it shows that interpretations of X-ray data can be complemented by theurericul simulations. Indeed X-ray experiments can rather easily determine complex molecular crystal structures, difficult for computer simulations because of the need of interactinn potentials. On the other hand, simulation can supply information about the orientations of water molecules and about hydrogen bonds, and can detect mobile or disordered water molecules. W. A Somewhat More Complex Example Let us now consider another example, again dealing with water molecules hut this time with a notably more complex solute, namely, DNA in a double helix conformation. It is well-known that water plays an important role in stabilizing the structure and wnformation of nucleic acids; it is equally well-known that DNA being a polyelectrolyte, its stability in solution requires the presence of counterions. Both effects have been well appreciated since early theoretical studies on DNA.Z2-M However, the complexity of the system and limitations of the theoretical techniques and computational facilities have prevented realistic and detailed modeling at the molecular level. Experimentally, a huge amount of indirect data is available to support the DNA’s double helix models. We recall, however, that X-ray singlecrystal structures of DNA double helix and the determination of a few hydrating water molecules have been obtained only To our knowledge, no experimental diffraction pattern of a DNA polymer in solution has been re@ with the resolution required to discriminate the position of the counterions and the orientation of the water molecules, or the uccurate value of hasic geometrical parameters of the double helix relating to the sugar conformation and to the relative coplanarity of the bases. In this section, we briefly summarize some of our simulations for a B-DNA fragment with 30 base pairs (three full turns) surrounded by up to 1500 water molecules and as many counterions as needed to neutralize the phosphate groups. Three different monocharged counterions have been considered, Li*, Na+, and K+. The DNA fragment was assumed rigid;” in molecular dynamics simulations now in progress, this restriction is being lifted. The ion-water, the ion-ion, the water-DNA, and the ion-DNA interaction potentials are all obtained by ah initio quantum mechanical studies.] We stress that the a m r a c y of these potentials is only qualitative or semiquantitative; namely, we are far from the accurate simulation previously reported for liquid water. Many M C simulations, carried out at a simulated temperature of 300 K, have been performed for a large range of relative humidity, starting from one water molecule per nucleotide unit and increasing progressively the number of water molecules till a value of 25 molecules per nucleotide unit (or 1500 in total). At each relative humidity, a full Monte Carlo simulation has been performed, leading to prediction on the most probable position and orientation of the water molecules and the counterionsZ6for the B and Z conformations. (22) Pa114 M.;Hartman. K.A,; Lord R.C. 1.Am. Chcm. Soe. 196284. 3843: 1963.85.387; 1963,85,39l. (23,) For B review, see, for example, the monographic works by: (a) Manning,G. S. Q. Rm. Biophys. 1978,2,179.Rcrmd, Jr., M.T.; Anderson, C. F.; Lohman. T.M.Q. Reo. Biophys 1978,2,103. (24)Scc the r&ew by: Tcncr, J. Pro.& Biophys. Md.B i d 1978.33.83 and rcfercnces therein given. (25)Quigley, G.J.; Wan&A. H. J.; U&dto, 0.; Van der Marel, 0.; van Boom. J. H.: Rich, A. Prm. Nd.Acod. Sei.. U S A . 1981,77,7104.Rich, A,; Quiglcy. G.J.: Wang. A. H. J. ‘Bimolecular Slnwdynamics”: VoI. 1. pp. 35-52, Sama. R. H., Ed.: Adenine Pres: New York. 1981;VoI. I,pp 35-52. Diekmon. R.F.:k w . H. R.:Comer. B. Ibid. VoI. 1. m 1-34. Klun. A,;Jack A,; Yis&nitr;, M.A.;Keinard. 0.; ShaWrcd, 2.:Stcitz, T.A. 7: Mol. B i d 1979. 131. 669. (26)Clcmcnti, E.;Corongiu. 0. G a z . Chim. ReI. 1979,109,201.COG,;Clcmcnti, E. Garr. Chim. Ita/. 1978, 108, 273. Clemcnti, E.; Carangiu. G. Biopolymers 1981,20,551; 1981,20,2427; 3983.21.763: Int. 1.Quantum Chem. 1981.22.595; 1.B i d . Phys. 1983,/ I , 33;Clementi, E. In ‘Structure and Dynamics: Nucleic Acids and Proteins”; Clemcnti, E.. Sam,R.,Ed$.;Adenine Press: New Yak 1983;pp 321-364;Clementi, E.; Corongiu, G.In ‘Biomolmlar Stermdynamics”: S a m . R.. Ed.: Adenine Pres: New York. 1981;pp 209-259.

ma”,

Figure 8. Top left: Average number of water molecules (ordinate) at different relative humidities (abscissa). Top right: Same for specific EDNA p u p and for Na+ caunterim; first hydration shell (FH)might contain water molecules not strongly bound (b). Bottom: Patterns of hydration in the major (left) and the minor (right) graoves.

From the MC simulations we obtain predictions relevant to both high and low relative humidities and to DNA’s first hydration layer in a solution; our representation does not allow, however, to verify aspects where bulk water needs to he considered. As an alternative to MC simulations, one could select to work within simpler representations traditionally used to study polyelectrolytes in solution, as, for example, shown in Record‘s work or in Manning’s early papers on condensation theory (see references given in ref 23). This alternative, however, is not fully satisfactory; indeed when these simpler models are used, no distinction between different DNA’s conformers is possible. As known in the condensation hypotheses, DNA is modelled as if it would he simply a uniformly charged cylinder with parameterized radius. In addition, from the condensation theory, one cannot learn details about hydration patterns, either local or collective, since water molecules do not appear explicitly in the model. Indeed the assumption is made of the existence of a medium, crudely modeling the water molecules, which retains the same physicochemical properties (e.g.. dielectric constant, density, etc.) from the van der Waals distances at DNA to an infinite distance from DNA, within this medium charged spheres with parameterized radius simulate the wunterions. Equally unsatisfactory would he to start with the three-dimensional model of DNA, assume the same charge value for each atom, and then model the electrostatic interaction with the ions, neglecting to explicitly model the water molecules. The use of ab initio simulations allows us to obtain much detailed information. We have elsewhere discussed, extensively, on the hydration sites of DNA and the Iml and collective aspects of the hydration, and pointed out that, by increasing, progressively, the number of water molecules, a reorganization of the entire water system is o b s e ~ e d .In~ Figure 8, some of the details are recalled: in the inset at the top left we show how many water molecules (ordinate axis) are either in the grooves, or in the first hydration (FH) shell strongly bound to DNA and counter-ions. The abscissa reports the number of water molecules (normalized to one B-DNA turn) determined from many MC simulations at different relative humidities. The inset on the right reports the detailed partition of the hydration at different DNA sites. Our data from the

4432

The Journal of Physical Chemistry, Vol. 89, No. 21, 1985

Clementi interaction) but also decrease the ion-ion repulsion. These general considerations show a few of the fundamentals in the hydration and solvation of DNA. Adding water to DNA (from low to high relative humidity) decreases the phosphatephosphate repulsion by screening and therefore can stabilize the DNA system. At the same time counterions condense around DNA, further decreasing the phosphate-phosphate repulsion and bringing about electrostatic stabilization. In addition, water molecules solvate the counterions, thus decreasing ionic mutual repulsion, which would have destabilized the system. Further, since ionic interactions are very long range, one expects to encounterions” system very counter in the “DNA + water prominent collective effects and concerted motions affecting the counterions, the water molecules, and DNA. Finally, a variation in the phosphate-phosphate distances and/or relative orientations has, obviously, drastic effects on the above interactions. The main patterns for the counterions is shown in the top right inset of Figure 3: The Na’ are within a tubular region defined within minor groove or by the major groove. The distribution of the number of ions within each region depends on the ion and varies from Li+ to Na’ to K+, for example. In a recent M D study on Z-DNA hydrated with about 2000 water molecules and counterions we have complemented the above static picture with data on the counterion mobility and exchange (to be published). It is experimentally well established that DNA undergoes conformational transitions when counterions are changed,z8 thus pointing out specificity in the counterion interactions. In Figure 9, bottom insets, we report the M C iso-probability contour maps for the Li’, Na’, and K’ obtained from M C simulations with B-DNA; for the sake of simplicity, only the phosphate groups are shown. The iso-probability contours are projected into the x-y plane, perpendicular to the DNA axis. The different patterns assumed by Li’ relative to Na’ or K’ are clearly visible in the figure. Notice also the “broad” areas for the probability distributions suggesting high ionic mobility. We refer elsewherez6 for a full analysis of this point and for comparison with the Z-DNA conformer. It follows that the modeling of the interaction of a molecule Y intercalating DNA (see Figure 9) must consider DNA and counterions and the solvent; this is even more essential in studies considering molecular recognition of the genetic code.26 Before concluding this section, we must note that the computer time needed to perform the many M C experiments on the hydration of B- and Z-DNA has been very extensive. We started this work on a relatively small computer (UNIVAC 1100) about I O years ago. The progress was very slow, but it did show eventually that ab initio simulations can shed light in a unique way helping our understanding of very complex chemical systems. By now-with supercomputersstatistical mechanical simulations on DNA with water molecules and counterions have finally become an “easy” task and we can now go back on our initial studies on conformational rransitions.Ic

+

Figure 9. Top left: Patterns of hydration in B-DNA; top right: tubelike region of maximal probability for Na+ ions. Bottom: Probability distributions for counterions around B-DNA, projection into the x-y plane.

simulation nicely compare with infrared findings22as well with Dickerson’s recent X-ray diffraction data.27 A second aspect in our simulation of DNA deals with the overall hydration pattern and water molecules organization, namely with the collective nature of the solvent around DNA. A very nice example of collective behavior was found at the two grooves (see bottom insets left and right of Figure 8), where the water molecules are organized along two predominant patterns formed by filaments of water molecules, one water molecule hydrogen bonded to the next one; one pattern is nearly a roto-translational along the valley of the minor groove (see bottom right of Figure 8), and a second one is a pattern nearly parallel to the Z axis (the long axis of DNA) running across the major groove and connecting PO4groups belonging to different strands (see bottom left of Figure 8). Again, these findings seem to have been confirmed by recent X-ray experiments: if this is indeed the case, then it would add an interesting information, namely an indication of a rather extended lifetime for the connectivity pathways we have predicted (MD simulations are presently being performed to confirm the time-dependent aspect). The connnectivity pathways fully envelope DNA and its counterions as clearly visible in Figure 9, top left: we would like to propose that “the structure” of DNA double helix should include these hydrogen-bonded structures of water molecules and counterions (see below), which are as essential to a proper description of DNA as are the phosphate groups or the base pairs.26 Let us now turn our attention to the position of the counterions (see Figure 9). We recall that, whereas the water-water interaction is nearly zero at an oxygen-oxygen distance of 9-10 A, two counterions strongly repel each other at these distances. For DNA in solution, there are two kinds of ions: those in solution, free to move about and those “frozen within DNA”, namely the PO4- groups. The motion in the latter brings about conformational changes in DNA and concomitant motions in the counterions, as will a restructuring in the water connectivity pathways. Hydration cuts down the repulsive interaction; thus water molecules solvating an ion not only add stabilization to the system (the ion-water (27) Kopka, M. L.; Fratini, A. V.; Drew, H. R.; Dickerson, R. E. J . Mol. Biol. 1983, 163, 129.

V. Determination of the Three-Dimensional Structure of Proteins We now outline another “classical” example where ab initio simulations and supercomputers are expected to become “essential” in a not too distant future. As is well-known, there is a strong correlation between the three-dimensional structure and the function of a protein, but, currently, the former is not readily amenable to experiment. This naturally leads toward theoretical computations. Our approach to this problem, a collaboration with the research group of Professor H. A. Scheraga of Cornell University, is based on a build-up procedure. Briefly, we use-as interim-a set of empirical interaction potentialsz9 to compute the lowest energy conformation of a protein. We recall that ab initio interaction potentials are being assembled for some time,’ (28) Ivanov, I.; Mickenkova, L. E.; Minyat, E. E.; Frank-Karmenetskii. M. D.; Schyolkina, A. K. J . Mol. Biol. 1974, 87, 817. (29) (a) Nemethy, G.; Pottle, M. S.; Scheraga, H. A. J . Phys. Chem. 1983, 87. 1883. (b) Momany, F. A,; McGuire, R. F.; Burgess, A. W.: Scheraga, H. A. J . Phys. Chem. 1975, 79, 2361.

Feature Article but presently the effort is not completed. Starting from the lowest energy conformations of fragments A and B, the lowest energy structures for the polypeptide AB are generated. The structures are computed assuming fixed bond lengths and bond angles and minimizing the energy of all possible starting conformations with respect to all dihedral angles. The possible starting conformations are derived by combining all low energy structures of fragment A with those of fragment B. The resulting low energy conformations of fragment AB are then combined with fragment CD to give fragment ABCD. Eventually, the lowest energy structure of the entire protein can be determined. By now the lowest energy conformations of all 400 possible dipeptides for the 20 naturally occurring amino acids have been determined. This data base shall soon be made available for general use; at this time, we are testing it attempting to predict the three-dimensional structure of interferon.30 The work is still at a very preliminary stage, but some of our experience is worth mentioning. Interferons are proteins produced by cells to prevent infections by viruses. The particular interferon we are studying is a human leukocyte interferon consisting of 155 amino acid residues with known s e q ~ e n c e ; ~its’ three-dimensional structure is not known. We have simulated the structure for two rather extended sections of this interferon, a 16-residue fragment in the middle of the protein, and also a 34-residue fragment on the C terminus end of the protein. The letter code for the 16-residue fragments is TIPVLHEMIQQIFNLF, and the corresponding code for the 34-residue fragment is ILAVRKY FQRITLYLKEKKYSPCAWEVVRAEIMR. The larger 34-residue fragment is especially interesting because this sequence is strongly homologous in several species of human leukocyte interferon. The lowest energy conformations for both fragments are shown in Figure 10, insets a and b. For simplicity, only the backbone atoms for the polypeptide chain are given. Both fragments are a helical in nature; however, in the larger fragment, two a helices are present, and they are folded back upon each other. These results, though only preliminary, provide encouragement since they provide an indication that the method is capable of folding protein. At this time we are continuing to build the entire 155-residue protein. However, it is well-known that the native environment of most proteins is in an aqueous solution and that the conformation of a protein can be radically affected by the solvent. The latter can be simulated by incorporating a hydration shell model by using simple steric considerations. For the 34residue fragment preliminary results indicate that, by including solvent, the two helices still exist, but that the bend between these two helices is more extended as shown in inset c of Figure 10. Very recently we have realized that numerical optimization techniques used in high energy physics can easily be adapted to determine the energy minimum in a many-variable space. The new techniques are now being tested as an alternative to the method tested above (see: Tosi, C.; Pavani, R.; Fusco, R.; Parisi, V.; Zirilli, F. Rend. Accad. Naz. Lincei, in press). In concluding we note that presently we are completing a ’library” of a b initio pairwise potential^.^^ This work, started over seven years ago,’ will make it possible to determine how sensitive are the simulated structures to a particular choice of the interaction potentials. This library is intended to complement the extended set of ab initio pairwise potential for the interaction of amino acids and water33which have been used to study solvation effects in proteins. Before concluding this section, we stress once more that, as for previous application examples, simulation of the proteins structure requires not only proper models, but also ample computer time on supercomputers. The same observation holds for molecular dynamics simulatins on proteins; in this context we recall that the (30) Levy, W. P.; Rubinstein, J. S.; Vrsino, D.; Chin, Y. L.; Moschera, J.; Brink, L.; Gerber, L.; Stein, S.;Pestka, S. PNAC 1981, 78, 6186. (31) Chin, S.; Gibson, B.; Clementi, E.; Scheraga, H., work in progress. (32) Clementi, E.; Corongiu, G.; Probst, M., to be published. (33) Clementi, E.; Cavallone, F.; Scordamaglia, R. J . Am. Chem. SOC. 1977, 99, 5531.

The Journal of Physical Chemistry, Vol. 89, No. 21, 1985 4433

b

C

Figure 10. (a) Simulated lowest energy conformation for a 16-residue fragment in human leukocyte interferon. (b) Same for a 34-residue fragment found at the N-terminus end. (c) S a m e as b with considerations of solvent effects.

dynamical nature of proteins is rapidly becoming a central feature of protein chemistry research. VI. Transport of Ions Through Membranes A final example of the need for a global approach to simulations of complex chemical systems is provided by the transport of ions through Gramicidin A, GA, transmembrane channel^.^^,^^ For the interaction between ions, GA, water molecules, and phospholipids, ab initio quantum chemical s t ~ d i e s ~yielded ~ , ~ ’ reasonable pairwise interaction potentials. These potentials are useful in obtaining a qualitative and preliminary visualization of the main interactions. As it is known from quantum chemistry, we can obtain atomic charges and, more generally, electrostatic potentials. Both can be used to assess electrostatic interactions between molecules. Clearly the latter is only a part of the total interaction, but often an important one. Ab initio potentials at the Hartree-Fock‘s level, by definition, include the electrostatic contribution, and also account for charge-transfer and exchange effects. (34) Urry, D. W. In “Membranes and Transport”; Martonosi, A. N. Ed.; Plenum Press: New York, 1982; Vol. 2, pp 285-294. (35) Eisenman, G.;Horn, R. J . Membr. B i d . 1983, 76, 197. MacKay, D. J.; Barens, P. H.; Wilson, R. K.; Hagler, A. T. Biophys. 1984, 46, 229. (36) Fornili, S. L.; Vercauteren, D. P.; Clementi, E. J . Biolmol. Srruct. Dyn.1984, 1, 1281. (37) Swaminathan, P. K.; Vercauteren, D. P.; Kim, K. S.; Clementi, E. J . B i d . Phys., submitted for publication; see in addition IBM Research Report KGN-10, October 1984.

4434

The Journal of Physical Chemistry, Vol. 89, No. 21, 1985 15 10

tlf

5

0

5 10 15

A

Clementi

,

GA

K+ 12.15 I

K+IZ=Ol

6.3

-6.3

t

0.5

I

1

I

0.15

0.05

Tlpsecl

I

I

I

I

0.05 TI pseci

0.15

I

1

0.05

0.15

Tipseci

Figure 11. Gramicidin A. Top left: the two GA's monomer. Middle left: contour diagrams for GA interacting with one water molecule or one ion. Bottom left: minimum energy path for Na+ and K+. Top right: four configurations for Na+ or K+ interacting with GA and water. Bottom right: velocity autocorrelation function for Na+ or K+ midway GA, at the end of the channel and in water solution.

The water molecules within the channel and those at the two ends behave differently, being a solvent in the latter instance and complexing species in the former case. Thus a static picture would be insufficient. Indeed, let us recall that the primary interest in GA is to understand details of the motion of the ions through GA and to learn how this motion is affected for example by the molecular vibrational modes of GA and by the perturbation from phospholipid. Even if the complexity of this problem is notable, still it is well within the range of possibilities of today's supercomputers. Let us list some preliminary results we have obtained. In Figure 11 (top left inset) we recall the structure of GA as given by U r r ~ .For ~ ~simplicity we have reported only the backbone atoms of the two helical monomeric units. Immediately below we report the iso-energy contour diagrams we have obtained36by computing the interaction energy of GA with either one water molecule or Naf ions. The heavy contour (generated by the overlap of many) defines the GA hard wall. Notice that the width of the channel (hard wall to hard wall distance) differs in the three cases (H,O, Na', and K+) due to the different electrostatic interaction of the three species. Statistically, the water molecules line up within the channel as shown in the four insets at the top right of Figure 11. The configurations reported are statistically significant and have been extracted from an MC study where we have analyzed few million configuration^.^^ The Na' (or K+) was placed rigidly either in the middle of the channel ( Z = 0 A) or at one end (2 = 15 A). By constraining the ion to remain within a plane perpendicular to the 2 axis, and by advancing this plane from Z = -1 5 to Z = 15 A, we have obtained the ideal minimum energy path as shown at the bottom left of Figure 11. Notice that the ion is spiraling down the channel, with an amplitude larger for Na' than for K+; the spiraling path is a consequence which could be anticipated by noticing that the iso-energy contours delineates a spiral rather than a cylindrical tube. These differences in the pathway and in the interaction energy are at the origin of the selectivity among monovalent ions (38) Kim, K. S.;Vercauteren, P. D.; Velti, M.; Chin, S.; Clementi, E. Biophys. J . 1985, 47, 3 2 1 .

(as known, no divalent ion goes through GA). We recall that without an external electric field gradient, no ion would go through GA, since it would remain confined about some position within the channel. In general, the ion will spend some time within a given neighborhood; at the entrapped location, the ion rattles within its cage, built either by solvent molecules (1 5 < Z < 25) or by the GA walls (see the iso-energy maps in Figure 11) and two water molecules, one in front and the second at the back of the ion (see the M C results in Figure 11, top middle insets). The frequency and amplitude of the ionic motions is clearly a function of the ion position within the channel, since the interaction (and the forces) of the ion with the system varies with z. The six insets at the bottom right of Figure 11 compare the velocity autocorrelation functions for either Na+ or K+ at either 2 = 0 or 2 = 15 8, or at 2 >> 15 A (solution) obtained from M D simulat i o n ~ . Consideration ~~ of the motions of GA and of the perturbation from the phospholipids will affect these results. Our modeling is still preliminary (we have only started to study GA and the ions transport). However, this short introduction should suffice to indicate some of the potentialities inherent in the global approach to simulation of complex channel systems, where quantum chemistry, Monte Carlo, and molecular dynamics are used at various phases of the study. For additional M D studies we refer to the extensive work by Wilson and his

VII. An Experimental Parallel Supercomputer System The advent of the so-called supercomputers is allowing large-scale calculations that were only a dream a few years ago. As clear from the examples outlined in the previous sections, our research interests are centered on problems on complex systems where large-scale calculations are required. Hence, high performance computer hardwares and softwares are crucial resources for our research. This primary motivation has lead us to experiment in assembling a parallel supersystem, called lCAP (39) Kim, K. S.; Clementi, E. IBM Research Report KGN-7, September 1984.

The Journal of Physical Chemistry, Vol. 89, No. 21, 1985 4435

Feature Article TABLE I: Comparison of Execution Times (in minutes) on ICAP Using 1-10 A P ’ ~ ~

iob integrals“ SCFb Monte Carloc molecular dynamicsd

1 AP

3APs

6APs

10APs

193.9 75.0 162.1 127.5

66.6 26.0 57.8 44.1

33.9 14.3 32.0 23.3

21.7 10.6 18.1 15.2

~~

“ 114 contracted functions; 384 primitive functions and 42 centers. bSame system (closed shell). cLiquid water simulation with up to four-body interactions. dLiquid water with MCY potential. (loosely coupled array of processors), and in developing system software needed to support our quantum chemistry and statistical mechanics computer programs. Because of our priorities, namely quicK migration of our computer application codes to parallel executions, we have chosen the path of least resistance in carrying out this project. The main characteristics of our parallel strategy are (1) parallelism based on few (less than 20) but powerful array processors, with 64-bit hardware; (2) architecture as simple as possible, but extendable; (3) system software that varies as little as possible from what is used for standard sequential programming; (4) initial application programming mainly in Fortran; (5) migration of old sequential code to parallel code with minimal modifications. It should be noted that all of our current hardware and most of our system software are standard products available off the shelf; this is an important factor in the rapid and cost-effective development of our system. Specifically, we have selected the floating point systems FPS-164 for the array processors (AP), with an IBM 4341, or IBM 4381, as the host computers. The strategies we have developed to program our system, comparative studies on its performance system in practical applications, our current experience, and further developments are discussed in detail elsewhere.40 In our system, a parallel program is constituted by a main program running on one of the two host processors, and several (up to 10) tasks running concurrently in the FPS-164. Software programs have been coded to allow synchronization and to handle communication between main program and the tasks.41 Most recently we have developed a precompiler which handles the application programs conversion from sequential codes to parallel; this “users tool” goes a long way in making our brand of parallelism a very simple one to be implemented. Presently, ICAP consists of 10 FPS-164 attached processors; seven are attached to an IBM 4381 host and the remaining three are attached to an IBM 4341 through an IBM 2914 switch connection, so they can be most easily switched between the IBM 4341 host or the IBM 4381 host. The FPS-164 processors are attached to the IBM hosts through standard IBM 3 Mbyte/s channels. A third IBM 4341, connected to a graphics station, completes the host processor pool. The three IBM systems are interconnected, channel-to-channel, via an IBM 3080 connector. This configuration has 106 Mbytes of Random memory, 30 Gbytes of disk storage and a peak of performance of 110 MFLOPS. As shown elsewhere (see ref 41) this configuration has a performance comparable to a CRAY- 1s. The acquisition of 2 FPS-I64/MAX boards for each one of our 10 AP’s upgrades ICAP from 110 MFLOPS peak performance to 550 MFLOPS. Of course, we must recall that ”peak performance” is only a limiting concept of margin usefulness in forecasting timings for realistic computer simulations. Indeed “substained performance” can be easily a fifth or even a tenth as large. In Table I, we compare examples taken from quantum mechanics, Monte Carlo, and molecular dynamics performed on 1, 3,6, 10 FPS-164 in parallel; all test cases represent full computations from initial input to final output of some complete problem. (40) Clementi, E.; Corongiu, G.; Detrich, J.; Khanmohammadbaigi, S.; Chin, S.; Domingo, L.; Laaksonen, A,; Nguyen, H. L. IBM Research Report KGN-2, May 1984. (41) Detrich, J.; Corongiu, G. IBM J . Res. Dev., 1985, 29, 422. See also IBM Research Report KGN-1, March 1984.

Figure 12. The ICAP supercomputer (see text for details).

Notice the very high degree of parallelism in all the applications we have presented. The timing we have given are obtained without any optimization of the codes; we are now engaged in this optimization task and, as expected, notable time savings can be achieved. We have many more examples than those reported, and include 3-D structure of proteins, seismic, CI, different M C applications, high energy physics (CERN) applications, graphics, etc. We welcome scientists to use our system in order to learn more on the applicability of parallelism, a new trend in computations with very clear and lasting importance. The lCAP system is schematized in Figure 12, where, however, we have replaced the IBM 4381 with an IBM 3081 and one of the IBM 4341 with an IBM 4383: these substitutions would increase performance. In addition, in Figure 12 we show a bus connecting the AP’s and five shared bulk memories forming two rings for further communication from AP to AP. The bus and the ring features are expected to notably increase the flexibility of 1CAP; we expect to have these features operating within 1985. We are of the opinion that fluid dynamical simulations (being of limited granularity and requiring heavy data transfer, AP to AP) marginally feasible on the present lCAP configuration will require the addition of these communication facilities. Thus we are getting ready for implementing the fourth step in our general approach. Recently we have started to set up a second configuration very similar to 1CAP. For sake of clarity, we have called the latter 1CAP-2 and reserved the designation of 1CAP-1 for the first experiment. 1CAP-2 highlights are (1) 10 FPS-264 replace the FPS-164, (2) MVS rather than VM is chosen as the operating system, and (3) we use an IBM 3081 as a front-end processor. Presently a first group of three FPS-264’s are connected to a front end IBM 3081. We recall that since the FPS-264 is 3-4 times faster than the FPS-164, we can expect an equivalent gain in performance in ICAP-2 over 1CAP-1. Even if premature, it is most intriguing to consider some form of connection between 1CAP-1 and ICAP-2. Presently our thoughts are toward connecting the above two nonhomogeneous clusters via a high speed bus and to use the newly announced IBM 3090/200 as the front end for the cluster. The truly excellent scalar performances for the IBM 3090 are a main factor in this plan. As known, the IBM 3090 supercomputer performance in scalar is only one of the features which makes this product particularly valid in floating point demanding applications. We inform the interested reader that for the last 6 months lCAP has been made available for experimentation on parallelism to many research institutions; we consider this “visiting scientist program” a most important instrument for expanding our knowledge from computational chemistry to scientific and engineering applications, in general. We also recall that a second ICAP system is being assembled at the IBM Scientific Center in Rome, Italy; this system

4436 The Journal of Physical Chemistry, Vol. 89, No. 21, 1985

Clementi

VIII. Conclusions More and more computer simulations can realistically model the complexity of chemical systems. We hope that our examples have illustrated this point. The second main point concerns the experimentation on hardware and system software to bring about new concepts and designs in supercomputers, which in turn are aimed at simulating complex systems. We would like to recall that 1CAP-1 and 1CAP-2 are among the first designs where uector features and parallelism are equally stressed. The use of an IBM 3090 to couple ICAP-1 and 1CAP-2 will result in a configuration where scalar, vector, and parallelism are finally available for truly demanding scientific applications. We stress that simulations are not restricted to quantum chemical modeling, but include both Monte Carlo and molecular dynamics as aspects of statistical mechanics. More recently, we are extending toward fluid dynamics by analyzing, for example, problems related to vorticity and formation of dissipative structures. Concerning fluid dynamics, we have started to ask questions like (1) validation of linear lows relating fluxes and forces; (2) coherent behavior in nonequilibrium steady state; (3) study of the

onset of hydrodynamic instabilities. As known, such questions have preoccupied physicists and engineers for long, but only now with supercomputers we can attempt to perform critical “numerical experiments”. We are of the opinion that our global approach, exemplified in this work and the basic structure of the parallel supercomputer we have assembled and tested, will become more and more standard features in chemistry’s research and development due to its intrinsic generality and flexibility. We are also of the opinion that, following the main trend in the development of physicochemistry, more and more chemists will experiment with computers designed in their own laboratory, as we have done for 1CAP-1 and 1CAP-2. In the introductory section, we have related our overall approach to current thinking in computer sciences, and for this reason we have made reference to “expert systems”. Whereas physics, dealing mainly with simple and idealized systems, has long become a science based on accurate models and laws, chemistry, dealing mainly with complex and real systems, is much constrained by empiricism. However, we are of the opinion that the combined use of ab initio computational chemistry and of supercomputers within the overall philosophy of the expert systems will be a main force in merging more intimately chemistry with physics. Our optimism derives much support by considering the new trend in theoretical chemistry, where both quantum chemistry and statistical mechanics are increasingly used to tackle the same problem. This emphasis at problem solving is likely expected and follows the previous emphasis at advancing specific (but somewhat narrow) methodologies. Of course, much remains to be learned; indeed the easy access to supercomputers will point out that brute force approaches are limited.

(42) The interested reader can obtain information on the Visitors’ Program by writing to the author. Inquiries on the visitor’s program at the European Scientific center in Rome should be. addressed to Dr. P. Sguazzero, IBM, VIA Giorgione 129, 00147 Rome, Italy.

Acknowledgment. It is my pleasure to thank Drs. G. Corongiu, and G. C. Lie, and S . Chin from this laboratory and Dr. M. Mareschal from the University of Bruxelles for discussions and permission to present aspects of their work prior to publication.

too has been made available for experimentations on parallelism to European universities and national l a b ~ r a t o r i e s . ~ ~ Before concluding this section, we would like to mention that migration of codes from sequential to parallel have proven to be notably simple especially for computer programs dealing with either stochastic problems or with quantum mechanical problems. Indeed in both cases, one AP can perform extensive computations essentially independently from an equivalent task on a second AP. The independence of the above tasks is the main reason for the high degree of parallelism, shown for example in Table I.