Article pubs.acs.org/JPCA

Adiabatic/Nonadiabatic State-to-State Reactive Scattering Dynamics Implemented on Graphics Processing Units

Pei-Yu Zhang and Ke-Li Han*

State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, People's Republic of China

ABSTRACT: An efficient graphics processing unit (GPU) version of a time-dependent wavepacket code is developed for atom−diatom state-to-state reactive scattering processes. The propagation of the wavepacket is calculated entirely on GPUs employing the split-operator method after preparation of the initial wavepacket on the central processing unit (CPU). An additional split-operator method is introduced in the rotational part of the Hamiltonian to decrease communication among GPUs without losing accuracy of the state-to-state information. The code is tested by calculating the differential cross sections of the H + H2 reaction and the state-resolved reaction probabilities of nonadiabatic triplet−singlet transitions of O(3P,1D) + H2 for the total angular momentum J = 0. Global speedups of 22.11, 38.80, and 44.80 are found when comparing parallel computation on one GPU, on two GPUs with the exact rotational operator, and on two GPUs with an approximate rotational operator, respectively, against serial computation on the CPU.

Special Issue: Structure and Dynamics: ESDMC, IACS-2013
Received: January 4, 2013
Revised: February 6, 2013

dx.doi.org/10.1021/jp400102r | J. Phys. Chem. A

1. INTRODUCTION

A state-to-state quantum reaction dynamics study provides the most detailed observables and offers profound insight into chemical reaction processes. In the past decades, several approaches for quantum reaction dynamics have been developed. With hyperspherical coordinates, the time-independent (TID) coupled-channel method has been employed to study many triatomic systems.1,2 On the other hand, the time-dependent method has been widely used and implemented in different schemes because it does not scale as the cube of the number of basis functions. The coordinates can be chosen as reactant Jacobi, product Jacobi, or hyperspherical coordinates.3−5 The wavepacket can be real or complex6,7 and can be propagated by the split-operator method,8,9 Chebyshev polynomial expansion, finite difference approaches, and so forth.10,11 While calculations of differential cross sections (DCSs) have been reported for many triatomic reactions and two tetraatomic systems up to now,12 such calculations remain a great challenge in terms of computational time. Thus, parallelism on central processing units (CPUs) has become routine for state-resolved quantum dynamics. Because large-scale parallelism with hundreds of computational nodes is still expensive, parallelism on graphics processing units (GPUs) provides a good alternative, giving access to hundreds of cores in a single GPU.

GPUs have been widely used in many traditional computational science areas. In chemistry, one GPU provides hundreds of times the performance of a single CPU core in molecular dynamics and electronic structure calculations.13,14 Recently, a CPU/GPU implementation of the TID method showed a 6.98 speedup obtained with three GPUs and three CPU cores.15 Pacifici et al. have reported a GPU implementation for reactive scattering through the evaluation of the reaction probabilities of the Cl + H2 and N + N2 reactions.16 They found speedups on the order of 2−20 for GPU versus CPU. Compared with the TID coupled-channel method, the time-dependent method with the split-operator scheme has a relatively simple computational procedure based on matrix−matrix multiplication that maps well to GPU architectures. It is then of interest to see how GPUs can accelerate state-to-state dynamics computations with the time-dependent method.

In this work, state-to-state quantum reactive scattering is efficiently implemented on GPUs. A new split-operator method is introduced to increase the efficiency of the program, and the validity of the approximation it introduces is tested on H + H2 by comparing differential cross sections with exact results. Then, another simulation is carried out to obtain state-resolved reaction probabilities of the spin−orbit transition in the O(3P,1D) + H2 reaction at total angular momentum J = 0 without considering electronic angular momentum operators. The O(3P) + H2 → H + OH(X2Π) reaction is important in understanding the chemistry of combustion processes and other atmospheric reactions.17 There are intersystem crossings between the triplet and singlet electronic states of the O + H2 system,18 and the potential energy surfaces (PESs) of O(1D) + H2 should be included to achieve an accurate description of the O(3P) + H2 reaction. Many PESs have been developed to study the adiabatic and nonadiabatic reaction dynamics of this reaction.19,20 Dobbyn and Knowles have constructed the DK PESs of the singlet states (1,21A′ and 11A″) along with the electrostatic coupling between 11A′ and 21A′,19 and a number of nonadiabatic dynamics calculations have been carried out on the DK PESs.21−24 The singlet PES 11A′ of Dobbyn et al. and the triplet PESs 13A″ and 13A′ of Rogers et al. were used in previous works to study the spin−orbit-induced transition in O(1D,3P) + H2.25−28 In this work, a nonadiabatic quantum dynamics calculation was carried out on the above PESs to obtain the product branching ratio of OH(2Π3/2) to OH(2Π1/2), which can be compared directly to experimental measurement. In two previous relevant studies,27,28 the diabatic representation was used for the asymptotic reactants and throughout the whole calculation to produce the dynamical quantities of interest, including the product fine-structure ratio. However, results generated in this way cannot be compared directly with experimental observables. Thus, in the present calculation, instead of treating the asymptotic reactant and/or product states diabatically, we choose asymptotic adiabatic states under spin−orbit splitting to obtain converged total and state-to-state reaction probabilities as well as the branching ratio for this reaction at total angular momentum J = 0. Such treatment is important for predicting a branching ratio comparable to the experimental measurement, though it has little influence on the total averaged reaction probability.

This paper is organized as follows. The next section outlines the theoretical methods for calculating reaction probabilities and differential and integral cross sections. In section 3, we describe the details of the parallelism; in particular, a new split-operator method for the rotational matrix is introduced to decrease communication among GPUs, and the code is applied to the H + H2 and O + H2 reactions. The conclusions are presented in section 4.
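The split-operator propagation that the code accelerates can be illustrated with a minimal one-dimensional sketch. The NumPy toy model below (a harmonic oscillator in atomic units; our own illustration, not the authors' Fortran/CUDA code) propagates the oscillator ground state with the symmetric splitting: each factor is diagonal in either the coordinate or the momentum representation, so a time step reduces to elementwise products plus two transforms.

```python
import numpy as np

# Grid and harmonic potential (hbar = m = omega = 1; toy model, not the paper's system)
N, L = 256, 20.0
dx = L / N
x = (np.arange(N) - N // 2) * dx
k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)
V = 0.5 * x**2

# Oscillator ground state; exact evolution only multiplies it by exp(-i*E0*t), E0 = 1/2
psi0 = np.pi**-0.25 * np.exp(-0.5 * x**2)
psi = psi0.astype(complex)

dt, steps = 0.01, 100                  # total propagation time T = 1
expV = np.exp(-0.5j * dt * V)          # half step in the potential
expT = np.exp(-0.5j * dt * k**2)       # full kinetic step exp(-i*dt*k^2/2)

for _ in range(steps):
    psi = expV * psi                             # V/2
    psi = np.fft.ifft(expT * np.fft.fft(psi))    # kinetic step in momentum space
    psi = expV * psi                             # V/2

norm = np.sum(np.abs(psi)**2) * dx               # unitary splitting conserves the norm
autocorr = np.sum(psi0.conj() * psi) * dx        # should stay near exp(-0.5j * 1.0)
```

In the three-dimensional code the analogous transforms between grid and basis representations are matrix multiplications, which is exactly the workload that maps well onto GPU hardware.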

2. THEORY

The Hamiltonian of the A + BC → C + AB reaction in the body-fixed (BF) product Jacobi coordinates can be written as29,30

H = -\frac{\hbar^2}{2\mu_R}\frac{\partial^2}{\partial R^2} + \frac{(\hat{J} - \hat{j})^2}{2\mu_R R^2} + \frac{\hat{j}^2}{2\mu_r r^2} + h(r) + V(R, r, \theta)    (1)

where μR is the reduced mass between atom C and diatom AB, μr is the reduced mass of AB, Ĵ is the total angular momentum operator, ĵ is the rotational angular momentum operator of AB, V(R, r, θ) is defined as Vpes − Vr(r), with Vpes the potential energy of the triatomic system and Vr(r) the diatomic potential energy of AB, and h(r) is the diatomic reference Hamiltonian

h(r) = -\frac{\hbar^2}{2\mu_r}\frac{\partial^2}{\partial r^2} + V_r(r)    (2)

In nonadiabatic calculations, V(R, r, θ) is a diabatic potential energy matrix for the coupled electronic states.31,32 The reactant is prepared in the space-fixed (SF) reactant Jacobi coordinate system using a Gaussian wavepacket in the R direction,

|\chi_i\rangle = \Psi^{JM\varepsilon}_{\alpha v_0 j_0 l_0} = G(R_\alpha)\,\phi_{v_0 j_0}(r_\alpha)\,|JMj_0 l_0 \varepsilon\rangle    (3)

Here |JMj0l0ε⟩ represents the SF rotational basis and describes the angular motion, with M being the projection of the total angular momentum J for the initial state (v0, j0, l0), and φv0j0(rα) represents the rovibrational eigenfunction of the diatomic molecule. Then, the initial wavepacket is projected from the SF representation to the BF representation using the rotational function |JMj0K⟩, with K being the projection of the total angular momentum J in BF coordinates. Finally, the initial wavepacket at the grid point (Rv0, rvi, θvj, Kβ) in BF product Jacobi coordinates is obtained directly by projection of the wavepacket in BF reactant Jacobi coordinates,

\psi(R_{v_0}, r_{v_i}, \theta_{v_j}, K_\beta) = \sum_{K_\alpha} d^{J\varepsilon *}_{K_\beta K_\alpha}\, \psi\big(R_\alpha(R_{v_0}, r_{v_i}, \theta_{v_j}),\, r_\alpha(R_{v_0}, r_{v_i}, \theta_{v_j}),\, \theta_\alpha(R_{v_0}, r_{v_i}, \theta_{v_j}),\, K_\alpha\big)    (4)

This wave function is then propagated in the BF product Jacobi coordinates by the split-operator scheme

\Psi^{JMp}(\vec{R}, \vec{r}, t + \Delta) = \exp\!\Big(-i\hat{H}_0\frac{\Delta}{2}\Big)\exp\!\Big(-i\hat{V}_{\rm rot}\frac{\Delta}{2}\Big)\exp(-i\hat{V}\Delta)\exp\!\Big(-i\hat{V}_{\rm rot}\frac{\Delta}{2}\Big)\exp\!\Big(-i\hat{H}_0\frac{\Delta}{2}\Big)\Psi^{JMp}(\vec{R}, \vec{r}, t)    (5)

with

\hat{H}_0 = -\frac{\hbar^2}{2\mu_R}\frac{\partial^2}{\partial R^2} - \frac{\hbar^2}{2\mu_r}\frac{\partial^2}{\partial r^2} + V_r(r), \qquad \hat{V}_{\rm rot} = \frac{\hat{L}^2}{2\mu_R R^2} + \frac{\hat{j}^2}{2\mu_r r^2}    (6)

where L̂ = Ĵ − ĵ. To obtain state-to-state information, several K components (NK of them) of the wave function are used for each total angular momentum J to converge the dynamical information. In the BF representation, L is not a good quantum number for the rotational basis of associated Legendre polynomials; thus the Coriolis coupling matrix (HK) is tridiagonal, and, without considering electronic angular momentum operators,

\langle J'K'j'K'|\hat{L}^2|JKjK\rangle = \langle J'K'j'K'|(\hat{J} - \hat{j})^2|JKjK\rangle    (7)

= [J(J+1) + j(j+1) - 2K^2]\,\delta_{jj'}\delta_{KK'}
  - [J(J+1) - K(K+1)]^{1/2}[j(j+1) - K(K+1)]^{1/2}\,\delta_{jj'}\delta_{K',K+1}
  - [J(J+1) - K(K-1)]^{1/2}[j(j+1) - K(K-1)]^{1/2}\,\delta_{jj'}\delta_{K',K-1}    (8)

The radial component of the product wavepacket is chosen to be a delta function times the outgoing asymptotic radial function,6

|\chi_f\rangle = \delta(R - R_\infty)\,\phi_{v_f j_f}(r_\beta)\,|JMj_f l_f\rangle    (9)

where R∞ is a fixed radial coordinate in the asymptotic region. The time-dependent expression for the scattering matrix element in the BF representation is then33

S_{fi}(E) = \frac{1}{2\pi\, a^{*}_{\beta f}(E)\, a_{\alpha i}(E)} \int_{-\infty}^{+\infty} dt\, e^{(i/\hbar)Et}\, \langle \chi_{\beta f}| e^{-(i/\hbar)\hat{H}t} |\chi_{\alpha i}\rangle    (10)

The coefficients are given by

a_{\alpha i}(E) = \langle \phi_i | \chi_i^{+} \rangle = \sqrt{\frac{\mu_R k_{\alpha i}}{2\pi}}\, \big\langle R\, h^{(2)}_{l_\alpha}(k_{\alpha i}R) \,\big|\, G(R_\alpha) \big\rangle    (11)

a_{\beta f}(E) = \langle \phi_f | \chi_f^{+} \rangle = \sqrt{\frac{\mu_R k_{\beta f}}{2\pi}}\, \big\langle R\, h^{(1)}_{l_\beta}(k_{\beta f}R) \,\big|\, \delta(R - R_\infty) \big\rangle    (12)

where h(1) and h(2) are the spherical Hankel functions of the first and second kind.34 Finally, the scattering matrix is transformed into the helicity representation to derive differential and integral cross sections by summing over all relevant total angular momentum quantum numbers J:

\frac{d\sigma_{v'j',v_0j_0}(\theta, E)}{d\Omega} = \frac{1}{2j_0 + 1} \sum_{K'} \sum_{K_0} \bigg| \frac{1}{2ik_{v_0j_0}} \sum_{J} (2J+1)\, d^{J}_{K'K_0}(\theta)\, S^{J}_{v'j'K',\,v_0j_0K_0} \bigg|^2    (13)

\sigma_{v'j',v_0j_0} = \frac{\pi}{(2j_0 + 1)\,k^2_{v_0j_0}} \sum_{K'} \sum_{K_0} \sum_{J} (2J+1)\, \big|S^{J}_{v'j'K',\,v_0j_0K_0}\big|^2    (14)

3. PARALLEL COMPUTING ON GPUS

A. Parallel Scheme. To implement a version of the quantum dynamics program for the compute unified device architecture (CUDA), we used the Fortran CUDA extension included in the PGI compiler. In the time-dependent quantum method, almost all of the run time is spent in the wavepacket propagation loop. Thus, the initial wavepacket, the transformation matrices, and the Hamiltonian are calculated on the CPU. After this preparation, the data are copied from CPU memory to GPU memory, and the propagation of the wavepacket runs entirely on the GPUs. No data need to be copied back and forth between CPU and GPU memory except for the flux analysis, which is executed in parallel on the CPU. The CPUs of the computing nodes can therefore still be used to run other jobs without affecting the GPU computation.

With the split-operator propagation, most of the run time of each time step is consumed by two parts: (1) the transformation of the wavepacket between the discrete variable representation (DVR) and the finite basis representation (FBR) and (2) the actions of the several parts of the Hamiltonian on the wavepacket. Both parts are in essence matrix multiplications and are therefore well suited to parallelization on the GPU. In the program, the wavepacket propagation involves 12 transformations and 10 actions per time step. Starting from the matrix multiplication example in the PGI manual, more than 10 different matrix multiplication subroutines have been developed for the transformations of the wavepacket in the different coordinates (R, r, and θ) and for the actions of the Hamiltonian on the wavepacket.

The wave function is divided into NP parts, with NP being the number of GPUs used, and every GPU stores the components of the wave function with a specific set of K values: (0, 1, 2, ..., NK/NP), (NK/NP + 1, ..., 2NK/NP), ..., ((NP − 1)NK/NP + 1, ..., NK). The advantage of this distribution is that communication between GPUs occurs only at the action of the rotational operator e^{-iH_K\Delta/(2\mu_R R^2\hbar)}, which minimizes the time spent transferring data among GPUs. This distribution of the wave function is similar to that of Hankel et al.,7 in which the NK components of the wave function are distributed over NK CPU cores.

In the above scheme, every GPU must communicate with all of the other GPUs. There are two ways to communicate data between GPUs in CUDA. One is peer-to-peer (P2P) memory access, in which the data are copied between GPUs without going through the host memory; the other (noP2P) goes through host memory over PCI-e, copying data from one GPU to host CPU memory and then from host memory to the other GPU. Our tests show that the noP2P scheme takes about twice as long as P2P. In every time step, about 13 and 40% of the total time is consumed by communication between GPUs for the P2P and noP2P schemes, respectively. The situation becomes worse for calculations on more than two GPUs: as the number of GPUs NP increases, (NP − 1)Nall elements of the wave function have to be copied among GPUs, where Nall is the number of basis functions of the wavepacket. At NP = 6, at least 50% of the total time is estimated to go to GPU communication. The present scheme is therefore acceptable only for a few GPUs; to use a larger number of GPUs, an alternative scheme is needed to reduce GPU communication.

Because the Coriolis coupling matrix HK is tridiagonal, the Kth component of the wave function couples only with the (K + 1)th and (K − 1)th components. The rotational operator Vrot, the exponential of HK, is then a tridiagonally dominant matrix. Thus, instead of the normal split-operator procedure, one can devise a scheme in which the Kth component couples with only a few nearby components. Several schemes can serve this purpose; in this work, we approximate the Vrot matrix by a splitting scheme. The Coriolis coupling matrix is divided into two block-diagonal matrices, A and B, whose block exponentials are preserved exactly:

A = \begin{bmatrix} H^K_{11}/2 & H^K_{12} & & & \\ H^K_{12} & H^K_{22}/2 & & & \\ & & H^K_{33}/2 & H^K_{34} & \\ & & H^K_{34} & H^K_{44}/2 & \\ & & & & \ddots \end{bmatrix}    (15)

B = \begin{bmatrix} H^K_{11}/2 & & & & \\ & H^K_{22}/2 & H^K_{23} & & \\ & H^K_{23} & H^K_{33}/2 & & \\ & & & H^K_{44}/2 & H^K_{45} \\ & & & H^K_{45} & \ddots \end{bmatrix}    (16)

Then, the exponential of the rotational matrix can be expressed by the split-operator method with third-order or fifth-order precision35

V_3(\Delta) = e^{\Delta A/2}\, e^{\Delta B}\, e^{\Delta A/2} + O(\Delta^3)    (17)

V_5(\Delta) = e^{s\Delta A/2}\, e^{s\Delta B}\, e^{(1-s)\Delta A/2}\, e^{(1-2s)\Delta B}\, e^{(1-s)\Delta A/2}\, e^{s\Delta B}\, e^{s\Delta A/2} + O(\Delta^5), \qquad s = \frac{1}{2 - 2^{1/3}}    (18)

The importance of the Coriolis coupling differs from reaction to reaction. The magnitude of the Coriolis coupling matrix elements is determined by J and R. We therefore choose Δ = ΔtJ(J + 1)/R² rather than Δt to analyze the error of the approximation. In this scheme, the error of the split-operator method increases with decreasing distance R of the collision pair and with increasing time step Δt and total angular momentum J. Thus, the approximation in eq 17 for Vrot is expected to be satisfactory at relatively large R. As R → 0 the validity of the approximation decreases, and we expect this region to be the main source of error; if the third-order split-operator method is not accurate enough at small R, the fifth-order method of eq 18 can be chosen. However, as R decreases, the centrifugal potential becomes very large, and for most reactions only a small part of the wave function reaches this region.

When the third-order split-operator method is used for the rotational operator, only (3NP/NK)Nall rather than (NP − 1)Nall elements of the wave function have to be copied among GPUs. The total data volume communicated among GPUs is thus scaled down from roughly NK K-slices to 3, and the ideal communication time with the approximation is only 3/NK times that of the normal scheme. With the exact rotational Hamiltonian, we mentioned that data communication takes about 13 or 40% of the total elapsed time in the P2P and noP2P schemes, respectively; at NK = 12, the ideal share of communication time with the approximation is therefore less than 3.3 or 10%, respectively, and these percentages continue to decrease with increasing NK. Another advantage of this approximation is that the Hamiltonian acting on the wavepacket consists of three block-diagonal factors rather than the original full matrix Vrot, which reduces the number of multiplications needed for the action of Vrot on the wavepacket.

B. Test on the H + H2 Reaction. The H + H2 reaction is chosen to test the validity of this approximation in actual calculations. In this test, the GPU is a Tesla C2050 and the CPU an E5620 @ 2.40 GHz. The calculations were performed on the BKMP2 surface.36 Figure 1 shows the state-to-state DCSs as a function of total energy at two scattering angles, θ = 0 and 90°, for the rovibrational states v′ = 0, 1 and j′ = 0, 2, 4, 6.

Figure 1. Differential cross sections as a function of total energy for the rovibrational states v′ = 0, 1 and j′ = 0, 2, 4, 6 at two scattering angles, θ = 0 and 90°. The lines and symbols represent results from our work and from the ABC code,1 respectively.

Clearly, all of the wavepacket DCSs in Figure 1 are in excellent agreement with the TID results over the entire energy range. We have also calculated the DCSs with the rotational operator approximated by the third-order split-operator method. The average absolute deviation between the results with and without the approximation is less than 0.1% over the whole energy range; if plotted in Figure 1, the approximate results would completely overlap those calculated with the exact rotational operator. Thus, the tridiagonally dominant matrix Vrot can be approximated by the third-order split-operator method without losing accuracy in the state-to-state information for the H + H2 reaction. Because this is a direct reaction in which most of the reaction happens at R > 2 bohr, further quantum dynamics calculations will be performed in the future to verify whether the approximation of Vrot remains appropriate for reactions with a deep well.

Figure 2 shows the speedups of parallel GPU computation over serial CPU computation as a function of the total angular momentum J. The average speedups for J ≥ 10 are 22.11, 38.80, and 44.80 for one GPU, two GPUs with the exact rotational operator, and two GPUs with the approximate rotational operator, respectively. For one GPU, the speedup is almost constant for J ≥ 10. For the two-GPU versions, the speedup varies with whether NK is odd or even because the computational loads on the two GPUs are then slightly unbalanced. The speedup fluctuates around a constant for two GPUs with the exact rotational operator and increases with increasing J for the two-GPU version with the approximate rotational operator; at J = 10 and 24, the speedups of the latter are 41.21 and 47.70, respectively. In this version, the amount of data communicated between GPUs is invariant while the amount of computation grows with J, so the share of time spent on communication decreases with increasing J. This is an attractive characteristic because the main cost of a state-to-state calculation lies in the scattering matrix calculations at large J.

Figure 2. Speedups of parallel GPU computation over serial CPU computation versus the total angular momentum J. The lines with squares, circles, and triangles show the speedups for one GPU, two GPUs with the exact rotational operator, and two GPUs with the approximate rotational operator, respectively.

C. Nonadiabatic Dynamics of O(3P,1D) + H2. The parallel method can be directly extended to nonadiabatic quantum dynamics. The multistate wavepackets are distributed to the GPUs as specific K components, just as in the adiabatic case. Both the Coriolis coupling in the rotational part and the coupling of the PESs lead to nonadiabatic transitions. The nonadiabatic dynamics code therefore calculates the wave functions of the different electronic states separately in the action of the Hamiltonian H0 of eq 5 and accounts for the couplings by sharing wave functions between GPUs. This GPU version of the nonadiabatic quantum dynamics code is used to study the spin−orbit-induced triplet−singlet transition in O(3P,1D) + H2. The spin−orbit coupling was developed by Hoffmann et al.17 employing a model of four electronic states, 3A″(1), 3A″(2), 3A′, and 1A′. We use the same switching functions as Garashchuk et al.27 to ensure the degeneracy of all four surfaces in the product region. The wave functions of the adiabatic spin−orbit-split states are linear combinations of the diabatic wave functions ψ1−ψ4 on these electronic states. In this model, the split reactant states of O(3P) correspond to the three adiabatic states (ψ2 − ψ3)/2^{1/2}, ψ1, and (ψ2 + ψ3)/2^{1/2}. In the product asymptotic region, (ψ1 − ψ3)/2^{1/2} and (ψ2 + ψ4)/2^{1/2} correspond to the H + OH(2Π3/2) product, while (ψ1 + ψ3)/2^{1/2} and (ψ2 − ψ4)/2^{1/2} correspond to the H + OH(2Π1/2) product. The initial wavepackets are thus prepared as (ψ2 − ψ3)/2^{1/2}, ψ1, or (ψ2 + ψ3)/2^{1/2} and propagated on the four PESs.

Figure 3 shows the total reaction probabilities at total angular momentum J = 0. All of the probabilities increase almost linearly once the total energy exceeds the reaction threshold. In the calculation with the initial wavepacket ψ1, the probability of OH(2Π1/2) is smaller than that of OH(2Π3/2) over the whole energy region; the opposite holds in the (ψ2 − ψ3)/2^{1/2} and (ψ2 + ψ3)/2^{1/2} calculations. Overall, the branching ratio of OH(2Π3/2) to OH(2Π1/2), averaged over the three considered electronic states, shows a propensity to approach the statistical value of 1.

Figure 3. Reaction probabilities as a function of energy in the calculations with the initial wavepackets (a) (ψ2 − ψ3)/2^{1/2}, (b) ψ1, and (c) (ψ2 + ψ3)/2^{1/2}. Probabilities of reaction to OH(2Π3/2) and OH(2Π1/2) are shown with solid and dashed lines, respectively.

The state-resolved probabilities in the ψ1 calculation at a total energy of 0.77 eV are shown in Figure 4. The probabilities on the 3A″(2) and 1A′ PESs from the nonadiabatic calculation are less than 2% and are not shown. In Figure 4b, the total probabilities are compared with the results of adiabatic single-surface dynamics. The nonadiabatic effect is found to have only a minor influence on the state-resolved probabilities: the probabilities of the vibrational ground state in the nonadiabatic calculation are slightly smaller than those in the adiabatic calculation at j′ < 5 but larger at j′ ≥ 5.

Figure 4. Vibrational and rotational state distributions of the OH product versus the OH rotational quantum number j′ at a total energy of 0.77 eV. (a) Distributions on the 3A″(1) and 3A′ diabatic surfaces with initial wavepacket ψ1. (b) Distributions from the adiabatic calculation on the single PES 3A″ and from the nonadiabatic calculation with initial wavepacket ψ1.

The state-resolved probabilities of the vibrational ground state in the (ψ2 + ψ3)/2^{1/2} and (ψ2 − ψ3)/2^{1/2} calculations are shown in Figure 5. Nonadiabatic transitions to the 3A′ surface make a large contribution at low j′, while nonadiabatic transitions to the 1A′ surface contribute continually to the probabilities of all rotational states. Compared to the (ψ2 + ψ3)/2^{1/2} calculation, the probabilities in the (ψ2 − ψ3)/2^{1/2} calculation are only half as large on the 3A″(1) and 3A′ surfaces and almost the same on 3A″(2) and 1A′. The shape of the rotational distribution of the OH product is almost the same on all four surfaces despite the differences in the magnitudes of the probabilities.
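The third-order splitting of eqs 15−17 can be checked numerically. The NumPy sketch below (our own illustration with a random tridiagonal stand-in for H^K, not the production code) splits a tridiagonal matrix into the block-diagonal A and B of eqs 15 and 16 and verifies that the defect of e^{ΔA/2} e^{ΔB} e^{ΔA/2} shrinks by roughly 2³ = 8 when the step is halved, as expected for a third-order local error.

```python
import numpy as np

def expm_sym(M, tau):
    # exp(-i * tau * M) for a real-symmetric M via eigendecomposition
    w, U = np.linalg.eigh(M)
    return (U * np.exp(-1j * tau * w)) @ U.T

def split_AB(H):
    # Eqs 15 and 16: half of the diagonal goes to each matrix; the off-diagonal
    # couplings alternate, (0,1),(2,3),... into A and (1,2),(3,4),... into B
    n = H.shape[0]
    A = np.diag(np.diag(H) / 2.0)
    B = A.copy()
    for i in range(n - 1):
        T = A if i % 2 == 0 else B
        T[i, i + 1] = T[i + 1, i] = H[i, i + 1]
    return A, B

rng = np.random.default_rng(0)
n = 8
diag = rng.normal(size=n)
off = rng.normal(size=n - 1)
H = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)  # tridiagonal stand-in for H_K
A, B = split_AB(H)                                      # A + B reproduces H exactly

def defect(tau):
    # || exp(-i*tau*H) - exp(-i*tau*A/2) exp(-i*tau*B) exp(-i*tau*A/2) ||
    V3 = expm_sym(A, tau / 2.0) @ expm_sym(B, tau) @ expm_sym(A, tau / 2.0)
    return np.linalg.norm(expm_sym(H, tau) - V3)

ratio = defect(0.1) / defect(0.05)  # expected near 2**3 = 8 for a third-order defect
```

Each exponential factor is itself block-diagonal with 2 × 2 blocks, so it can be applied analytically pair-by-pair in K; only the blocks straddling a GPU boundary require an exchange of wave function components, which is the source of the communication savings described above.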

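The communication bookkeeping of section 3A can be made concrete with a small script (plain Python; the helper names are ours, and the 3/NK volume ratio is the paper's own estimate). It partitions the NK K-components over NP GPUs in contiguous blocks and scales the measured communication share of the exact-operator scheme by 3/NK, reproducing the quoted figures of under 3.3% (P2P) and 10% (noP2P) at NK = 12.

```python
def partition_k(nk, np_gpus):
    """Contiguous blocks of K components per GPU, as in section 3A."""
    base, rem = divmod(nk, np_gpus)
    out, start = [], 0
    for g in range(np_gpus):
        size = base + (1 if g < rem else 0)
        out.append(list(range(start, start + size)))
        start += size
    return out

def ideal_comm_share(exact_share_pct, nk):
    """Paper's estimate: communicated volume drops from ~NK K-slices to 3."""
    return exact_share_pct * 3.0 / nk

blocks = partition_k(12, 2)            # NK = 12 components on NP = 2 GPUs
p2p = ideal_comm_share(13.0, 12)       # exact scheme spends ~13% communicating (P2P)
nop2p = ideal_comm_share(40.0, 12)     # exact scheme spends ~40% communicating (noP2P)
```

With the tridiagonal coupling, only the boundary K-components of each contiguous block must be exchanged with the neighboring GPU, which is why the communicated volume no longer grows with the number of GPUs.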

Figure 5. Vibrational and rotational state distributions of the OH product versus the OH rotational quantum number j′ on the four PESs in the calculations with initial wavepackets (a) (ψ2 − ψ3)/2^{1/2} and (b) (ψ2 + ψ3)/2^{1/2} at a total energy of 0.77 eV.

In all of the calculations above, there are only a few nonadiabatic transitions to the singlet surface 1A′, which is consistent with the results of Chu et al. and Garashchuk et al.27,28 The largest probability on the 1A′ surface is 0.072, at 1.22 eV in the (ψ2 + ψ3)/2^{1/2} calculation. Because we are studying the triplet−singlet transition, the influence of the potential energy of the singlet state 1A′ on the OH(2Π) branching ratio is tested with initial wavepacket ψ1 by decreasing the potential energy of the 1A′ PES by 1.0 kcal/mol in the interaction region. The test shows that the branching ratios of OH(2Π3/2) to OH(2Π1/2) over the whole energy range are larger than those in the calculation without the 1.0 kcal/mol decrease of the 1A′ PES; the two calculations give ratios of 1.79 and 2.45 at a total energy of 0.87 eV, respectively. Thus, the ratio can be strongly affected by a minor change in the 1A′ potential energy even though the probabilities on that surface are small.

4. CONCLUSIONS

In this paper, a new parallel code for adiabatic/nonadiabatic state-to-state quantum dynamics has been implemented on GPUs. The propagation of the wavepacket is calculated entirely on GPUs; thus, after preparation of the initial wavepacket, the CPUs of the computing node can still be used for other calculations. The results show a global speedup of more than 22.11 for parallel computation on one GPU compared with serial computation on the CPU. The tests on two GPUs show speedups of 38.80 and 44.80 with the exact rotational operator and with the new split-operator treatment of the rotational matrix, respectively. The new split-operator method decreases the communication time among GPUs and is a good choice for multiple-GPU/CPU computations.

The main obstacle to applying GPUs to polyatomic reaction systems involving more than three atoms is the shortage of video memory. In ordinary time-dependent quantum dynamics methods using the DVR and FBR representations on the Jacobi coordinates R and r, the proper maximum number of compute nodes in a parallel calculation is the number of projections NK. If more nodes are employed, massive data communication among them is inevitable and the efficiency decreases seriously, unless other algorithms are adopted or the calculation is executed on an expensive computational cluster. Thus, the video memory of the few dozen GPUs used will limit the size of the simulation that can be run. We estimate that the present scheme can be extended to an ordinary tetra-atomic system that uses a few hundred gigabytes of memory. Allowing the GPUs to access the local host memory in the propagation code would include the host memory in the wavepacket propagation and increase the feasible simulation size by up to 10 times. With careful overlapping of data copies and computation, letting the propagation access host memory is estimated to slow the computation by only about 20% while providing up to thousands of gigabytes of additional memory. Thus, parallelism on a few dozen GPUs is a favorable route to speedups of hundreds of times for time-dependent quantum dynamics calculations. With parallelism on GPUs rather than between CPUs, the code allows us to study adiabatic/nonadiabatic reaction dynamics and obtain state-to-state information. Moreover, the approach in this work can be extended to study polyatomic reaction systems involving more than three atoms.

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

Notes
The authors declare no competing financial interest.

REFERENCES

(1) Skouteris, D.; Castillo, J. F.; Manolopoulos, D. E. Comput. Phys. Commun. 2000, 133, 128−135.
(2) Balucani, N.; Casavecchia, P.; Banares, L.; Aoiz, F. J.; Gonzalez-Lezana, T.; Honvault, P.; Launay, J. M. J. Phys. Chem. A 2006, 110, 817−829.
(3) Zhang, J. Z. H.; Dai, J. Q.; Zhu, W. J. Phys. Chem. A 1997, 101, 2746−2754.
(4) Gomez-Carrasco, S.; Roncero, O. J. Chem. Phys. 2006, 125, 054102/1−054102/14.
(5) Sun, Z. G.; Lin, X.; Lee, S. Y.; Zhang, D. H. J. Phys. Chem. A 2009, 113, 4145−4154.
(6) Lin, S. Y.; Guo, H. Phys. Rev. A 2006, 74, 022703/1−022703/8.
(7) Hankel, M.; Smith, S. C.; Allan, R. J.; Gray, S. K.; Balint-Kurti, G. G. J. Chem. Phys. 2006, 125, 164303/1−164303/12.
(8) Peng, T.; Zhang, J. Z. H. J. Chem. Phys. 1996, 105, 6072−6074.
(9) Althorpe, S. C. J. Chem. Phys. 2001, 114, 1601−1616.
(10) Gray, S. K.; Balint-Kurti, G. G. J. Chem. Phys. 1998, 108, 950−962.
(11) Zhang, H.; Smith, S. C. J. Chem. Phys. 2002, 117, 5174−5182.
(12) Xiao, C. L.; Xu, X.; Liu, S.; Wang, T.; Dong, W. R.; Yang, T. G.; Sun, Z. G.; Dai, D. X.; Xu, X.; Zhang, D. H.; Yang, X. M. Science 2011, 333, 440−442.
(13) Ufimtsev, I. S.; Martinez, T. J. J. Chem. Theory Comput. 2008, 4, 222−231.
(14) Anderson, J. A.; Lorenz, C. D.; Travesset, A. J. Comput. Phys. 2008, 227, 5342−5359.
(15) Baraglia, R.; Bravi, M.; Capannini, G.; Lagana, A.; Zambonini, E. A Parallel Code for Time Independent Quantum Reactive Scattering on CPU-GPU Platforms. In Computational Science and Its Applications, ICCSA 2011, Part III; Murgante, B., Gervasi, O., Iglesias, A., Taniar, D., Apduhan, B. O., Eds.; Springer-Verlag: Berlin, 2011; Vol. 6784, pp 412−427.
(16) Pacifici, L.; Nalli, A.; Lagana, A. Comput. Phys. Commun. 2013, DOI: 10.1016/j.cpc.2013.01.002.
(17) Hoffmann, M. R.; Schatz, G. C. J. Chem. Phys. 2000, 113, 9456−9465.
(18) Alagia, M.; Balucani, N.; Cartechini, L.; Casavecchia, P.; van Beek, M.; Volpi, G. G.; Bonnet, L.; Rayez, J. C. Faraday Discuss. 1999, 113, 133−150.
(19) Dobbyn, A. J.; Knowles, P. J. Mol. Phys. 1997, 91, 1107−1123.
(20) Zhang, P. Y.; Lv, S. J. Commun. Comput. Chem. 2013, 1, 63−71.


(21) Gray, S. K.; Balint-Kurti, G. G.; Schatz, G. C.; Lin, J. J.; Liu, X. H.; Harich, S.; Yang, X. M. J. Chem. Phys. 2000, 113, 7330−7344.
(22) Gray, S. K.; Petrongolo, C.; Drukker, K.; Schatz, G. C. J. Phys. Chem. A 1999, 103, 9448−9459.
(23) Lin, S. Y.; Guo, H. J. Phys. Chem. A 2009, 113, 4285−4293.
(24) Takayanagi, T. J. Chem. Phys. 2002, 116, 2439−2446.
(25) Rogers, S.; Wang, D. S.; Kuppermann, A.; Walch, S. J. Phys. Chem. A 2000, 104, 2308−2325.
(26) Brandao, J.; Rio, C. M. A. Chem. Phys. Lett. 2003, 377, 523−529.
(27) Garashchuk, S.; Rassolov, V. A.; Schatz, G. C. J. Chem. Phys. 2006, 124, 244307/1−244307/8.
(28) Chu, T. S.; Zhang, X.; Han, K. L. J. Chem. Phys. 2005, 122, 214301/1−214301/6.
(29) Zhang, J. Z. H. Theory and Application of Quantum Molecular Dynamics; World Scientific: Singapore, 2009.
(30) Zhang, A. J.; Jia, J. F.; Wu, H. S. Commun. Comput. Chem. 2013, 1, 40−51.
(31) Zhang, P. Y.; Lu, R. F.; Chu, T. S.; Han, K. L. J. Phys. Chem. A 2010, 114, 6565−6568.
(32) Zhang, Y.; Xie, T. X.; Han, K. L.; Zhang, J. Z. H. J. Chem. Phys. 2003, 119, 12921−12925.
(33) Dai, J. Q.; Zhang, J. Z. H. J. Phys. Chem. 1996, 100, 6898−6903.
(34) Messiah, A. Quantum Mechanics; Wiley: New York, 1968.
(35) Bandrauk, A. D.; Shen, H. J. Chem. Phys. 1993, 99, 1185−1193.
(36) Boothroyd, A. I.; Keogh, W. J.; Martin, P. G.; Peterson, M. R. J. Chem. Phys. 1996, 104, 7139−7152.
