Optimal Design and Operation of a Spatially Distributed Multiscale

Jul 26, 2010 - Optimal Design and Operation of a Spatially Distributed Multiscale ... Citation data is made available by participants in Crossref's Ci...
0 downloads 0 Views 3MB Size
Ind. Eng. Chem. Res. 2010, 49, 7891–7900

7891

Optimal Design and Operation of a Spatially Distributed Multiscale Process, with Regard to Layered Heterostructure Growth Christopher M. Behrens and Antonios Armaou* Department of Chemical Engineering, The PennsylVania State UniVersity, UniVersity Park, PennsylVania 16802

Determining the optimal operation of processes modeled by multiscale systems presents many challenges, because of complexities in modeling and the extensive computational requirements needed to solve them. Application to layered heterostructure deposition is considered here, where an additional challenge of interface characterization is encountered. A solution methodology is applied to the fabrication of a device consisting of alternating layers of gallium arsenide and aluminum arsenide, with the intent of minimizing thickness nonuniformity and interfacial step density. Using a finite-element solver and kinetic Monte Carlo simulations, we were able to balance reductions in thickness nonuniformity and interfacial step density with maintaining favorable operating conditions in a computationally efficient manner. Introduction Frequently, processes used in industrial applications involve phenomena that require mathematical descriptions that span length scales over several orders of magnitude. Such systems, termed “multiscale” in this work, can range from the smallest length scale regimes and associated techniques (quantum, atomistic, and molecular), to the intermediate length scale regimes and techniques (kinetic Monte Carlo (kMC), LatticeBoltzmann), to those for the largest length scale regimes (continuum).1 Mathematical modeling in one regime may not be suitable in another, because of an inadequate level of detail (larger regime descriptions for smaller regime systems) or infeasible computational demands (smaller regime descriptions for large regime systems, e.g., atomistic simulations to describe continuously stirred tank reactor (CSTR) dynamics.) To solve multiscale problems, the differently sized systems can be solved using techniques most suitable for their particular length scale, then connected at an interface. Because of the increase in available computational power, the development of spatially distributed multiscale models was also recently pursued. These types of models have been used to describe systems in relation to simulating fluid flows,2,3 moving contact line problems,4 crack propagation,5 and chemical vapor deposition of thin films,6,7 among others. This paper focuses on multiscale processes that use macroscale continuum equations and mesoscale kMC simulations. The continuous economic pressure to increase profit margins and reduce costs of such processes has led a drive toward implementing strategies involving process optimization. Such optimization can present numerous challenges for multiscale systems. Continuum equations for reaction-transport processes typically feature coupled systems of partial differential equations (PDEs). Traditional practice has been to discretize these PDEs into a finite set of ordinary differential equations (ODEs) using finite differences or finite elements. Despite these challenges, optimization involving PDE continuum equation constraints has been achieved recently.8-11 Additional improvements regarding reduced gradient approaches,12 reduced successive quadratic programming approaches,13 and control vector parametrization14 have also been implemented to handle the large system of ODEs that are obtained from discretization methods. The optimization * To whom correspondence should be addressed. Tel.: +1 (814) 8655316. Fax: +1 (814) 865-7846. E-mail: [email protected].

procedure when using mesoscale kMC simulations is even more challenging, because of the stochasticity present in such systems. Standard procedure has been to treat the mesoscale function evaluator as a black box and use pattern search algorithms to converge toward a solution. Traditional search algorithms such as derivative-free optimization (DFO), Nelder-Mead, and Hooke-Jeeves are commonly used for this procedure, although gradient-based algorithms are possible (in theory), and specific methodologies15-17 have been developed. Also complicating the optimization of stochastic systems is function noise; recent algorithms for handling such systems have been developed and applied to polymerization reactors,18 flowsheet optimization,19 and supply chains,20 although addressing these challenges is still an open problem. We maintain that such methodologies can be applied toward the optimal operation of microelectronics fabrication processes, such as metal organic vapor phase epitaxy (MOVPE). MOVPE, which is also known as metal organic chemical vapor deposition (MOCVD), is one commonly used technique to fabricate thin films for use in various microelectronic applications.21,22 These films typically have thicknesses on the order of a few micrometers (roughly 102-103 molecules). Optimization of mesoscale regime properties, such as the smoothness of thin-film surfaces and interfaces in multilayer films, may be extremely useful to obtain maximum performance. Previous work in refs 23-25 has concentrated on multiscale optimization for GaN process, in the realm of surface roughness control, and for the catalytic oxidation of CO. We extend this work to consider the fabrication of layered heterostructures, which are thin films that consist of multiple layers of alternating material compositions. Such problems introduce new challenges involving control of reactions at the macroscale level, the determination of suitable metrics for characterizing interfaces in these devices, and the complexity of accounting for multiple properties at the mesoscale level. Problem Formulation We consider a spatially distributed multiscale process combining macroscale components describing continuum reactiontransport phenomena with mesoscale components that capture molecular properties at interfaces. Equations to describe the macroscale system are in the form of nonhomogeneous, secondorder, nonlinear PDEs with complex boundary conditions. The

10.1021/ie9020107  2010 American Chemical Society Published on Web 07/26/2010

7892

Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010

solution methodology for the description of the mesoscale system may be even more daunting, because a deterministic solution may not be attainable. The equations describing macroscale and mesoscale systems may be time-varying and have spatial dependence on multiple dimensions. In addition, these systems may be connected at an interface that is dependent on inputs of both properties. Macroscale Model. Reaction-transport systems are typically modeled using coupled PDE systems. For nonisothermal chemical reactions with mobile fluids, the well-known balances for energy, species continuity, and momentum are typically used: ∇ · (-k∇T) ) -FCpu · ∇T ∇ · (-Di∇Ci) ) Ri - u · ∇Ci

(i

) 1, ..., m)

Fu · u ) ∇ · [-PI + µ(∇u + (∇u)T)]

(1) (2) (3)

where T is the temperature, Ci the concentration of species i, u the velocity vector, k the thermal conductivity, F the density, Cp the heat capacity, Di the diffusivity for species i, Ri the reaction rate for species i, µ the dynamic viscosity, P the pressure, and I the identity matrix. A total number of m species are described. Standard procedure for solving these systems of equations is discretization of the PDEs through finite-difference or finite-element methods into a set of ODEs. Approximation through low-order methods has also been used, particularly if the discretization procedure is a bottleneck in the multiscale simulation process.7,24,25 The boundary conditions used are prespecified velocities (Dirichlet) at the inlet, no-slip conditions (Dirichlet) and no flux (Neumann) at the walls, and convective flux (Neumann) at the outlet. For deposition processes, the conditions at the deposition surface are more complicated. Generally, we can assume no-slip (Dirichlet) conditions for velocity, temperature can either be constant (Dirichlet) or no flux (Neumann), and species concentration is described by the reaction rate (Robin). Specifically, the reaction rate information described by the unavailable in closed-form Robin boundary conditions may be obtained from mesoscale simulations. Mesoscale Model. Solutions to mesoscale systems are typically determined using methods such as molecular dynamics (MD) or kMC. Kinetic Monte Carlo methods provide a stochastic solution to the Master equation: ∂P(σ, t) ) ∂t

∑ W(σ′, σ)P(σ′, t) - W(σ, σ′)P(σ, t)

(4)

σ′

where σ and σ′ are system configurations, P(σ, t) is the probability that the system exists in state σ at time t, and W(σ, σ′) represents the probability of a transition from state σ to state σ′. Events in kMC simulations have been shown to follow a Poisson process,26 and since processes such as deposition events can be modeled using a Poisson distribution, kMC is a workable methodology for modeling such systems. Mesoscale modeling techniques are especially preferred when information about the positions of molecules is required, yet little detail about the inner workings of the molecules themselves is needed. To explain mesoscale modeling more clearly, consider the example of using kMC to model the evolution of surface properties in deposition. The macroscale equations in the previous section do not have the capability of providing information about surface details. Likewise, it is possible, in principle, to calculate the locations of molecules from their quantum movements, but doing so is quickly intractable for large

systems with complex molecules. In our deposition example, the rate of deposition and the rate of molecules migrating to alternate sites must be known. From this information, the basic methodology behind kMC is that we can choose random time intervals from a Poisson distribution and select an event (in this case, a lattice site being modified from “unoccupied” to “occupied by species X” or vice versa) at random. Based on its probability of occurring from rate information, another number from a random number generator is chosen to determine whether this event occurs or not. Then, the time step is advanced and the process is repeated until the desired time interval has been completed. The progression of these calculations then determines the evolution of the surface. The complexity and computational tractability of the kMC model is determined by the number of calculations that must be performed, based on the size of the lattice used and the number and rate of events occurring. However, larger lattice sizes produce more accurate results. In addition, the more interactions that are included between migrating molecules, the more realistic the simulation. Interface Characterization. A challenge associated with layered heterostructures is that the determination of the most appropriate methodologies for interface characterization is not always as straightforward as it is for purely surface features. This particularly applies when a measure of surface or interfacial smoothness is required. Several options for measuring roughness exist; two different metrics of roughness that are frequently employed include root-mean-square (RMS) roughness, derived from the autocorrelation function (ACF), and roughness derived from the height-density correlation function (HDCF). The equations used for RMS and HDCF roughness are summarized below for a cubic solid-on-solid (SOS) model:

RRMS )



n

∑ (x

- jx)2

i

i)1

n

(5)

n-1

∑ |x

i+1

RHDCF )

- xi |

i)1

n-1

(6)

where x is the height of any given lattice site and n is the total number of sites. An alternative method for determining interfacial smoothness is step density. This metric simply involves calculating the number of changes that occur between adjacent species, not being concerned with the magnitude of the changes or the average height of the interface. Mathematically, step density S (when the azimuthal angle is φ ) 0) is defined as the following for a cubic SOS model: S ) n-1

∑ (1 - δ(h

i,j, hi+1,j))

(7)

i,j

where δ is the Kronecker delta function; hi, j (where i and j represent the lattice position in length and width, respectively) is the height of lattice site i, j; and n is the number of lattice sites.27 The main advantage of using step density over other metrics for layered heterostructure problems is due to the formation of isolated islands of material (termed “bubbles” here), which occur when one or more molecules of one species becomes trapped within a layer of another species. Such layermixing phenomena have been observed in layered heterostructure systems.28 Bubble formation can make the interface height

Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010

Figure 1. Step density assuming periodic conditions for a layered heterostructure. Note the existence of a “bubble” in the fifth monolayer.

unclear; however, with step density, only changes between adjacent species at the same height need to be noted, resolving the bubble formation complication. While rarely appearing in our simulations (less than 1 out of every 2500 sites is a bubble), they can still cause difficulties on large lattices when computing interfacial properties. Step density can be measured as either a spatial function, characterizing each molecular layer, or a temporal function, focusing on surface changes during process evolution. Spatial step density is calculated as a function of each layer l in the heterostructure upon completion of a simulation by calculating the number of changes in species between adjacent molecules. For a layer l far from an interface, the step density of that layer (Sl) will most likely be zero. If layer l is near an interface, then Sl will take on a positive value between 0 and 2. (The maximum possible step density across a dimension is one; spatial step density is calculated across both length and width.) The step density of an interface is the sum of the step densities of the individual layers that comprise the interface. For example, the interfacial step density of the lower interface in Figure 1 (gray boxes over white boxes) is 0.6, while the middle interface (white boxes over gray boxes) has a step density of 0.4. In contrast, temporal step density is calculated from the number of changes between adjacent molecules of the film surface as a function of time. This metric can prove useful in determining the evolution of step density throughout the deposition process. Temporal step density also can be measured experimentally using reflection high-energy electron diffraction (RHEED) data for industrial applications.27 However, it does have the disadvantage of being computationally expensive, because new calculations must be performed after every event. It is also important to note that spatial step densities are related, since the realization of the next step (for spatial step density) is only dependent on its current state (for temporal step density), i.e., it is a Markovian process. As a result, any trends that apply to the evolution of temporal step density would also apply to spatial step density. Optimization Search Method Optimization of multiscale systems can become quite a complex task, because they have macroscale components consisting of possibly time-dependent, nonhomogeneous, and possibly nonlinear second-order PDEs, and mesoscale components of time-dependent processes that are most efficiently modeled using stochastic techniques. While gradient-based techniques may be employed, their use is complicated by the unavailability of gradient information in closed form for the mesoscale portion. Standard practice is to use methods such as pattern search algorithms, derivative free optimization, and

7893

multilevel coordinate search, while treating the multiscale system function evaluations as a “black box”.24,25 Such methodology can quickly become computationally expensive, particularly as the complexity of information flow between the two directions increases. One possibility to circumvent issues for deterministic problems with expensive function evaluations is to use surrogate functions such as radial basis functions.29 However, this method is not amenable to noisy functions, because the sensitivities cannot be reliably computed. Multiscale Solution Algorithm. As previously noted, spatially multiscale systems typically are initially separated into their different length scale components for ease of solution. These “solutions” are then intimately connected at an interface that captures properties associated with all sizes in question. In some cases, information flow is unidirectional; this may occur when an input for the mesoscale system may require information from the solution to the macroscale problem, while the solution to the macroscale problem is independent of the results from the mesoscale problem, or vice versa. In this case, the interface is simple, because only one evaluation of the multiscale system at each length scale is required to obtain a solution. In contrast, bidirectional flow occurs when the solutions to the mesoscale system depend on solutions from the macroscale system, and the solutions to the macroscale system depend on solutions from the mesoscale system. This type of interface is considerably more complicated, because, to converge to a solution that satisfies the system, solutions must be determined separately at each length scale, and then iteratively converged between the length scales at each time step. Considerable research has been performed regarding the development of methodologies used in computationally efficient modeling of spatially or temporally multiscale systems. These include the use of timesteppers in equation-free computing,30-32 filter application,33 coarse graining of Monte Carlo simulations,34 self-organizing and simple cell mapping,35 and lattice reconstruction techniques;23 such topics are an ongoing area of study. Stochastic In Situ Adaptive Tabulation. In multiscale optimization, numerous mesoscale simulations, which may be computationally intensive themsleves, may be required to obtain a single description of the multiscale system. Combining that with possible iterative convergence requirements and the need for numerous multiscale evaluations for optimization, it becomes apparent that this process can quickly become prohibitively expensive. While some problem-specific simplifications can often be made, these usually do not reduce computation time by more than an order of magnitude. As a result, approaches to reducing computation time have been developed. One such promising approach is known as stochastic in situ adaptive tabulation (SISAT.) Based on in situ adaptive tabulation (ISAT), which is a technique that was developed for (and first used) in combustion chemistry,36 SISAT was applied to multiscale systems involving stochastic optimization.24,25 ISAT is particularly effective when simulations repeatedly traverse similar regions in state space, because that is what occurs in optimization. The mathematical justification criteria for ISAT and a more-detailed explanation of the ISAT algorithm can be found in ref 36. We will present the basics of the method here for completeness. In general, we want to obtain simulation results of interest, R(φq), for different values of query points, φq. Every query point contains all input properties needed to determine the dynamic evolution of the important solutions of the process of interest. These results can be obtained directly from dynamic or steadystate simulations. The question that needs to be answered is

7894

Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010

this: Can we avoid such expensive simulations and use previous simulation results? We would like to find a tabulated point, φ0 within the proximity of φq, such that we can use the tabulated solution information, R0 ) R(φ0), to estimate a solution to φq. To achieve this, we must determine not only the trajectory R0 of any given point φ0, but also their sensitivities to each of the input elements of φ0, A0. The elements of matrix A0 are mathematically described by the following relation: [A0]ij(φ) ≡

∂Ri(φ) ∂φj

(8)

Any time that a simulation is required, the input properties φ0, the solution information R0, and the sensitivity matrix A0 are tabulated. When using SISAT to calculate sensitivity information, the exact same random number sequence must be used to ensure that the sensitivity matrix is not influenced by stochastic noise. This technique, which is referred to as finite differences with common random numbers (FDCRN), is outlined in ref 37. This ensures that the any variation in the solutions between the initial input and a perturbed input is due to the perturbation itself and not stochastic noise.38 Note that, while FDCRN is necessary for determining the trajectories and sensitivities for a particular point, different random number sequences are used for different points. When we place a query for a point φq, it is compared with tabulated points and a determination is made regarding if it is contained within a region that allows for linear interpolation; this metric of proximity between query and tabulated points is referred to as the ellipsoid of attraction (EOA). The exact size of the EOA for any given tabulated point φ0 is determined from the sensitivity matrix A0(φ0) for that point, and a user-specified acceptable error tolerance (εtol). To identify the EOA, normalize the trajectories by defining the square matrix A*, where A* )

A0AT0 εtol2

(9)

Next, eigenvalue decomposition of A* is performed and a set of eigenvectors Q and eigenvalues Λ is obtained to determine principal directions and magnitudes. If the distance δ ) φq φ0 falls within the EOA, given by the relation δTQTΛQδ e 1

(10)

we can then linearly approximate Rq as Rq ≈ R˜q ) R0 + A(φq - φ0)

(11)

If the query point does not fall within an EOA for any tabulated point, the process is simulated to obtain Rq, the sensitivities are calculated, and a new record is added. The size of the EOA can also be expanded if the simulation results indicate that the interpolated solution is within an acceptable error tolerance εtol. If the estimate R˜q that we obtain from the closest ellipsoid of attraction still gives a fairly accurate predicted trajectory, i.e., R˜q - Rq e εtol, we can increase the size of the EOA to include this new point. For further details on this, the reader is referred to ref 36. SISAT has the advantage that the table is built as the simulations are performed, and, therefore, potentially memoryintensive preconstruction techniques are not applied. To avoid an extensive computation effort from table lookup, ISAT was initially implemented using a binary tree structure. This practice

has traditionally been employed for SISAT as well, but it is not necessarily a requirement. Currently, ellipsoid methods are used to reduce complexity of query and lookup errors. For a small number of tabulated points, mandating that query points are checked against all tabulated points may potentially reduce lookup error and the number of required table entries, therefore also reducing the computational cost, particularly when relatively few tabulated entries are involved. Remark 1: SISAT is an efficient technique primarily when large numbers of simulations are required. Implementation initially incurs additional computational cost by requiring multiple simulations for a single query. The savings occur when a sufficiently large number of queries can be interpolated, such as that during the solution of an optimization problem. Application to Gallium Arsenide/Aluminum Arsenide Deposition The methodology outlined in previous sections was applied to a model problem: a layered heterostructure consisting of alternating layers of gallium arsenide (GaAs) and aluminum arsenide (AlAs) deposited using MOVPE. Such a device could prove useful in Bragg reflectors,39 because GaAs and AlAs are nearly lattice-matched, resulting in few defects, yet have significantly different refractive indices and bandgap energies. For industrial production, the percentage of functional devices can be increased by minimizing the thickness nonuniformity and the interfacial roughness between layers. Thickness uniformity can be calculated macroscopically by solving continuum mass, energy, and momentum balances, but interfacial roughness must be measured at a mesoscale level to capture adequate detail. This process uses a rotating disk reactor, with one inlet at the center of the reactor (premixing is not a concern for the reactants). A reactor schematic is shown in Figure 2, and the reactor dimensions are outlined in Table 1. As a result, we used the reactor configuration from ref 10 (the specific dimensions are not design variables.) Although rotation improves thickness uniformity in the angular direction, these effects to the fluid flow were not considered here. Deposition occurs on a substrate (wafer), which acts as the interface between the macroscale and mesoscale simulations. Throughout the deposition process, GaAs and AlAs are deposited in alternating sequences. A process feature is the period of time where no deposition occurs (which we term annealing time); this occurs between deposition of each species and allows for the surface of the film to relax. Macroscale Reaction and Transport Process. We use standard precursors for the MOVPE process for depositing GaAs and AlAs thin films: trimethyl gallium (TMG, Ga(CH3)3) and trimethyl aluminum (TMA, Al(CH3)3) as the Group III or organometallic source, M may be used in place of Ga or Al when mentioning a nonspecific organometallic species, arsine gas (AsH3) as the Group V source, and hydrogen as a carrier gas. In our formulation, the inlet concentration of arsine is 7.78 times larger than the inlet concentration of the organometallic sources. The hydrogen concentration is several orders of magnitude greater. Reactions between the organometallic precursors and hydrogen can occur, producing numerous intermediates. We used the reaction scheme found in ref 40 as the basis for our gas-phase reaction scheme. By investigating relative reaction rates and species concentrations, and applying quasisteady-state assumptions (radical intermediates are immediately consumed), we were able to simplify the gas-phase reaction scheme to the following reactions for TMG and TMA, respectively:

Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010

1 Ga(CH3)3 + H2 f Ga(CH3)2 + CH4 2 1 Ga(CH3)2 + H2 f GaCH3 + CH4 2 Ga(CH3)3 + H2 f GaH(CH3)2 + CH4 GaH(CH3)2 f GaCH3 + CH4

Table 1. Reactor Dimensions

1 Al(CH3)3 + H2 f Al(CH3)2 + CH4 2 1 Al(CH3)2 + H2 f AlCH3 + CH4 2 Al(CH3)3 + H2 f AlH(CH3)2 + CH4 AlH(CH3)2 f AlCH3 + CH4

Table 2. Boundary Conditions

We are interested in obtaining the flux of organometallic species across the wafer surface where deposition occurs, to calculate thickness uniformities and provide information to the mesoscale simulation for determining interfacial step densities. The flux rates are functions of initial temperatures, concentrations, and velocities; we consider the wafer temperature, inlet velocity when gallium is being deposited, and inlet velocity when aluminum is being deposited to be variables. The inlet temperature and concentrations of all species are kept constant. We solved the continuity equations found in the Macroscale Model section using the boundary conditions in Table 2. We assumed a constant wafer surface temperature and simplified the traditionally Robin boundary conditions by assuming that a metallorganic species will immediately react with AsH3, effectively reducing the concentration to zero on the surface, and by assuming H2 is in sufficient excess so that its concentration will not change significantly with surface chemical reactions. We maintain a constant pressure throughout the reactor, reflected by a hydrogen concentration of 0.1 atm. The expressions for thermal conductivity, heat capacity, and viscosity were estimated from refs 41 and 42. Diffusion coefficients were estimated from Chapman-Enskog theory.43-45 Density was calculated using the ideal gas law. We assume that, during deposition and annealing, pseudo-steady-state operation occurs in the reactor, implying that conditions inside the reactor are independent of time in a particular process. To account for the effects of changing inlet concentrations, we estimated the decay rate based on previously calculated time-dependent profiles. Mesoscale Modeling of Surface Reaction. Upon reaching the wafer boundary, metallorganic species can adsorb on the surface. Since arsine is in excess, it is assumed that a metallorganic molecule will immediately react with an arsine molecule when reaching the surface, forming GaAs or AlAs and releasing methane gas. The amount of hydrogen produced

Figure 2. Schematic of the proposed MOVPE reactor.

parameter

value

radius height inlet diameter inlet temperature

5 cm 4.5 cm 1.25 cm 300 K

boundary center center inlet wafer surface outlet side walls

7895

mass

energy

momentum

axial symmetry C [mol/m3] M(CH3)3, 4.06 [mol/m3] H2 0 [mol/m3] Mi, ∂C(AsH3)/∂z ) ∂iMi/∂z convective flux no flux

axial symmetry 300 [K]

axial symmetry v [m/s]

T [K]

0 [m/s]

convective flux no flux

convective flux 0 m/s

in surface reactions is insignificant, compared with the total concentration of hydrogen, and is therefore neglected. The surface reactions for gallium- and aluminum-containing species, which can be obtained by performing mass balances, are given as follows: Ga(CH3)3 + AsH3 f GaAs + 3CH4 1 Ga(CH3)2 + AsH3 f GaAs + 2CH4 + H2 2 GaCH3 + AsH3 f GaAs + CH4 + H2 GaH(CH3)2 + AsH3 f GaAs + 2CH4 + H2 Al(CH3)3 + AsH3 f AlAs + 3CH4 1 Al(CH3)2 + AsH3 f AlAs + 2CH4 + H2 2 AlCH3 + AsH3 f AlAs + CH4 + H2 AlH(CH3)2 + AsH3 f AlAs + 2CH4 + H2 The interface between the macroscale and mesoscale control volumes involves calculating step densities at various points across the wafer surface, based on their deposition rates. We use a cubic solid-on-solid (SOS) model, assuming perfect lattice match and no overhangs in our deposition. The inputs to the model include the temperature of the wafer surface, the deposition rate calculated from the macroscale simulations (these values are converted from mol/m2/s to monolayers/s, based on GaAs and AlAs lattice constants),46,47 deposition time, and annealing time. In the kMC model, two different types of events are possible: deposition or nearest-neighbor migration. The likelihood of a migration is based on the number of nearest neighbors that a molecule has. Each molecule at lattice position Xi, j and height k has 12 neighbors, outlined in Table 3. Values

7896

Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010

Table 3. Nearest-Neighbor Modela adjacent (4) diagonal (4) adjacent and underneath (4) a

Xi-1, j, k, Xi+1, j, k, Xi, j-1, k, Xi, j+1, k Xi-1, j-1, k, Xi-1, j+1, k, Xi+1, j-1, k, Xi+1, j+1, k Xi-1, j, k-1, Xi+1, j, k-1, Xi, j-1, k-1, Xi, j+1, k-1

i ) x-direction, j ) y-direction, and k ) z-direction.

for the migration rates based on the number of nearest neighbors, and important associated parameters such as prefactors needed for using this model, can be found in refs 48 and 49. The structure used here is a cubic SOS model. These migration rates increase with increasing temperature. Deposition is modeled on a three-dimensional (3-D) lattice with periodic conditions to prevent edge effects. The length and width dimensions of the lattice used is 100 × 100, which is large enough to reduce noise to a workable level, but small enough such that mesoscale simulations are not cost-prohibitive. To ensure that the size was sufficiently large, we compared step densities with those predicted from identical operating conditions for larger lattices (250 × 250) and found that the values obtained for a 100 × 100 lattice were within the expected range. Some simplifications in the model can be made based on properties of the system. Temporal step density, for example, reaches a stationary state value quite rapidly during the deposition of one species (usually within one or two monolayers). Therefore, regardless of the actual thickness of each layer in a Bragg reflector, the average step density will be the same as long as the above requirement is met. As a result, simulating deposition for an extended length of time is not required to obtain the necessary mesoscale outputs, particularly since average thickness can be calculated macroscopically. Since, in our work, the focus was interfacial step density, without loss of generality, the deposition time was kept constant at 5 s. This value can be changed without any effect on the presented methodology, or it can become a controlled variable when thickness is a process objective. In addition, the step density of a GaAs layer or an AlAs layer is independent of the number of previous layers that have been deposited. This reduces the number of layers required to accurately describe the deposition to three: a GaAs layer, an AlAs layer, then another GaAs layer. (Note that we are not concerned about the surface conditions for this particular formulation; if we were, the final layer might need to be AlAs, thereby possibly requiring four layers to accurately model deposition.) Note that, because of the interchangeability of spatial and temporal step density, in terms of how it affects temperature, annealing time, and the equilibrium properties discussed above, these simplifications can easily be applied to the objective function, which uses spatial step density. These simplifications may reduce computation time by an order of magnitude, but significant further reductions are still required for the problem to be computationally feasible. This further reduction in computational expense is accomplished by employing SISAT. Optimization Problem. Our optimization objective for this multiscale deposition process is to minimize macroscale thickness nonuniformities and mesoscale interfacial step densities, to maximize the number of functional devices. It is also desired to operate at lower temperatures for lower energy costs, and to minimize time with no metallorganic flux (both at the surface and at the inlet), to enable greater production of theoretical units and reduce the consumption of carrier gases. These objectives conflict, because higher temperatures result in improved thickness uniformity, and both higher temperatures and annealing times result in smoother interfacial surfaces. Step density is more insensitive to flux rates, although a positive correlation does exist between the two.

Table 4. Starting Conditions (Scaled) type low medium high

temperature annealing time Ga inlet velocity Al inlet velocity 0.3 (775 K) 0.5 (825 K) 0.7 (875 K)

0.5 s 1s 2s

0.3 (0.31 m/s) 0.5 (0.45 m/s) 0.7 (0.59 m/s)

0.3 (0.31 m/s) 0.5 (0.45 m/s) 0.7 (0.59 m/s)

We investigated several different parameters for the search algorithm in our optimization problem. Two different SISAT perturbation sizes, to examine the effect of local versus longerdistance sensitivity information, and three different starting conditions are listed in Table 4 to check for consistency. We are also interested in optimizing both for annealing time (no metallorganic flux at the wafer surface) and “off time” (no metallorganic flux at the inlet). “Off time” is defined as the sum of “ramp down time” (the time when the metallorganic flux at the inlet is zero and the rate at the wafer surface is decreasing due to transport effects) and annealing time. We approximate “off time” based on solutions to time-dependent simulations at various velocity intervals and wafer positions. Finally, we examined four different traditional optimization algorithms: the gradient-based methods of steepest descent and BFGS, and the pattern search algorithms of Nelder-Mead and Hooke-Jeeves. For pattern search algorithms, the optimization problem was formulated as an interior point one. The optimization objective was to minimize a cost function J that included the process condition constraints as penalty functions. During the search, the multiscale process simulation was considered a black-box function and was accounted for indirectly in the formulations since the values of the process conditions in J are the simulation results. The two cost functions used to minimize annealing time only and “off time,” respectively, are the following: To minimize annealing time: J ) T2 + A2 + 10(UG2 + UA2) + (SA/G2 + SG/A2) + 200

∑P

i

(12)

i

To minimize off time: J ) T2 + F2 + 10(UG2 + UA2) + (SA/G2 + SG/A2) + 200

∑P

i

(13)

i

where Ts is the scaled temperature; A is the annealing time (in seconds); F is the scaled “off time”; UG and UA are the surface uniformities of GaAs and AlAs, respectively; SA/G and SG/A are the step densities between a lower Ga layer and higher Al layer, and a lower Al layer and higher Ga layer, respectively; and Pi is a linear penalty for violating constraints. Surface nonuniformities were calculated from the square root of the sum of the square errors of each position from the mean height. Each variable is scaled between 0 and 1: this scaling is done to improve the effectiveness of the optimization search algorithm. Penalties are applied if the temperature falls below 700 K or exceeds 950 K, if annealing time is negative, or if the inlet velocities do not stay within a range of 0.1-0.8 m/s (since we are using unconstrained optimization algorithms, we could not program these as hard constraints). The decision variables are temperature, annealing time (“ramp-down time” is determined from the processing conditions), inlet velocity when Ga is being deposited, and inlet velocity when Al is being deposited. The multiscale simulations were treated as a black box. The optimization problem was solved using MATLAB. Because of the stochastic nature of this system, small changes in function values may be hidden by system noise. As a result, we do not expect global convergence to a single point, but rather a range of almost optimal solutions to this problem. The value used for

Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010

Figure 3. GaAs uniformity profiles for starting conditions (dashed line) and optimal conditions (solid line), using the annealing-time-only optimization formulation.

7897

Figure 4. AlAs uniformity profiles for starting conditions (dashed line) and optimal conditions (solid line), using the annealing-time-only optimization formulation.

εtol was initially 0.03, then was increased to 0.05 for later Hooke-Jeeves trials. Results and Discussion We obtained optimal operating conditions for formulations based on annealing time only and based on “off-time”, using the four methods outlined above. Both gradient-based methodss steepest descent and BFGSsfailed, even when we employed lower optimization cost function weights and penalties that sacrificed improved numerical round-off errors for reasonable step sizes. Steepest descent was not reaching local optimality, converging at the boundaries of the feasibility regions and freezing them. BFGS failed because the approximate Hessian computation resulted in dramatically exceeding constraints, leading to simulation failure. Generally, using the Nelder-Mead method gave “optimal” results, but these results showed fair to poor consistency and difficulties were encountered in separating from the initial conditions. This is most likely because of stochastic effects, which resulted in the algorithm eliminating the worst points that are not necessarily in the intended search direction. Furthermore, both methods are allowed to cross into the infeasible region during the search, which, in this case, lead to simulation issues. The results from using the Hooke-Jeeves method were mostly consistent; therefore, we can conclude that pattern search algorithms are appropriate for the specific type of optimization problems, yielding consistent results. Optimal temperatures typically fell between ∼850 K and 900 K. Annealing time was reduced toward zero for annealing-timeonly formulations and usually was