Ind. Eng. Chem. Res. 2010, 49, 8080–8085

Large-Scale Parallel Computation of Incompressible Flows by a Domain Decomposition-Based Least-Squares Finite Element Method

Xu Ding,† Q. Y. Jiang,‡ and Tate T. H. Tsang*,†

† Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506
‡ Center for Computational Sciences, University of Kentucky, Lexington, Kentucky 40506
* To whom correspondence should be addressed. E-mail: tsang@engr.uky.edu.

We have successfully developed a least-squares finite element method (LSFEM) for unsteady-state, three-dimensional, convection-dominated flows. The serial algorithm was used for simulations on a single CPU. In this work, our goal is to use a domain decomposition approach to parallelize the serial algorithm without major modification of the serial code. We illustrate the efficiency of the parallel algorithm with flow simulations involving more than 25 million unknowns.

Introduction

For almost two centuries, the famous Navier-Stokes equations have had a prominent role in fluid mechanics and transport phenomena.1 The equations admit closed-form solutions only for simple flows, and numerical solutions may not be available for difficult cases because of the highly nonlinear convection terms and complex flow geometries. Much progress has been made in the last three decades, and it has become a routine exercise to use commercial software to simulate two-dimensional flows. However, unsteady-state, three-dimensional, large-scale flow problems remain a challenging task. In this work, we present a domain decomposition-based least-squares finite element method to simulate flow problems with millions of unknown variables. For such large flow problems, parallel computation is the only viable approach.

We have successfully developed a least-squares finite element method (LSFEM) for different unsteady-state, three-dimensional flows.2-6 Those results were obtained with a serial algorithm running on a single CPU. Here, our goal is to parallelize the serial algorithm using a domain decomposition approach. We illustrate the efficiency of the parallel algorithm with flow simulations involving more than 25 million unknowns.

The development of numerical solutions for the Navier-Stokes equations follows that of partial differential equations in general. Finite difference methods for structured, uniform grid systems dominated the early development,7 and finite-volume methods have become more popular in recent years.8 Highly accurate numerical solutions can be obtained by spectral or spectral-element methods,9,10 although at considerable computational expense, so they are used less in practice. For most engineering simulations, the finite-element method11,12 finds numerous applications because of its flexibility in dealing with complex system geometry, its potential to produce higher-order accurate numerical results, and its rigorous mathematical theory.13,14 The finite-element method can be classified as a weighted residual method.15 The formal definition of the method is as follows:

\int_{\Omega} W(\mathbf{r})\, R(\mathbf{r})\, d\mathbf{r} = 0    (1)

where W is the weighting function, R is the residual of the differential equations, and d r is the differential element of the system domain Ω. If the weighting function is the same as the finite-element interpolation function, the method is referred to as the Galerkin finite-element method, which is most effective for partial differential equations with Laplacian terms describing viscous diffusion, conduction, and diffusion. However, convection-dominated fluid flow problems present a challenging task for numerical solution. For such problems, the Galerkin finite-element method leads to spurious, oscillating solutions, which are a numerical artifact. To reduce numerical oscillations, an upwinding approach7 or the use of different weighting functions was commonly adopted. The use of weighting functions other than the finite-element interpolation function leads to Petrov-Galerkin finite-element methods.16,17 Early upwinding approaches7 introduced significant numerical diffusion, another numerical artifact, created by applying the upwinding method to the convection terms. Gresho and Lee18 pointed out the pitfalls of such excessively smooth numerical results. Suffice it to say that it remains a challenging task to develop a numerical method that exhibits neither numerical oscillations nor numerical diffusion for three-dimensional, convection-dominated flows. The theory and practice of finite element methods for fluid flow problems have been addressed in detail elsewhere.19-21 In summary, it is desirable to use a method that reduces both numerical oscillations and numerical diffusion. In this work, we use a least-squares finite-element method (LSFEM) for that purpose.

We should mention that the idea of using a least-squares approach for first-order partial differential equations is not new. Hafez and Phillips22 used a modified least-squares formulation for two first-order partial differential equations. They proved that minimization of a least-squares functional leads to two second-order Poisson equations with auxiliary variables, which were subsequently solved by standard second-order accurate central finite difference approximations. The recent development of LSFEM, however, began with the work of Carey and Jiang.23-25 They formulated the LSFEM for first-order hyperbolic equations, including the compressible Euler equations. Jiang26 summarized his pioneering work in the theoretical development and applications of the least-squares finite element method. It is important to point out that there are numerous LSFEM formulations, and two recent developments for improving LSFEM are noteworthy. First, Lee et al.27 and Heys et al.28,29 developed methods to improve the mass conservation of LSFEM, which can become an acute problem at the center of the flow domain, as evidenced in their case studies. Second, Prabhakar and Reddy30,31 developed space-time spectral/hp penalty least-squares finite element methods for steady and unsteady flows.
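In the notation of eq 1, the weighting choices discussed above can be summarized compactly (our shorthand; the least-squares entry anticipates eqs 15 and 16 below):

W = \Phi \ \text{(Galerkin)}, \qquad W \neq \Phi \ \text{(Petrov-Galerkin)}, \qquad W = L\Phi \ \text{(least squares)}

The least-squares weighting arises because minimizing F = \int_{\Omega} R^2 \, d\Omega with R = Lv - f gives the stationarity condition \int_{\Omega} (L\Phi)\, R \, d\Omega = 0; that is, the residual is weighted by the operator applied to the interpolation function.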

A Least-Squares Finite Element Method for Incompressible Flows. In this work, we choose the following velocity-vorticity-pressure formulation of the Navier-Stokes equations (written with the summation convention for repeated indices):

\frac{\partial u_i}{\partial t} + u_j \frac{\partial u_i}{\partial x_j} + \frac{\partial P}{\partial x_i} + \frac{1}{Re}\, \varepsilon_{ijk} \frac{\partial \omega_k}{\partial x_j} = 0    (2)

\frac{\partial u_j}{\partial x_j} = 0    (3)

\omega_i - \varepsilon_{ijk} \frac{\partial u_k}{\partial x_j} = 0    (4)

where u_i and ω_i are the fluid velocity and vorticity components in the ith direction, Re is the Reynolds number, P is the modified pressure, and ε_ijk is the permutation symbol. Ding and Tsang6 compared velocity-vorticity-pressure, velocity-stress-pressure, and velocity-velocity gradient-pressure formulations of the first-order least-squares finite-element method for unsteady-state, three-dimensional flows and found that the above velocity-vorticity-pressure formulation is efficient, accurate, and robust, with the lowest computer memory usage. The above first-order system of equations forms the residual vector R of the partial differential equations and does not require the integration by parts used by the Galerkin finite-element method. Furthermore, it allows the use of linear finite-element interpolation functions for all the unknown variables. There are seven equations and seven unknown variables in this formulation. Jiang32 pointed out that the above first-order equations, having an odd number of unknowns, cannot form an elliptic system. Therefore, the following compatibility constraint for the vorticity (a kinematic constraint: the divergence of the curl of a vector field is zero) is added in practice:

\frac{\partial \omega_j}{\partial x_j} = 0    (5)

Ding and Tsang6 analyzed the elliptic properties of the above first-order least-squares finite-element method and concluded that it is uniformly elliptic. Equations 2-5 are integrated in time by the Crank-Nicolson scheme [θ = 1/2, Δt̃ = θΔt, and θ̃ = (1 - θ)/θ] and linearized by Newton's method to give

\frac{u_i^{(n+1,m+1)}}{\Delta \tilde{t}} + u_j^{(n+1,m)} \frac{\partial u_i^{(n+1,m+1)}}{\partial x_j} + u_j^{(n+1,m+1)} \frac{\partial u_i^{(n+1,m)}}{\partial x_j} + \frac{\partial P^{(n+1,m+1)}}{\partial x_i} + \frac{1}{Re}\, \varepsilon_{ijk} \frac{\partial \omega_k^{(n+1,m+1)}}{\partial x_j} = g_i^{(n+1,m)} + f_i^{(n)}    (6)

\frac{\partial u_j^{(n+1,m+1)}}{\partial x_j} = 0    (7)

\omega_i^{(n+1,m+1)} - \varepsilon_{ijk} \frac{\partial u_k^{(n+1,m+1)}}{\partial x_j} = 0    (8)

\frac{\partial \omega_j^{(n+1,m+1)}}{\partial x_j} = 0    (9)

where superscripts n and m represent the nth time level and the mth Newton linearization step, respectively. In eq 6, g_i^(n+1,m) and f_i^(n) are defined as

g_i^{(n+1,m)} = u_j^{(n+1,m)} \frac{\partial u_i^{(n+1,m)}}{\partial x_j}    (10)

f_i^{(n)} = \left\{ \frac{u_i}{\Delta \tilde{t}} - \tilde{\theta} \left[ u_j \frac{\partial u_i}{\partial x_j} + \frac{\partial P}{\partial x_i} + \frac{1}{Re}\, \varepsilon_{ijk} \frac{\partial \omega_k}{\partial x_j} \right] \right\}^{(n)}    (11)

Equations 6-9 can be written in the following compact form:

\{L v\}^{(n+1,m+1)} = \{f\}^{(n)} + \{g\}^{(n+1,m)}    (12)

with the residual vector {R} being defined as

\{R\} = \{L v\}^{(n+1,m+1)} - \{f\}^{(n)} - \{g\}^{(n+1,m)}    (13)

where L is an operator matrix and {v} = {u, v, w, P, ω_x, ω_y, ω_z}^T is the vector of unknown variables at a finite-element node. Here u, v, and w are the velocities in the x-, y-, and z-directions, respectively, and ω_x, ω_y, and ω_z are the corresponding vorticity components. The vector {v} is approximated by the finite-element interpolation function Φ(x):

\{v\}^{(n+1,m+1)} = \{\Phi(x)\}^T \cdot \{v(t)\}    (14)

The least-squares functional F of the residual R is defined as

F(\{v\}^{(n+1,m+1)}) = \int_{\Omega} \{R\}^T \cdot \{R\}\, d\Omega    (15)

Minimization of this functional leads to the first-order LSFEM:

\sum_e \int_{\Omega_e} [L\{\Phi\}]^T \cdot [L\{\Phi\}]\, d\Omega\, \{v\} = \sum_e \int_{\Omega_e} [L\{\Phi\}]^T \cdot \{c\}\, d\Omega    (16)

where {c} = {f}^(n) + {g}^(n+1,m), and the operator matrix [L{Φ}] at the ith node of an element is

[L\{\Phi\}_i] =
\begin{bmatrix}
du_i & \Phi_i \frac{\partial u}{\partial y} & \Phi_i \frac{\partial u}{\partial z} & \frac{\partial \Phi_i}{\partial x} & 0 & -\frac{1}{Re}\frac{\partial \Phi_i}{\partial z} & \frac{1}{Re}\frac{\partial \Phi_i}{\partial y} \\
\Phi_i \frac{\partial v}{\partial x} & dv_i & \Phi_i \frac{\partial v}{\partial z} & \frac{\partial \Phi_i}{\partial y} & \frac{1}{Re}\frac{\partial \Phi_i}{\partial z} & 0 & -\frac{1}{Re}\frac{\partial \Phi_i}{\partial x} \\
\Phi_i \frac{\partial w}{\partial x} & \Phi_i \frac{\partial w}{\partial y} & dw_i & \frac{\partial \Phi_i}{\partial z} & -\frac{1}{Re}\frac{\partial \Phi_i}{\partial y} & \frac{1}{Re}\frac{\partial \Phi_i}{\partial x} & 0 \\
\frac{\partial \Phi_i}{\partial x} & \frac{\partial \Phi_i}{\partial y} & \frac{\partial \Phi_i}{\partial z} & 0 & 0 & 0 & 0 \\
0 & \frac{\partial \Phi_i}{\partial z} & -\frac{\partial \Phi_i}{\partial y} & 0 & \Phi_i & 0 & 0 \\
-\frac{\partial \Phi_i}{\partial z} & 0 & \frac{\partial \Phi_i}{\partial x} & 0 & 0 & \Phi_i & 0 \\
\frac{\partial \Phi_i}{\partial y} & -\frac{\partial \Phi_i}{\partial x} & 0 & 0 & 0 & 0 & \Phi_i \\
0 & 0 & 0 & 0 & \frac{\partial \Phi_i}{\partial x} & \frac{\partial \Phi_i}{\partial y} & \frac{\partial \Phi_i}{\partial z}
\end{bmatrix}    (17)

8082

Ind. Eng. Chem. Res., Vol. 49, No. 17, 2010

In eq 17, the modified terms du_i, dv_i, dw_i, and dΦ_i are defined as

du_i = d\Phi_i + \Phi_i \frac{\partial u}{\partial x}    (18a)

dv_i = d\Phi_i + \Phi_i \frac{\partial v}{\partial y}    (18b)

dw_i = d\Phi_i + \Phi_i \frac{\partial w}{\partial z}    (18c)

d\Phi_i = \frac{\Phi_i}{\Delta \tilde{t}} + u \frac{\partial \Phi_i}{\partial x} + v \frac{\partial \Phi_i}{\partial y} + w \frac{\partial \Phi_i}{\partial z}    (18d)
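At the element level, eq 16 amounts to accumulating weighted outer products of the operator matrix over the quadrature points. A minimal Python/NumPy sketch of this step follows (the paper's code is Fortran; the function name, array layout, and the assumption that [L{Φ}] and {c} are already evaluated at the Gauss points are ours):

```python
import numpy as np

def element_system(B_gauss, c_gauss, weights):
    """Element matrix and load vector of eq 16.

    B_gauss : (nq, 8, ndof) array; [L{Phi}] at each of nq Gauss points
              (8 first-order equations; ndof = 7 unknowns per node x nodes).
    c_gauss : (nq, 8) array; {c} = {f}^(n) + {g}^(n+1,m) at the Gauss points.
    weights : (nq,) quadrature weights times Jacobian determinants.
    """
    ndof = B_gauss.shape[2]
    Ke = np.zeros((ndof, ndof))   # symmetric positive definite by construction
    fe = np.zeros(ndof)
    for w, B, c in zip(weights, B_gauss, c_gauss):
        Ke += w * (B.T @ B)       # [L{Phi}]^T [L{Phi}]
        fe += w * (B.T @ c)       # [L{Phi}]^T {c}
    return Ke, fe
```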

We should note that eq 16 is a symmetric, positive-definite (SPD) linear system of equations for each element. Assembly of eq 16 over all elements leads to the following global SPD linear system of equations:

A x = b    (19)

Figure 1. Structure of the serial LSFEM-JCG algorithm.

where A, x, and b are the global matrix, unknown vector, and right-hand-side vector, respectively. The importance of this result cannot be overstated. The least-squares finite-element method remains the only method that leads to SPD linear systems for convection-dominated flow problems. This advantage becomes even more significant when we must perform large-scale, convection-dominated flow simulations with millions of unknown variables, and it offsets the disadvantage of the first-order formulation, which introduces additional unknown variables such as the vorticity components ω_i.

The sparse linear systems that result from applying different discretization methods to partial differential equations are solved by either direct methods33 (based on Gaussian elimination and factorization) or iterative (indirect) methods.34 For very large linear systems, iterative methods are preferable, because they require less memory and do not accumulate round-off errors the way direct methods do. If the linear system is SPD, simple preconditioned conjugate gradient methods prove robust and converge to accurate numerical solutions. If the linear system is unsymmetric, iterative methods such as the Generalized Minimal Residual method (GMRES) and Bi-Conjugate Gradient (Bi-CG) methods can be used, but the choice of preconditioner becomes a critical issue for their convergence. This further underscores that the SPD linear system derived from the LSFEM is a desirable result.

In practice, it is not necessary to assemble the global SPD matrix A in eq 19. Carey and Jiang35,36 developed a matrix-free, element-by-element preconditioned conjugate gradient method for the least-squares finite-element system of equations. In this work, we use the matrix-free, element-by-element approach for eq 19 with the inverse of the main diagonal of A as the preconditioner, which is the well-known Jacobi conjugate gradient (JCG) method. We should mention that the JCG requires the storage of only five global vectors, and this small memory requirement is the reason why the least-squares finite-element Jacobi conjugate gradient method (LSFEM-JCG) can simulate very-large-scale fluid-flow problems and compare favorably with the Galerkin finite-element method.37 The serial LSFEM-JCG algorithm is illustrated in Figure 1, where J, m, and n are the iteration counters for the Jacobi conjugate gradient method, the Newton linearization step, and the time level, respectively.
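For illustration, the matrix-free Jacobi-preconditioned conjugate gradient loop can be sketched as follows (a Python sketch for readability; the paper's implementation is Fortran, and the names matvec and diag_A are our stand-ins for the element-by-element product Σ_e A_e x_e and the assembled main diagonal of A):

```python
import numpy as np

def jcg(matvec, diag_A, b, x0, tol=1e-8, max_iter=1000):
    """Matrix-free Jacobi-preconditioned CG for the SPD system Ax = b.

    matvec(x) returns A @ x accumulated element by element, so the
    global matrix A is never assembled; diag_A is its main diagonal.
    """
    x = x0.copy()
    r = b - matvec(x)               # residual
    z = r / diag_A                  # Jacobi (diagonal) preconditioning
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        q = matvec(p)
        alpha = rz / (p @ q)
        x += alpha * p
        r -= alpha * q
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = r / diag_A
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

Note that, apart from data local to matvec, the loop stores only the global vectors x, r, z, p, and q, consistent with the five-global-vector storage requirement noted above.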

A Domain Decomposition Strategy for Parallel Computations. The philosophy of domain decomposition is to decompose the entire computational domain into subdomains and to assign a processor to perform the computational tasks in each subdomain. Such a "divide and conquer" strategy can be efficient if there is proper load balancing and minimal data-communication overhead. Here, load balancing implies that each processor has approximately the same computational workload, and minimal data communication implies maximum data locality and minimal frequency of data transfer between processors.

The first step in domain decomposition is to partition the physical domain into subdomains. The partitioning can be either overlapping or nonoverlapping. In our early attempts, we tried to implement an alternating Schwarz method38 on overlapping subdomains. However, we found that much more computing time was needed, and it was difficult to maintain load balancing among processors, even though we used an equal number of finite elements and nodes in each subdomain. In this work, we use nonoverlapping subdomains. Figure 2 illustrates one-, two-, and three-dimensional domain partitioning schemes for a rectangular domain. In principle, these schemes apply to domains of any shape. Each subdomain is individually meshed and has its own ordering of elements and nodes. The nodes on the interfaces shared by neighboring subdomains must match. For load-balancing purposes, the sizes of the subdomains should be adjusted so that each subdomain has approximately the same number of elements and nodes (a minimal sketch of such a balanced split follows below). For the one-dimensional partitioning scheme, each processor must exchange data with two neighboring processors; for the three-dimensional partitioning scheme, each processor exchanges data with six.

Figure 2. Different domain partitioning schemes.
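A balanced one-dimensional (slab) split can be computed as below (an illustrative sketch; the function name and the choice of partition axis are our assumptions, not the authors' code):

```python
def slab_ranges(n_layers, n_procs):
    """Split n_layers element layers along one axis into n_procs slabs
    whose sizes differ by at most one layer (approximate load balance)."""
    base, extra = divmod(n_layers, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

print(slab_ranges(48, 16))   # e.g., 48 z-layers on 16 CPUs -> 16 slabs of 3 layers
```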

Table 1. CPU Times, Speedups, and Efficiencies, Based on the Number of CPUs(a)

                              CPU Time (s)
number of CPUs   Case 1            Case 2            Case 3             Case 4             Case 5
 1               1516 (1.00/100)   4838 (1.00/100)   12926 (1.00/100)   17356 (1.00/100)   79193 (1.00/100)
 2                917 (1.65/83)    3088 (1.57/78)     7954 (1.63/81)    10857 (1.60/80)    50346 (1.57/78)
 4                441 (3.43/86)    1600 (3.02/75)     4148 (3.11/78)     5665 (3.06/77)    25866 (3.06/77)
 8                217 (6.98/87)     837 (5.78/72)     2176 (5.94/74)     3049 (5.69/71)    13798 (5.74/72)
16                119 (12.8/80)     453 (10.7/67)     1225 (10.6/66)     1642 (10.6/66)     7604 (10.4/65)

(a) The speedup and the efficiency (in percent) are given in parentheses.

One can easily imagine partitioning the domain in ways that require each processor to exchange data with many more neighbors; however, the communication time for such numerous exchanges would increase the computing time of the entire flow simulation. For simplicity of programming and to reduce data communication at the interfaces, we chose the one-dimensional partitioning scheme. Furthermore, the one-dimensional partitioning scheme allows us to extend our serial code with minimal modifications. Each processor stores all the element and node data for its subdomain. In addition, the data of the nodes on the interfaces shared with adjacent subdomains are stored, and data structures are used to send and receive the information for the nodes on each interface (a sketch of this exchange is given below). OpenMPI 1.2 is the message-passing-interface software used for the data communication at the interfaces. The numerical results from the parallel algorithm are exactly the same as those from the original serial code. Therefore, this simple nonoverlapping domain decomposition approach allows us to extend our serial code, with minimal modifications, to the parallel computation of large-scale flow problems.
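The interface exchange for the one-dimensional partitioning involves at most two neighbors per processor. A minimal sketch with mpi4py (illustrative only: the paper's code is Fortran with OpenMPI, and the interface size shown assumes the Case 1 mesh split along z):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left  = rank - 1 if rank > 0 else MPI.PROC_NULL          # no neighbor at the ends
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

n_iface = 65 * 65 * 7     # interface-plane nodes x 7 unknowns (Case 1, z-split)
send_l = np.zeros(n_iface); recv_l = np.empty(n_iface)   # stand-ins for nodal values
send_r = np.zeros(n_iface); recv_r = np.empty(n_iface)

# each processor exchanges data only with its two neighbors
comm.Sendrecv(send_r, dest=right, recvbuf=recv_l, source=left)
comm.Sendrecv(send_l, dest=left,  recvbuf=recv_r, source=right)
```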

Results and Discussion

We implement the domain-decomposition-based least-squares finite-element method using the one-dimensional partitioning scheme shown in Figure 2 and apply the parallel algorithm to three-dimensional lid-driven cavity flows at Reynolds numbers of Re = 400 and 1000. The lid-driven cavity flow has been used as a benchmark problem to test numerical methods and to investigate the physics of three-dimensional recirculating flows. An interesting question is whether Taylor-Görtler-like vortices, which indicate that the flow becomes unstable and has no steady state, exist at Re = 1000 and 2000. Ku et al.,39 Guj and Stella,40 and Fujima et al.41 reported that the flow is stationary at Re = 1000, whereas Fujima et al.41 obtained Taylor-Görtler-like vortices at Re = 2000. Koseff and Street42-44 conducted a series of experiments on lid-driven cavity flows and reported that the flow was not stable at Re = 2000 but showed Taylor-Görtler-like vortices at Re = 3200. These studies suggest that it is important to perform large-scale simulations to ascertain the numerical results for high-Reynolds-number flows. Ding and Tsang6 provided detailed results for Re = 2000 and showed the unstable flow behavior and the existence of Taylor-Görtler-like vortices. For the lid-driven cavity flow at Re = 3200, Ding and Tsang5 used the large eddy simulation (LES) technique and the LSFEM to simulate the low-Reynolds-number turbulent flow behavior; their numerical results compared well with the experimental data of Koseff and Street.42-44

In this work, a linear interpolation function is used for each unknown variable, and impulsively started simulations with a dimensionless time step of 0.5 are run until each flow reaches its steady-state condition. Because of the flow symmetry, all the computations are performed on the half domain (1.0 × 1.0 × 0.5 in the x × y × z coordinate system). No-slip boundary conditions are imposed along the walls of the cavity, and the dimensionless velocity of the lid is u(x, 1, z) = 1. We use structured finite-element meshes for our simulations. First, a uniform mesh is generated for the computational domain; then, three layers of elements along the walls are refined to accurately capture the near-wall flow. Numerical results and convergence studies for these Reynolds numbers on different finite-element meshes were reported by Ding and Tsang6 and are not repeated here.

For the lid-driven cavity flow at Re = 400, we performed three case studies, terminating the simulations at the dimensionless time level of 40. Structured finite-element meshes of 64 × 64 × 32 (131 072 elements and 975 975 unknowns), 96 × 96 × 48 (442 368 elements and 3 227 287 unknowns), and 128 × 128 × 64 (1 048 576 elements and 7 571 655 unknowns) are used for Cases 1, 2, and 3, respectively. Simulations for the higher-Reynolds-number flow (Re = 1000) are terminated at the dimensionless time level of 50, and structured finite-element meshes of 128 × 128 × 64 (1 048 576 elements and 7 571 655 unknowns) and 192 × 192 × 96 (3 538 944 elements and 25 292 071 unknowns) are used for Cases 4 and 5, respectively.

The simulations are carried out on an IBM Intel EM64T Linux cluster with two dual-core Intel Xeon 5160 CPUs (3.0 GHz) per blade. IB SDX 4X is the interconnect between blades, and each blade has 8 GB of memory. Intel Fortran 10.1.015 is the compiler used for all parallel computations.
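The quoted unknown counts follow directly from the structured trilinear meshes: an nx × ny × nz element mesh has (nx + 1)(ny + 1)(nz + 1) nodes, each carrying the seven unknowns {u, v, w, P, ωx, ωy, ωz}. A quick check in Python:

```python
def unknowns(nx, ny, nz, dofs_per_node=7):
    """Unknowns on an nx x ny x nz structured mesh of trilinear elements."""
    return (nx + 1) * (ny + 1) * (nz + 1) * dofs_per_node

for mesh in [(64, 64, 32), (96, 96, 48), (128, 128, 64), (192, 192, 96)]:
    print(mesh, unknowns(*mesh))
# -> 975975, 3227287, 7571655, 25292071, matching the case definitions above
```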

Figure 3. Speedup performance for Cases 1-5.


Table 2. CPU Times, Speedups, and Efficiencies, Based on the Number of Blades(a)

                               CPU Time (s)
number of blades   Case 1           Case 2            Case 3            Case 4             Case 5
 1                 917 (1.00/100)   3088 (1.00/100)   7954 (1.00/100)   10857 (1.00/100)   50346 (1.00/100)
 2                 441 (2.08/104)   1600 (1.93/97)    4148 (1.92/96)     5665 (1.92/96)    25866 (1.95/97)
 4                 217 (4.22/106)    837 (3.69/92)    2176 (3.66/91)     3049 (3.56/89)    13798 (3.65/91)
 8                 119 (7.71/96)     453 (6.82/85)    1225 (6.49/81)     1642 (6.61/83)     7604 (6.62/83)

(a) The speedup and the efficiency (in percent) are given in parentheses.

The traditional definitions of speedup and efficiency, based on one CPU per blade (or node), are as follows:

Sp(N) = \frac{CPU(1)}{CPU(N)}    (20a)

Eff(N)\,(\%) = \frac{Sp(N)}{N} \times 100    (20b)

where N is the number of processors (nodes), CPU(1) and CPU(N) are the CPU times on 1 and N processors, respectively, and Sp(N) and Eff(N) are, respectively, the speedup and efficiency of parallel computations using N processors.
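For example, applying eqs 20a and 20b to the Case 1 column of Table 1 reproduces the parenthesized speedups and efficiencies (Python):

```python
cpu_time = {1: 1516, 2: 917, 4: 441, 8: 217, 16: 119}    # Case 1, Table 1 (seconds)
for n, t in cpu_time.items():
    sp = cpu_time[1] / t          # eq 20a
    eff = 100.0 * sp / n          # eq 20b
    print(f"N={n:2d}  Sp={sp:5.2f}  Eff={eff:3.0f}%")
# N= 2 -> Sp= 1.65, Eff= 83%;  N=16 -> Sp=12.74, Eff= 80%
```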

Table 1 shows the CPU times (in seconds), speedups, and efficiencies for Cases 1-5, with the speedup and the efficiency (in percent) given in parentheses. The performance for Case 3 is similar to that for Case 2, even though Case 3 has more than twice as many unknowns. Furthermore, the speedups and efficiencies for Cases 4 and 5 are similar to those for Cases 2 and 3. Figure 3 shows the speedup for each case study; Case 4 is not plotted, because it is almost the same as Case 5. Each parallel computation achieves nearly linear scalability, which allows simple and reliable prediction of the parallel algorithm's performance for larger-scale computations. It is worth noting that, for each case study, the CPU time is nearly halved as the number of CPUs doubles.

Since each blade of the IBM Intel EM64T Linux cluster contains two CPUs, and the interconnects handle the data communication between blades, it is reasonable to recalculate the speedup and efficiency based on the number of blades instead of the number of CPUs. Equations 20a and 20b then become

Sp(M) = \frac{Blade(1)}{Blade(M)}    (21a)

Eff(M)\,(\%) = \frac{Sp(M)}{M} \times 100    (21b)

where M is the number of blades, and Blade(1) and Blade(M) are the CPU times on one and M blades, respectively. Table 2 shows the CPU times, speedups (Sp(M)), and efficiencies (Eff(M)) based on the number of blades (eqs 21a and 21b). Note that the speedup and efficiency results in Table 2 are presented in a manner similar to those obtained on clusters of N nodes with one CPU per node. Furthermore, for Case 1, the simulation shows superlinear speedup, similar to our previous experience with clusters having a single core or single CPU per node.

In conclusion, the domain decomposition-based least-squares finite-element method is a simple and yet effective approach for large-scale computational fluid dynamics problems.

Acknowledgment

This work was supported by the U.S. Environmental Protection Agency.

Literature Cited

(1) Bird, R. B.; Stewart, W. E.; Lightfoot, E. N. Transport Phenomena, 2nd ed.; John Wiley & Sons: New York, 2002.
(2) Tang, L. Q.; Tsang, T. H. A Least-Squares Finite Element Method for Time-Dependent Incompressible Flows with Thermal Convection. Int. J. Numer. Methods Fluids 1993, 17, 271-289.
(3) Tang, L. Q.; Cheng, T.; Tsang, T. H. Transient Solutions for Three-Dimensional Lid-Driven Cavity Flows by a Least-Squares Finite Element Method. Int. J. Numer. Methods Fluids 1995, 21, 413-432.
(4) Tang, L. Q.; Tsang, T. H. Temporal, Spatial and Thermal Features of 3D Rayleigh-Benard Convection by a Least-Squares Finite Element Method. Comput. Methods Appl. Mech. Eng. 1997, 140, 201-219.
(5) Ding, X.; Tsang, T. H. Large Eddy Simulation of Turbulent Flows by a Least-Squares Finite Element Method. Int. J. Numer. Methods Fluids 2001, 37, 297-319.
(6) Ding, X.; Tsang, T. H. On First-Order Formulations of the Least-Squares Finite Element Method for Incompressible Flows. Int. J. Comput. Fluid Dynamics 2003, 17, 183-197.
(7) Roache, P. Computational Fluid Dynamics; Hermosa Publishers: Albuquerque, NM, 1972.
(8) Chung, T. J. Computational Fluid Dynamics; Cambridge University Press: Cambridge, U.K., 2002.
(9) Canuto, C.; Hussaini, M. Y.; Quarteroni, A.; Zang, T. A. Spectral Methods in Fluid Dynamics; Springer-Verlag: New York, 1988.
(10) Karniadakis, G.; Sherwin, S. Spectral/hp Element Methods for CFD; Oxford University Press: New York, 1999.
(11) Zienkiewicz, O. C.; Cheung, Y. K. The Finite Element Method in Structural and Continuum Mechanics: Numerical Solution of Problems in Structural and Continuum Mechanics; McGraw-Hill: New York, 1967.
(12) Hughes, T. J. R. The Finite Element Method; Prentice-Hall: Englewood Cliffs, NJ, 1987.
(13) Carey, G. F.; Oden, J. T. Finite Elements: A Second Course; Prentice-Hall: Englewood Cliffs, NJ, 1983.
(14) Oden, J. T.; Carey, G. F. Finite Elements: Mathematical Aspects; Prentice-Hall: Englewood Cliffs, NJ, 1983.
(15) Finlayson, B. A. The Method of Weighted Residuals and Variational Principles; Academic Press: New York, 1972.
(16) Brooks, A. N.; Hughes, T. J. R. Streamline Upwind/Petrov-Galerkin Formulations for Convection Dominated Flows with Particular Emphasis on the Incompressible Navier-Stokes Equations. Comput. Methods Appl. Mech. Eng. 1982, 32, 199-259.
(17) Hughes, T. J. R.; Franca, L. P.; Hulbert, G. M. A New Finite Element Formulation for Computational Fluid Dynamics. VIII. The Galerkin/Least-Squares Method for Advective-Diffusive Equations. Comput. Methods Appl. Mech. Eng. 1989, 73, 173-189.
(18) Gresho, P. M.; Lee, R. L. Don't Suppress the Wiggles - They're Telling You Something. Comput. Fluids 1981, 9, 223-253.
(19) Carey, G. F.; Oden, J. T. Finite Elements: Fluid Mechanics; Prentice-Hall: Englewood Cliffs, NJ, 1986.
(20) Gresho, P. M.; Sani, R. L. Incompressible Flow and the Finite Element Method; John Wiley & Sons: New York, 1998.
(21) Zienkiewicz, O. C.; Taylor, R. L. The Finite Element Method: Fluid Dynamics, 5th ed.; Butterworth-Heinemann: Oxford, U.K., 2000; Vol. 3.
(22) Hafez, M. M.; Phillips, T. N. A Modified Least Squares Formulation for a System of First-Order Equations. Appl. Numer. Math. 1985, 1, 339-347.

(23) Carey, G. F.; Jiang, B. N. Least-Squares Finite Elements for First-Order Hyperbolic Systems. Int. J. Numer. Methods Eng. 1988, 26, 81-93.
(24) Jiang, B. N.; Carey, G. F. A Stable Least-Squares Finite Element Method for Nonlinear Hyperbolic Problems. Int. J. Numer. Methods Fluids 1988, 8, 933-942.
(25) Jiang, B. N.; Carey, G. F. Least-Squares Finite Element Method for Compressible Euler Equations. Int. J. Numer. Methods Fluids 1990, 10, 557-568.
(26) Jiang, B. N. The Least-Squares Finite Element Method; Springer-Verlag: Berlin, Germany, 1998.
(27) Lee, E.; Manteuffel, T. A.; Westphal, C. R. Weighted-Norm First-Order System Least Squares (FOSLS) for Problems with Corner Singularities. SIAM J. Numer. Anal. 2006, 44, 1974-1996.
(28) Heys, J. J.; Lee, E.; Manteuffel, T. A.; McCormick, S. F. On Mass-Conserving Least Squares Methods. SIAM J. Sci. Comput. 2006, 28, 1675-1693.
(29) Heys, J. J.; Lee, E.; Manteuffel, T. A.; McCormick, S. F.; Ruge, J. W. Enhanced Mass Conservation in Least-Squares Methods for Navier-Stokes Equations. SIAM J. Sci. Comput. 2009, 31, 2303-2321.
(30) Prabhakar, V.; Reddy, J. N. Spectral/hp Penalty Least-Squares Finite Element Formulation for the Steady Incompressible Navier-Stokes Equations. J. Comput. Phys. 2006, 215, 274-297.
(31) Prabhakar, V.; Reddy, J. N. Spectral/hp Penalty Least-Squares Finite Element Formulation for Unsteady Incompressible Flows. Int. J. Numer. Methods Fluids 2008, 58, 287-306.
(32) Jiang, B. N.; Lin, C. Y.; Povinelli, L. A. Large-Scale Computation of Incompressible Viscous Flow by Least-Squares Finite Element Method. Comput. Methods Appl. Mech. Eng. 1994, 114, 213-231.
(33) Davis, T. A. Direct Methods for Sparse Linear Systems; SIAM: Philadelphia, 2006.
(34) Saad, Y. Iterative Methods for Sparse Linear Systems, 2nd ed.; SIAM: Philadelphia, 2003.

(35) Carey, G. F.; Jiang, B. N. Element-by-Element Linear and Nonlinear Solution Schemes. Commun. Appl. Numer. Methods 1986, 2, 145-153.
(36) Carey, G. F.; Jiang, B. N. Least-Squares Finite Elements and Preconditioned Conjugate Gradient Solution. Int. J. Numer. Methods Eng. 1987, 24, 1283-1296.
(37) Tang, L. Q.; Tsang, T. H. An Efficient Least-Squares Finite Element Method for Incompressible Flows and Transport Processes. Int. J. Comput. Fluid Dynamics 1995, 4, 21-39.
(38) Smith, B. F.; Bjorstad, P. E.; Gropp, W. D. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations; Cambridge University Press: New York, 1996.
(39) Ku, H. C.; Hirsh, R. S.; Taylor, T. D. A Pseudospectral Method for Solution of the Three-Dimensional Incompressible Navier-Stokes Equations. J. Comput. Phys. 1987, 70, 439-462.
(40) Guj, G.; Stella, F. A Vorticity-Velocity Method for the Numerical Solution of 3D Incompressible Flows. J. Comput. Phys. 1993, 106, 286-298.
(41) Fujima, S.; Tabata, M.; Fukasawa, Y. Extension to Three-Dimensional Problems of the Upwind Finite Element Scheme Based on the Choice of Up- and Down-wind Points. Comput. Methods Appl. Mech. Eng. 1994, 112, 109-131.
(42) Koseff, J. R.; Street, R. L. Visualization Studies of a Shear Driven Three-Dimensional Recirculation Flow. J. Fluids Eng. 1984, 106, 21-29.
(43) Koseff, J. R.; Street, R. L. On End Wall Effects in a Lid-Driven Cavity Flow. J. Fluids Eng. 1984, 106, 385-389.
(44) Koseff, J. R.; Street, R. L. The Lid-Driven Cavity Flow: A Synthesis of Qualitative and Quantitative Observations. J. Fluids Eng. 1984, 106, 390-398.

Received for review January 20, 2010
Revised manuscript received May 28, 2010
Accepted June 1, 2010

IE100136B