
Ind. Eng. Chem. Res. 1993, 32, 604-612

PROCESS DESIGN AND CONTROL

Supercomputing Strategies for the Design and Analysis of Complex Separation Systems

Stephen E. Zitney† and Mark A. Stadtherr*

Department of Chemical Engineering, University of Illinois, 1209 West California Street, Urbana, Illinois 61801

The solution of large, sparse, linear equation systems is a critical computational step in the analysis and design of interlinked separation columns. If the modeling equations for a separation system are grouped by equilibrium stage, the linear systems take on an almost-block-tridiagonal form. We study here the use of the frontal approach to solve these linear systems on supercomputers. The frontal approach is potentially attractive because it exploits vector processing architectures by treating parts of the sparse matrix as full submatrices, thereby allowing arithmetic operations to be performed with easily vectorizable full-matrix code. The performance of the frontal method for different matrix orderings and different numbers of components is considered. Nine interlinked distillation systems are used as test problems. Results indicate that the frontal approach provides substantial savings in computation time, an order of magnitude in some cases, compared to traditional sparse matrix techniques.

Introduction

An important problem in chemical process analysis and design is the simulation of separation systems. We concentrate here on systems of complexly interlinked equilibrium stage operations, in particular systems involving recycle streams between columns. While such systems may be solved in a sequential-modular fashion, repetitively simulating one column at a time until recycle streams are converged, it is widely recognized (e.g., Hofeling and Seader (1978), Hsu and Tierney (1981), Stadtherr and Malachowski (1982)) that such systems can be handled more efficiently by solving all columns simultaneously using an equation-based approach analogous to the well-known Naphtali-Sandholm (1971) method for single columns. In this case, the set of nonlinear equations describing the system is usually solved using the Newton-Raphson method or some related technique requiring the solution of a series of linear subproblems. The solution of these large, sparse systems of linear equations often accounts for a large fraction of the total solution time, and is the problem addressed here.

Today there is increasing interest in the use of supercomputers, and other machines using vector processing computer architectures, for the solution of process simulation problems (e.g., Haley and Sarma (1989), Harrison (1989)). However, for the solution of sparse linear systems of the sort arising here, it is well-known (e.g., Duff and Reid (1982)) that conventional sparse matrix solvers developed for use on serial machines do not vectorize well, and thus do not take good advantage of the power of the vector processing architecture. One promising strategy (Vegeais and Stadtherr, 1990; Zitney and Stadtherr, 1992) for taking better advantage of vector processing in the context of process simulation is the frontal method. We study here the performance of this method, relative to the conventional approach, on sparse matrices arising in the simulation of complexly interlinked separation systems.

* Author to whom correspondence should be addressed.
† Present address: Cray Research, Inc., 6553 Lone Oak Dr., Eagan, MN 55121.


Since the performance of the frontal method depends strongly on the initial equation ordering in the sparse matrix, this issue will be an important consideration.
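
To fix ideas about where these linear subproblems arise, the following sketch shows the outer Newton-type iteration in skeletal form. Here residual, jacobian, and solve_sparse are hypothetical placeholders for the stage equations, their partial derivatives, and whichever sparse linear solver is under study; this is an illustration of the setting, not code from any of the packages discussed below.

```python
import numpy as np

def newton_solve(residual, jacobian, solve_sparse, x0, tol=1e-8, max_iter=50):
    """Newton-type iteration for the stage equations: each pass requires the
    solution of one large, sparse linear system, the step studied in this paper."""
    x = x0.copy()
    for _ in range(max_iter):
        F = residual(x)
        if np.linalg.norm(F, np.inf) < tol:
            break
        x = x + solve_sparse(jacobian(x), -F)   # dominant cost in many simulations
    return x
```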

Background

Problem Formulation. Consider the Naphtali-Sandholm (1971) model for an equilibrium stage operation involving N stages and C components. In this formulation, each stage (including condensers and reboilers) is represented by an energy balance

0 = (1 + S_n)H_n + (1 + s_n)h_n - H_{n+1} - h_{n-1} - h_{f,n}    (1)

C material balances

0 = (1 + S_n)V_{n,m} + (1 + s_n)L_{n,m} - V_{n+1,m} - L_{n-1,m} - f_{n,m}    (2)

and C equilibrium relationships

0 = (η_n K_{n,m} V_n L_{n,m})/L_n - V_{n,m} + (1 - η_n)V_{n+1,m} V_n/V_{n+1}    (3)

where n = 1, 2, ..., N and m = 1, 2, ..., C. Nomenclature is as indicated in the generalized plate (stage) diagram in Figure 1. H_n and h_n represent the enthalpy flows from plate n for the vapor and liquid, respectively; h_{f,n} is the enthalpy feed rate for plate n and can also be used to represent a tray heat load or loss; η_n is the Murphree efficiency of plate n based on the vapor phase. The form of the equilibrium relationships arises directly from the definitions of η_n and the equilibrium vaporization constants K_{n,m}. This provides a total of N(2C + 1) equations. The N(2C + 1) variables are taken to be the N plate temperatures T_n and the 2NC vapor and liquid component flow rates V_{n,m} and L_{n,m}. This assumes that all the feed streams are completely specified, as are the heat loads for all stages (including partial condenser and partial reboiler), the sidestream ratios S_n and s_n, and the column pressure.
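
For concreteness, eqs 1-3 can be transcribed directly into code, as in the sketch below. The property models H_V, h_L, and Keq are user-supplied stubs (hypothetical callables standing in for enthalpy and K-value correlations), and the treatment of the column ends is deliberately simplified, so this is an illustration of the equation structure rather than a working column model.

```python
import numpy as np

def ns_residuals(T, V, L, f, hf, Svap, sliq, eta, H_V, h_L, Keq):
    """Residuals of eqs 1-3 for an N-stage, C-component column, grouped by stage
    (energy balance, C material balances, C equilibrium relations).
    V, L, f are (N, C) arrays of component molar flows; T, hf, Svap (S_n),
    sliq (s_n), and eta (eta_n) are length-N arrays.  H_V(T, v), h_L(T, l), and
    Keq(T, v, l) are user-supplied property models.  Streams from nonexistent
    stages (liquid into stage 1, vapor into stage N) are simply dropped, so
    condenser/reboiler details are not modeled here."""
    N, C = V.shape
    Vt, Lt = V.sum(axis=1), L.sum(axis=1)
    H = np.array([H_V(T[n], V[n]) for n in range(N)])        # vapor enthalpy flows H_n
    h = np.array([h_L(T[n], L[n]) for n in range(N)])        # liquid enthalpy flows h_n
    K = np.array([Keq(T[n], V[n], L[n]) for n in range(N)])  # (N, C) K-values K_{n,m}

    res = []
    for n in range(N):
        H_below = H[n + 1] if n + 1 < N else 0.0             # vapor entering from stage n+1
        h_above = h[n - 1] if n >= 1 else 0.0                # liquid entering from stage n-1
        V_below = V[n + 1] if n + 1 < N else np.zeros(C)
        L_above = L[n - 1] if n >= 1 else np.zeros(C)
        e1 = (1 + Svap[n]) * H[n] + (1 + sliq[n]) * h[n] - H_below - h_above - hf[n]
        e2 = (1 + Svap[n]) * V[n] + (1 + sliq[n]) * L[n] - V_below - L_above - f[n]
        carry = (1 - eta[n]) * V_below * Vt[n] / Vt[n + 1] if n + 1 < N else 0.0
        e3 = eta[n] * K[n] * Vt[n] * L[n] / Lt[n] - V[n] + carry
        res.append(np.concatenate(([e1], e2, e3)))
    return np.concatenate(res)                               # N(2C + 1) residuals
```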



Figure 1. Generalized stage diagram.

Figure 2. Block tridiagonal form (BTDF) for single-column problem with N plates.

If other top and bottom specifications are used, or if a total condenser or total reboiler is used, these equations can be easily modified as described by Naphtali and Sandholm. For a single column, when the equations are grouped by plate and ordered from plate 1 to N, the coefficient matrix for the sparse linear equation system that must be solved takes on a block tridiagonal form (BTDF) as shown in Figure 2. The block matrices A_n, B_n, and C_n represent the partial derivatives of the equations for the nth plate with respect to the variables on plates n - 1, n, and n + 1, respectively. The overall matrix can be viewed as an N X N block matrix where the elements themselves are (2C + 1) X (2C + 1). Linear systems of this sort can be solved efficiently using special-purpose Gaussian elimination techniques such as the block-Thomas algorithm (Henley and Seader, 1981).

For multiple columns, the coefficient matrix will contain nonzero blocks off the block tridiagonal. These off-BTD blocks occur because the interconnections between columns result in stages that no longer interact solely with the stages immediately before and after them in the plate ordering. As an example, consider the interconnected system (Hofeling and Seader, 1978) shown in Figure 3. If the plates are ordered as indicated, the coefficient matrix takes on the almost-BTDF shown in Figure 4. The off-BTD blocks above the BTD significantly increase the computational difficulty of solving the linear subproblems in this case, since elimination operations cause fill-in in the area between the off-BTD block and the BTD (Hofeling and Seader, 1978). Various special-purpose solution techniques for these linear systems have been suggested, including a modified block-Thomas algorithm (Hofeling and Seader, 1978) and a method based on reordering to bordered-BTDF (Stadtherr and Malachowski, 1982).
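
A minimal dense-block sketch of the block-Thomas idea (block forward elimination followed by block back-substitution) is given below. It assumes nonsingular pivot blocks and keeps every block dense; it is our own illustration and not the THOMAS code used later in the paper, which also handles the off-BTD blocks of interlinked systems.

```python
import numpy as np

def block_thomas(A, B, C, d):
    """Solve a block tridiagonal system.  B[i] is the diagonal block of block-row i,
    A[i] its sub-diagonal block (ignored for i = 0), C[i] its super-diagonal block
    (ignored for the last row), and d[i] the corresponding right-hand-side block."""
    N = len(B)
    Cp, dp = [None] * N, [None] * N
    dp[0] = np.linalg.solve(B[0], d[0])
    if N > 1:
        Cp[0] = np.linalg.solve(B[0], C[0])
    for i in range(1, N):
        Bi = B[i] - A[i] @ Cp[i - 1]              # eliminate the sub-diagonal block
        if i < N - 1:
            Cp[i] = np.linalg.solve(Bi, C[i])
        dp[i] = np.linalg.solve(Bi, d[i] - A[i] @ dp[i - 1])
    x = [None] * N
    x[N - 1] = dp[N - 1]
    for i in range(N - 2, -1, -1):                # block back-substitution
        x[i] = dp[i] - Cp[i] @ x[i + 1]
    return x

# demonstration on a random BTDF system with 4 block rows of order 3
rng = np.random.default_rng(0)
N, m = 4, 3
B = [np.eye(m) * 5 + rng.random((m, m)) for _ in range(N)]
A = [rng.random((m, m)) for _ in range(N)]
C = [rng.random((m, m)) for _ in range(N)]
d = [rng.random(m) for _ in range(N)]
x = block_thomas(A, B, C, d)
print(np.allclose(B[0] @ x[0] + C[0] @ x[1], d[0]))   # first block equation satisfied
```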

Figure 3. Interlinked column system of Hofeling and Seader (1978) used as an example, with the plates numbered as discussed in the text.

Figure 4. Almost-BTDF coefficient matrix for the interlinked system of Figure 3, showing the off-BTD blocks A_{12,4}, C_{4,12}, A_{16,11}, and C_{11,16}.

Stadtherr and Wood, 1984b). In a recent comparative study, Kaijaluoto et al. (1989) found that LU1SOL outperformed six other general-purpose sparse solvers, including the well-known MA28 from the Harwell Subroutine Library, in terms of reliability and efficiency. Their set of 12 test problems included two distillation systems, a "Petlyuk towers" system with 22 plates and 3 components and a de-ethanizer system with 20 plates and 10 components. LU1SOL proved to be the most efficient code on these two systems. These general-purpose sparse solvers were developed for serial machines and make considerable use of indirect addressing techniques. Unfortunately, this represents a serious drawback for the implementation of such codes on vector processing machines (e.g., Duff and Reid (1982)). While the use of hardware gather/scatter to enhance the vectorization of codes using indirect addressing can improve their performance in some cases (Lewis and Simon, 1988), and is used in connection with LU1SOL for the tests reported here, it is not always particularly effective. Recoding of the algorithms used in these general-purpose routines is also of limited benefit. Thus the use of different solution algorithms must be considered. The frontal method is one such possibility.

Frontal Method. The frontal approach is based on Gaussian elimination and was originally developed as a banded matrix solver for finite element problems (Irons, 1970; Hood, 1976). The potential for exploiting the frontal approach on vector computers was apparently first recognized by Duff (1979). A well-known implementation of the frontal method for vector computers is the code MA32 from the Harwell Subroutine Library (Duff, 1984; Duff and Reid, 1982). The MA32 routine is designed primarily for the out-of-core solution of sparse unsymmetric matrices and, in general, is structured for finite element problems, though it can be applied to others. Cameron (1977) first proposed the idea of applying the frontal approach to solve mass balances for interconnected chemical processes. This work was done on a sequential mainframe computer, without any intended application to vector computers. Stadtherr and Vegeais (1985) first suggested using the frontal approach to solve process simulation problems using advanced computer architectures, and subsequently demonstrated its potential (Vegeais and Stadtherr, 1990). Most recently, Zitney and Stadtherr (1993) described implementations of the frontal method developed specifically for use in the context of process simulation. One of these codes, FA1P, is utilized here. A detailed description of the algorithm used by FA1P, including discussions of the pivoting strategies used for maintaining numerical stability and of the relationship to the frontal method for finite element problems, is given by Zitney and Stadtherr (1993).

The basic idea in the frontal approach is to restrict elimination operations to a series of "frontal matrices". The solution proceeds by alternating between an assembly phase, in which equations and variables are added to the frontal matrix, and an elimination phase, in which Gaussian elimination is performed to remove equations and variables from the frontal matrix. The front can be thought of as a window advancing down the diagonal of the sparse matrix during the solution, and within which all operations occur. By treating these frontal matrices as full, the problem of indirect addressing can be overcome, and the computations can be effectively vectorized.
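
To make the assembly/elimination alternation concrete, the sketch below implements a much-simplified, in-core, dense-front variant of this idea (our own illustration; it is not FA1P or MA32, and it uses only a crude partial-pivoting rule within the front). Equations enter one at a time in row order, and each variable is eliminated as soon as it becomes fully summed.

```python
import numpy as np

def frontal_solve(rows, cols, vals, b):
    """Solve A x = b for a sparse, nonsingular A given in coordinate form.
    Equations are assembled into a dense frontal matrix one at a time; a
    variable is eliminated as soon as it is fully summed (i.e., all of its
    nonzeros have entered the front)."""
    n = len(b)
    entries = [[] for _ in range(n)]      # entries[i] = [(col, value), ...] of equation i
    last_row = {}                         # column -> last equation containing it
    for i, j, v in zip(rows, cols, vals):
        entries[i].append((j, v))
        last_row[j] = max(last_row.get(j, -1), i)

    front_cols = []                       # variables currently in the front
    F = np.zeros((0, 0))                  # dense frontal matrix (front eqs x front vars)
    rhs = np.zeros(0)
    pivots = []                           # saved pivot rows for back-substitution

    for i in range(n):
        # assembly phase: equation i and any new variables enter the front
        new_cols = [j for j, _ in entries[i] if j not in front_cols]
        front_cols.extend(new_cols)
        F = np.pad(F, ((0, 1), (0, len(new_cols))))
        rhs = np.append(rhs, b[i])
        for j, v in entries[i]:
            F[-1, front_cols.index(j)] += v

        # elimination phase: remove every variable that is now fully summed
        for j in [c for c in front_cols if last_row[c] == i]:
            jc = front_cols.index(j)
            p = int(np.argmax(np.abs(F[:, jc])))      # pivot row chosen within the front
            piv_row, piv_rhs = F[p, :].copy(), rhs[p]
            pivots.append((front_cols[:], j, piv_row, piv_rhs))
            keep = [r for r in range(F.shape[0]) if r != p]
            mult = F[keep, jc] / piv_row[jc]
            F = F[keep, :] - np.outer(mult, piv_row)
            rhs = rhs[keep] - mult * piv_rhs
            F = np.delete(F, jc, axis=1)              # variable j leaves the front
            front_cols.pop(jc)

    # back-substitution over the saved pivot rows, in reverse elimination order
    x = np.zeros(n)
    for colmap, j, piv_row, piv_rhs in reversed(pivots):
        s = sum(piv_row[k] * x[c] for k, c in enumerate(colmap) if c != j)
        x[j] = (piv_rhs - s) / piv_row[colmap.index(j)]
    return x

# small demonstration on a 3 x 3 tridiagonal system
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
i, j = np.nonzero(A)
print(np.allclose(A @ frontal_solve(i, j, A[i, j], b), b))   # True
```
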
Simple examples demonstrating the use of the frontal method can be found in Stadtherr and Vegeais (1985) and Zitney and Stadtherr (1993). This earlier work has indicated that there is a strong relationship between the size of the frontal matrices and the performance of the frontal method on process simulation problems.


Figure 5. Flowsheet for test problems 1 and 4. Condensers and reboilers not shown.


Figure 6. Flowsheet for test problems 2, 3, and 7. Condensers and reboilers not shown.

To a large extent, this is due to a trade-off between the computational rate (high for full-matrix calculations) and the number of wasted operations on the zeros that tend to occur in the frontal matrices in this context. Thus, in general, it is desirable to keep the size of the frontal matrices as small as possible. We apply here the one-pivot frontal code, FA1P (Zitney and Stadtherr, 1993), to the matrices arising in the analysis and design of interlinked equilibrium-stage separation systems. As a first step, we describe a suite of nine test problems to be used. Next, frontal matrix size, and the effect of different equation orderings on it, is considered. Numerical results indicating the performance of the frontal method are then presented.

Test Problems

Nine test problems varying in flowsheet configuration, total number of stages, and number of components are used here. Diagrams of the flowsheets are given in Figures 5-10. Table I shows the number of stages, the number of components, and the resulting number of equations for each problem. Detailed problem specifications and final solutions can be found in Malachowski (1983). These nine interlinked systems provide a variety of problem types to serve as a basis for the comparison of the linear equation solving algorithms studied here.



Figure 7. Flowsheet for test problem 5. Condensers and reboilers not shown.


Figure 8. Flowsheet for test problem 6. Condensers and reboilers not shown.


Figure 10. Flowsheet for test problem 9. Condensers and reboilers not shown.

Problem 1 represents a so-called Petlyuk towers system (Petlyuk et al., 1965). Two interlinked columns are used in this problem to separate three components, n-hexane, n-octane, and n-decane. The problem specifications are similar to those used by Hutchison and Shewchuk (1974). Problems 2 and 3 are examples of interlinked systems of extractive distillation columns (Holland, 1981), used here to separate a four-component mixture of methanol, acetone, ethanol, and water. The two-column system is configured in the same way for both problems 2 and 3. They differ only in the number of stages. Problems 4 and 5 involve small interlinked systems that are used to separate a large number of components. Problem 4 is a complex, doubly-interlinked system of two columns similar to the Petlyuk system. The columns process a 10-component mixture consisting of the normal hydrocarbons from n-butane to n-dodecane, plus isopentane. Problem 5 is a singly-interlinked system of two columns. The 25 total stages are used to separate 15 components, the 10 used in problem 4, plus propane, 1-pentene, cis-2-pentene, trans-2-pentene, and 1-hexene. Problem 6 involves the separation of a six-component mixture consisting of normal hydrocarbons from n-butane to n-heptane, as well as isobutane and isopentane. The flowsheet is the configuration of three interlinked columns described by Westerberg et al. (1979). Problem 7 also involves the processing of a six-component mixture, this time benzene, toluene, ethylbenzene, p-xylene, m-xylene, and o-xylene. The two interlinked columns have the same configuration as those in problems 2 and 3. Problems 8 and 9 are systems containing a large number of stages and only three components. Problem 8 is a doubly-interlinked system of two columns used to separate o-, p-, and m-xylene. Problem 9 is comprised of three singly-interlinked columns used to process 1-pentene, cis-2-pentene, and trans-2-pentene.


Figure 9. Flowsheet for test problem 8. Condensers and reboilers not shown.

Frontal Matrix Size and Row Ordering

The success of the frontal approach depends largely on the front remaining small, since this will tend to lower the number of overall arithmetic operations in the elimination phase. The maximum size that the frontal matrix can reach is a good indicator of the overall number of operations performed. The size of the frontal matrix is determined by the ordering of the equations (rows) in the overall coefficient matrix. Without a good row ordering the frontal method may not be effective. Thus, we examine here the effect of equation ordering on the maximum front size required when solving interlinked distillation systems. Equations, or blocks of equations, are assembled into the frontal matrix beginning with those at the top of the overall matrix and proceeding to the bottom. When equations enter the front, all the variables they contain enter too. Whenever the entry of an equation into the frontal matrix causes one or more variables to become fully summed (i.e., to have all its nonzero elements within the front), elimination operations can be performed within the frontal matrix to eliminate those variables. The equations containing the pivot elements for the elimination can then be removed from the front, more equations and variables assembled into it, etc. We now track this procedure for some potentially attractive matrix orderings to determine the maximum size the frontal matrix can reach.
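
The bookkeeping just described is easy to simulate. The sketch below is a hypothetical helper (it is not part of FA1P and ignores numerical values entirely): it tracks only the symbolic growth of the front for a given equation ordering and reports the largest front encountered, taken here at the point of largest area.

```python
def max_front_size(row_cols, order=None):
    """row_cols[i] is the set of column (variable) indices with a nonzero in
    equation i; order is the assembly order of the equations (default: the
    given order).  Returns the largest front encountered as (equations, variables)."""
    n = len(row_cols)
    order = list(range(n)) if order is None else list(order)
    position = {r: k for k, r in enumerate(order)}
    # a variable is fully summed once the last equation containing it has entered
    last_entry = {}
    for r in order:
        for c in row_cols[r]:
            last_entry[c] = max(last_entry.get(c, -1), position[r])

    front_rows, front_cols, best = 0, set(), (0, 0)
    for k, r in enumerate(order):
        front_rows += 1                       # assembly: equation r enters the front
        front_cols |= set(row_cols[r])        # ...together with any new variables
        if front_rows * len(front_cols) > best[0] * best[1]:
            best = (front_rows, len(front_cols))
        summed = {c for c in front_cols if last_entry[c] == k}
        front_cols -= summed                  # elimination: fully summed variables leave,
        front_rows -= len(summed)             # one pivot equation per eliminated variable
    return best
```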

Table I. Test Problem Statistics and Frontal Matrix Sizes

                                                  max frontal matrix size (eqs X vars)
problem no.   no. of stages   no. of comps   no. of eqs   r-P4        r-BLOKS     HCGS
1             22              3              154          11 X 26     11 X 25     11 X 23
2             45              4              405          51 X 108    13 X 37     14 X 29
3             90              4              810          96 X 108    15 X 38     14 X 29
4             37              10             777          265 X 537   41 X 82     32 X 65
5             25              15             775          306 X 609   48 X 126    47 X 95
6             95              6              1235         291 X 588   23 X 54     20 X 41
7             120             6              1560         366 X 738   23 X 54     20 X 41
8             300             3              2100         15 X 34     16 X 38     15 X 31
9             330             3              2310         10 X 26     11 X 26     11 X 23


Figure 11. Equation-variable structure of the A, B, and C blocks for a simple plate and a three-component system. Equations are ordered in the sequence energy balance, material balances, equilibrium relations.

Almost-BTDF Orderings. To begin with, consider the case of a single column, and consider the natural BTDF block structure for the coefficient matrix shown in Figure 2. Let F_n and X_n refer to the vectors of equations and variables, respectively, corresponding to the nth plate. Applying the frontal method, the blocks B_1 and C_1 (i.e., the equations F_1 and the variables X_1 and X_2) would first enter the frontal matrix. At this point none of the variables are fully summed, so A_2, B_2, and C_2 would enter the front, which now has a size 2 X 3. Now the variables X_1 are fully summed and can be eliminated by using pivots from F_1 or F_2. Assuming pivots from F_1 are used, blocks B_1, C_1, and A_2 then leave the front. Next blocks A_3, B_3, and C_3 are added, etc. It can easily be seen that this will not increase the size of the frontal matrix beyond 2 X 3. Thus, since each block is of order 2C + 1, the maximum frontal matrix size is 2(2C + 1) X 3(2C + 1) or (4C + 2) X (6C + 3). It can be seen that the maximum front size is directly related to the number of components and does not depend on the number of plates in the column. In this case equations and variables entered the front a block at a time; however, the nonzero blocks are not full, so the actual maximum front size, when equations are assembled one at a time, will be smaller than (4C + 2) X (6C + 3). The actual front size on an equation-variable level depends on the ordering of eqs 1-3 within the blocks. Figure 11 shows the subblock structure for a simple plate and a three-component system (C = 3), with the equation groups ordered (1), (2), (3). Table II lists the maximum front size for the six possible orderings of these equation groups. The two orderings with the energy balance first, including the ordering depicted in Figure 11, can be seen to be the best for the application of the frontal method, yielding a maximum front size of (2C + 1) X (4C + 3). Note that all of the orderings give a maximum front size which is much smaller than the (4C + 2) X (6C + 3) predicted by looking only at the block matrix.
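
As a quick check of the block-level argument above, the max_front_size sketch given earlier (our illustrative helper, assumed to be in scope) can be applied to the block occurrence pattern of Figure 2; the 2 X 3 block window is recovered regardless of the number of plates.

```python
# Block-level BTDF pattern for a single column with N plates (cf. Figure 2):
# block-row n couples block-columns n-1, n, and n+1.
N = 30
btdf = [set(range(max(0, n - 1), min(N, n + 2))) for n in range(N)]
print(max_front_size(btdf))   # (2, 3): a 2 X 3 window of blocks, independent of N
```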

Table II. Maximum Frontal Matrix Size for Various Orderings of Subblock Equations in Naphtali-Sandholm Formulation

equation ordering      max frontal matrix size (eqs X vars)
(1), (2), (3)          (2C + 1) X (4C + 3)
(1), (3), (2)          (2C + 1) X (4C + 3)
(2), (1), (3)          (2C + 2) X (4C + 3)
(3), (1), (2)          (2C + 2) X (4C + 3)
(2), (3), (1)          (3C + 2) X (4C + 3)
(3), (2), (1)          (3C + 2) X (4C + 3)

Consider now the problem of multiple interlinked columns, using the almost-BTDF block matrix shown in Figure 4. Proceeding as before, and assuming that pivots from F_1, F_2, and F_3 have been chosen, we see that after the equations F_6 enter the front, the frontal matrix has reached the size 3 X 5, and includes equations F_4, F_5, and F_6 and variables X_4, X_5, X_6, X_7, and X_12. Clearly the frontal matrix is larger than in the BTDF case because of the off-BTD blocks A_{12,4} and C_{4,12}. In this case the other off-BTD blocks A_{16,11} and C_{11,16} do not contribute to a further increase in the maximum frontal matrix size, since they enter the frontal matrix as the contributions of A_{12,4} and C_{4,12} leave. Thus, for this problem the maximum front size in block terms is 3 X 5. Note however that had the block C_{11,16} been further (vertically) off the BTD the frontal matrix would have become larger.

In general, to keep the size of the frontal matrix small, it is desirable to keep the number of off-BTD blocks and their distance from the BTD at a minimum. These are properties determined by the sequencing of the different column sections (i.e., the numbering of the plates) in the coefficient matrix (Hidalgo et al., 1980). The plate numbering in Figure 3 and the corresponding row ordering in Figure 4 were obtained using the column section ordering: 2a, 2b, 3a, 1, 3b. Other orderings will produce matrices with varying numbers of off-BTD blocks at varying positions. Hidalgo et al. (1980) have presented a search strategy for minimizing the number of off-BTD blocks and their distance from the diagonal, since that is desirable for applying the solution algorithm of Hofeling and Seader (1978). Since a strategy for maintaining a small front size must satisfy the same two objectives, the Hidalgo-Correa-Gomez-Seader (HCGS) reordering is attractive for the purposes here. It should be noted that, for applying the frontal method, obtaining an absolute minimum in the distance of off-BTD blocks from the diagonal may not always be crucial. For instance, in the example here, if the block C_{11,16} were closer (vertically) to the diagonal, it would not lower the maximum front size, though it would lower the number of operations somewhat by keeping the front size at 2 X 3 longer. Thus, in some situations the search in the HCGS method may be stopped prior to reaching the absolute optimum without serious effect. When HCGS is applied to the test problems here, the absolute optimum is used in all cases except problem 6, for which a near optimum is used without effect on the maximum front size.


Figure 12. Schematic representation of the structure of the block occurrence matrices for the test problems. See Table I.

On an equation-variable level, the off-BTD blocks A and C are (C + 1) X (C + 1), as shown in Figure 11. Therefore, in this example, A_{12,4} and C_{4,12} add a total of C + 1 equations and 2(C + 1) variables to the frontal matrix. The maximum front size for the almost-BTDF in Figure 4 is obtained by starting with the maximum front size for a BTDF, which is (2C + 1) X (4C + 3), and adding the extra C + 1 equations and 2C + 2 variables, yielding a maximum front size of (3C + 2) X (6C + 5). The maximum front size for each of the nine distillation test problems is given in Table I for the HCGS equation ordering. The almost-BTDF structures resulting from the HCGS ordering are illustrated schematically in Figure 12, where an X denotes a nonzero off-BTD submatrix. The maximum front size for all the problems, except problem 8, is (3C + 2) X (6C + 5), as in the previous example. The complex doubly-linked arrangement in problem 8 has C + 1 additional equations (one extra block equation) and 2C + 2 additional variables (two extra block variables) in the largest frontal matrix. Note that if the rightmost and bottommost off-BTD nonzero blocks in Figure 12e had been sufficiently close (vertically and horizontally, respectively) to the BTD, as in Figure 12f, these additional contributions to the frontal matrix would have been avoided. The ordering considered here can be thought of as an attempt to keep the matrix structure as close as possible to the desirable BTDF arising in a single-column system. Another desirable matrix form for the frontal method is the upper triangular form. We now consider reorderings based on it.

Almost-Triangular Orderings. The upper triangular form is desirable from the standpoint of the frontal method because as each row enters the frontal matrix, a variable is immediately available for elimination. As a result, the maximum front size is only 1 X N, where N is the order of the matrix. Systems that are almost upper triangular will have some elements below the diagonal. The columns containing these elements are often referred to as spike columns. Various general-purpose algorithms, several of which are described by Stadtherr and Wood (1984a), are available for reordering a matrix into an almost-lower-triangular form with relatively few spike columns. To obtain an almost-upper-triangular ordering for use with the frontal method, the row orderings found using these algorithms can simply be reversed, as suggested by Vegeais and Stadtherr (1990). For an almost-upper-triangular reordering the maximum front size is (MLSPK + 1) X N, where MLSPK is the maximum number of local spikes, a term defined by Stadtherr and Wood (1984a) that is related to the proximity to the diagonal of the nonzeros in a spike. If the triangular portion is sparse, which is the case here, the maximum front size can be much smaller than (MLSPK + 1) X N.

We focus here on two methods for obtaining such a reordering, the partitioned preassigned pivot procedure (P4) of Hellerman and Rarick (1972) and the BLOKS scheme of Stadtherr and Wood (1984a). These methods are designed to produce, by row and column permutations, an almost-lower-triangular reordering of the original matrix. They are intended to reduce fill-in when used in connection with general sparse matrix solvers. To apply these in connection with the frontal method, the row orderings found are reversed, and we denote the reordering schemes as r-P4 and r-BLOKS. The P4 reordering method does not specifically look for any block structure that may be inherent in a matrix. Instead, it treats the entire matrix on the individual equation-variable level. Thus it may not identify desirable block structure in the matrix and may in fact actually destroy desirable block structure originally present. This may result in a poor reordering. BLOKS is a general-purpose reordering scheme developed by Stadtherr and Wood (1984a) to directly exploit block structures originally present in a coefficient matrix. In applying this method here, the block structure of the matrix is first identified from the structure of the interlinked separation system. This block matrix is then reordered by the SPK2 method (Stadtherr and Wood, 1984a). Next, the equations and variables within these blocks are reordered, again by SPK2. Thus, BLOKS explicitly exploits block structure by reordering first on the block level, and then preserving the desirable block structure while operating on the equation-variable level. The results of Stadtherr and Wood show that this strategy generally produces fewer local spikes than P4. Thus, the r-BLOKS scheme should prove to be more effective than r-P4 in maintaining a small frontal matrix.

The maximum front size for each of the nine distillation test problems is given in Table I for the r-P4 and r-BLOKS reorderings. This shows that r-P4 yields a relatively large size for the frontal matrix in six of the nine problems. The r-BLOKS ordering provides a consistently small maximum front size, competitive with the HCGS ordering. Results are presented below showing how the results in terms of maximum frontal matrix size translate into actual computational times.
Numerical Results

Since it is desired to compare the frontal method with the best method available for serial machines, we first consider the performance of different linear equation solvers on a serial machine (Malachowski, 1983).


Figure 13. Comparison of linear equation solvers on a serial machine (CDC Cyber 175). The time to solve one linear system is shown.

Two general-purpose sparse solvers are used: LU1SOL, which was discussed above, and MA28, from the Harwell Subroutine Library. In addition, two special-purpose codes for solving almost-BTDF matrices are used: THOMAS, which implements the modified block-Thomas algorithm of Hofeling and Seader (1978), and BBTDF2, which implements the algorithm of Stadtherr and Malachowski (1982) for solving a system in bordered-BTDF. For THOMAS and BBTDF2 an almost-BTDF matrix structure is needed, so the HCGS reordering is used. For LU1SOL the BLOKS reordering is used. MA28 implements a one-pass approach to solving the sparse matrix, so there is no a priori reordering; the matrix is in effect reordered during the solution process using the Markowitz (1957) sparse pivoting technique. These runs were made by Malachowski (1983) on a CDC Cyber 175 mainframe. Results for the nine test problems are shown in Figure 13. The relatively poor performance of MA28 is not surprising because by using a one-pass approach it is unable to take advantage of the desirable block structure inherent in the matrices. The other techniques take advantage of this structure. The general-purpose solver LU1SOL with the BLOKS reordering is clearly the most effective on these problems, and will thus be the basis for assessing the performance of the frontal method. It should be noted that LU1SOL was also tested using other a priori reorderings, including HCGS, and that BLOKS provided the best results overall.

We now turn our attention to vector computers and the use of the frontal method. While the frontal method would not be attractive for these problems on serial computers, since it performs many unnecessary operations on zeros in this context, it is attractive from the standpoint of vector computers, since it readily vectorizes and thus provides a computational rate potentially high enough to offset the unnecessary operations performed.

Reordering Strategies. We first consider the effect of row ordering on the performance of the frontal method. The reorderings used are the almost-BTDF HCGS reordering, and the almost-upper-triangular reorderings r-P4 and r-BLOKS. The frontal method was implemented using the code FA1P (Zitney and Stadtherr, 1993). All runs were made on one processor of a Cray Y-MP supercomputer. Results for the nine test problems are shown in Figure 14. The solid bars for the r-P4 scheme illustrate how a poor ordering can drastically degrade the overall performance of the frontal approach. It is evident that the HCGS and r-BLOKS strategies are both effective on these problems, with HCGS the best overall by a very small margin.

Figure 14. Comparison of reordering methods used with the frontal solver FA1P on a Cray Y-MP. The time to solve one linear system is shown.

For these problems the cost to implement the reorderings is insignificant, averaging less than 2% of the total simulation time. For problems with larger numbers of off-BTD blocks, the r-BLOKS reordering may be cheaper to implement than HCGS since it does not involve searching for an optimum. Use of HCGS with a suboptimal search may also be appropriate. For problems in which a complex separation system is integrated with other flowsheet units, the fact that r-BLOKS is a general-purpose strategy makes it more attractive. In fact, the results of Vegeais and Stadtherr (1990) indicate that the r-BLOKS reordering should be effective on flowsheeting problems in general. Comparing the computational results in Figure 14 with the results in Table I for maximum frontal matrix size shows the clear correlation between the two. Different matrix reorderings for the frontal method can be cheaply evaluated by determining the maximum frontal matrix size.

FA1P vs LU1SOL. We now compare the performance on a vector computer of the frontal method, as implemented in FA1P, with the best serial method, as implemented in LU1SOL. The best reorderings for each method were used: HCGS for FA1P and BLOKS for LU1SOL. All runs were made on one processor of a Cray Y-MP. The runs with LU1SOL were performed using hardware gather/scatter to enhance its vectorization. Results for the nine test problems are shown in Figure 15. The relative speedup of FA1P over LU1SOL is shown in Figure 16. It can be seen that FA1P is at least twice as fast as LU1SOL and an order of magnitude faster in two cases. The average speedup on these problems is about 5.

It is clear that LU1SOL performs relatively poorly even when hardware gather/scatter is available. The availability of this hardware facility does not guarantee the efficient performance of a general sparse code with indirect addressing. The poor performance here is due in part to the extremely sparse matrices arising from interlinked distillation problems. The small number of nonzeros per row results in very short vector lengths that do not overcome the startup time for loops with indirect addressing. The FA1P code is more effective than LU1SOL because it uses full-matrix code for Gaussian elimination on the frontal matrix. The arithmetic operations are performed on data stored in contiguously or regularly indexed vectors, and this is a desirable feature for codes run on vector computers.


Figure 15. Comparison of a standard sparse matrix code (LU1SOL) and a code using the frontal method (FA1P) on a Cray Y-MP. The time to solve one linear system is shown.


Figure 16. Relative speedup of frontal solver (FA1P) over the standard method (LU1SOL) on a Cray Y-MP.

The high computation rate so obtained is clearly sufficient to offset the cost of the unnecessary operations on zeros.

Figure 16 shows the speedup plotted against the number of components in each problem. It is interesting to note that there is a correlation between the speedup of the frontal solver over LU1SOL and the number of components in the distillation system. This happens largely because the density of the nonzero blocks increases as the number of components increases. Thus, the number of wasted operations on zeros tends to decrease as the number of components increases. It is also interesting to note that, when FA1P is applied with a poor row ordering, as shown in Figure 14 for the r-P4 reordering on problems 2-7, it may actually be considerably less efficient than LU1SOL. This emphasizes again the importance of using a good row ordering in connection with the frontal approach. Fortunately, the r-BLOKS and HCGS reordering strategies provide such an ordering.

Total Solution Time. The solution of sparse linear equation systems represents one subproblem in the simulation of a complex separation system. Thus the speedup in overall solution time provided by the use of the frontal method will be less than the speedup on the linear systems alone. The overall speedup will depend on the fraction of overall computation time originally spent in the sparse solver used for comparison, LU1SOL in this case.
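
The dependence on that fraction can be made explicit with a simple Amdahl-type estimate (an illustrative calculation, not a result reported in the paper): if a fraction f of the original time is spent in the linear solver and that portion is sped up by a factor s, the overall speedup is 1/((1 - f) + f/s).

```python
def overall_speedup(f, s):
    """Amdahl-type estimate: f = fraction of time originally in the linear solver,
    s = speedup achieved on that portion alone."""
    return 1.0 / ((1.0 - f) + f / s)

# with roughly 30% of the time in the solver and a linear-solver speedup of about 5
# (typical of the results above), total time drops to about 76% of the original
print(overall_speedup(0.30, 5.0))   # ~1.32
```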


Figure 17. Total solution times for the test problems. The bottom section of each bar indicates the time spent in the linear equation solver. The upper section indicates the time spent in all other routines.

To obtain total computation times using LU1SOL and FA1P as sparse solvers, the nine test problems were solved using the quasi-Newton method of Schubert (1970) to solve the nonlinear equation system (1)-(3). For problems 2 and 3, thermodynamic properties were calculated using the UNIQUAC subroutines of Prausnitz et al. (1980). For other problems, the Chao-Seader equation of state was used. Partial derivatives of K values and enthalpies with respect to temperature and composition were determined numerically by finite difference. Analytic expressions for partial derivatives were used otherwise. Results for the nine test problems showing the total time broken into time spent in the linear system solver and time spent in other routines, primarily those for physical property evaluation, are shown in Figure 17. Except for problem 4, it can be seen that originally, using LU1SOL, roughly 20-40% of the total time is spent in the linear solver, with an average of about 30%. Thus, the overall performance enhancement is less dramatic than that seen on the sparse linear solver alone. Savings are still quite substantial however, ranging from about 10 to 80%. Process features that lead to increased computation time in the linear equation solver, and thus to a stronger incentive for using the frontal approach, include a large number of components, multiple interlinks, and topologies with "long" recycle streams, for which off-BTD blocks far from the BTD cannot be avoided. Problem 4 exhibits all these features. It should be noted that in our experience with the simulation of separation operations the fraction of time spent in the linear equation solver can be much higher than in the test problems used here. For example, in the solution of four reactive distillation problems using ASPEN PLUS, Zitney (1990) found that over 90% of the computation time was spent in the linear equation solver. Similarly, in the dynamic simulation of separation systems using SPEEDUP, Zitney et al. (1992) found that on several problems over 80% of the computation was due to the linear equation solver.

Concluding Remarks

The frontal method, when used in connection with a row ordering that takes advantage of the block structure of the problem, is an extremely effective strategy for use on vector computers in solving the sparse linear systems that occur in the simulation of complex separation systems. Such systems were considered on a stand-alone basis here. In flowsheets encompassing such systems, the corresponding calculations are usually a large fraction of the flowsheet simulation problem as a whole.


Thus similar results should be obtained when using an equation-based approach on the overall flowsheet. A module employing the frontal approach to perform these calculations should also be very attractive from the standpoint of sequential-modular flowsheeting. When used in connection with techniques, such as the widely-used inside-out approach, for doing physical property calculations more efficiently, the effect of using the frontal method should be especially significant.

The frontal method is very effective on vector computers because it operates on full submatrices. It should be noted that it is not the only strategy that could be used on these problems to do this. For example, the modified block-Thomas algorithm of Hofeling and Seader (1978) and the bordered-BTDF algorithm of Stadtherr and Malachowski (1982) can both be implemented to operate on full submatrices. These represent two potential alternatives to the frontal approach.

Acknowledgment

This work was supported in part by the National Science Foundation under Grant DDM-9024946. We also acknowledge the support of the National Center for Supercomputing Applications at the University of Illinois and the support of Cray Research, Inc.

Nomenclature

A_n = matrix of partial derivatives ∂F_n/∂X_{n-1} representing liquid flow from stage n - 1 to stage n
A_{n,p} = matrix of partial derivatives ∂F_n/∂X_p representing liquid flow from stage p to stage n
B_n = matrix of partial derivatives ∂F_n/∂X_n
C = number of components
C_n = matrix of partial derivatives ∂F_n/∂X_{n+1} representing vapor flow from stage n + 1 to stage n
C_{n,p} = matrix of partial derivatives ∂F_n/∂X_p representing vapor flow from stage p to stage n
f_{n,m} = molar feed rate of component m to stage n
F_n = vector of equations representing stage n
h_n = enthalpy flow rate from stage n due to liquid
h_{f,n} = enthalpy flow rate to stage n due to feed; also heat load or loss for tray n
H_n = enthalpy flow rate from stage n due to vapor
K_{n,m} = equilibrium vaporization constant for component m on stage n
L_n = total liquid molar flow rate from stage n
L_{n,m} = liquid molar flow rate of component m from stage n
N = number of stages
s_n = liquid sidestream split ratio from stage n
S_n = vapor sidestream split ratio from stage n
T_n = temperature on stage n
V_n = total vapor molar flow rate from stage n
V_{n,m} = vapor molar flow rate of component m from stage n
X_n = vector of variables describing stage n
η_n = Murphree efficiency of stage n based on vapor phase

Literature Cited

Cameron, I. T. Mass Balances for Process Design by the Frontal Solution Method. M.S. Thesis, University of Washington, 1977.
Chen, H. S.; Stadtherr, M. A. On Solving Large Sparse Nonlinear Equation Systems. Comput. Chem. Eng. 1984, 8, 1.
Duff, I. S. Recent Developments in the Solution of Large Sparse Linear Equations. Presented at IRIA Fourth International Symposium on Computing Methods in Applied Sciences and Engineering, Versailles, Dec 10-14, 1979.
Duff, I. S. Design Features of a Frontal Code for Solving Sparse Unsymmetric Linear Systems Out-of-Core. SIAM J. Sci. Stat. Comput. 1984, 5, 270.
Duff, I. S.; Reid, J. K. Experience of Sparse Matrix Codes on the Cray-1. Comput. Phys. Commun. 1982, 26, 293.

Haley, J. C.; Sarma, P. V. L. N. Process Simulation on Supercomputers. Chem. Eng. Prog. 1989, 85 (10), 28-32.
Harrison, B. K. Performance of a Process Flowsheeting System on a Supercomputer. Comput. Chem. Eng. 1989, 13, 855-857.
Hellerman, E.; Rarick, D. The Partitioned Preassigned Pivot Procedure (P4). In Sparse Matrices and Their Applications; Rose, D. J., Willoughby, R. A., Eds.; Plenum Press: New York, 1972; p 76.
Henley, E. J.; Seader, J. D. Equilibrium-Stage Separation Operations in Chemical Engineering; Wiley: New York, 1981; pp 600-602.
Hidalgo, R.; Correa, S. A.; Gomez, V. M.; Seader, J. D. An Optimal Arrangement of Simultaneous Linearized Equations for General Systems of Interlinked, Multistaged Separators. AIChE J. 1980, 26, 585.
Hofeling, B. S.; Seader, J. D. A Modified Naphtali-Sandholm Method for General Systems of Interlinked, Multistaged Separators. AIChE J. 1978, 24, 1131.
Holland, C. D. Fundamentals of Multicomponent Distillation; McGraw-Hill: New York, 1981; p 232.
Hood, P. Frontal Solution Program for Unsymmetric Matrices. Int. J. Numer. Methods Eng. 1976, 10, 379.
Hsu, H.; Tierney, J. W. Exact Calculation Methods for Systems of Interlinked Columns, Part 1: Separation of Binary Heterogeneous Azeotropes. AIChE J. 1981, 27, 733.
Hutchison, H. P.; Shewchuk, C. F. A Computational Method for Multiple Distillation Towers. Trans. Inst. Chem. Eng. 1974, 52, 325.
Irons, B. M. A Frontal Solution Program for Finite Element Analysis. Int. J. Numer. Methods Eng. 1970, 2, 5.
Kaijaluoto, S.; Neittaanmäki, P.; Ruhtila, J. Comparison of Different Solution Algorithms for Sparse Linear Equations Arising from Flowsheeting Problems. Comput. Chem. Eng. 1989, 13, 433-439.
Lewis, J. G.; Simon, H. D. The Impact of Hardware Gather/Scatter on Sparse Gaussian Elimination. SIAM J. Sci. Stat. Comput. 1988, 9, 304.
Malachowski, M. A. Numerical Methods for the Simultaneous Solution of Systems of Interlinked Distillation Columns. Ph.D. Thesis, University of Illinois, 1983.
Markowitz, H. M. The Elimination Form of the Inverse and its Application to Linear Programming. Manage. Sci. 1957, 3, 255.
Naphtali, L. M.; Sandholm, D. P. Multicomponent Separation Calculations by Linearization. AIChE J. 1971, 17, 148.
Petlyuk, F. B.; Platanov, V. M.; Slavinskii, D. M. Thermodynamically Optimal Method for Separating Multicomponent Mixtures. Int. Chem. Eng. 1965, 5, 555.
Prausnitz, J. M.; Anderson, T. F.; Grens, E. A.; Eckert, C. A.; Hsieh, R.; O'Connell, J. P. Computer Calculations for Multicomponent Vapor-Liquid and Liquid-Liquid Equilibria; Prentice-Hall: Englewood Cliffs, NJ, 1980.
Schubert, L. K. Modification of a Quasi-Newton Method for Nonlinear Equations with a Sparse Jacobian. Math. Comput. 1970, 25, 27.
Stadtherr, M. A.; Malachowski, M. A. On Efficient Solution of Complex Systems of Interlinked Multistage Separators. Comput. Chem. Eng. 1982, 6, 121.
Stadtherr, M. A.; Wood, E. S. Sparse Matrix Methods for Equation-Based Chemical Process Flowsheeting: I. Reordering Phase. Comput. Chem. Eng. 1984a, 8, 9.
Stadtherr, M. A.; Wood, E. S. Sparse Matrix Methods for Equation-Based Chemical Process Flowsheeting: II. Numerical Phase. Comput. Chem. Eng. 1984b, 8, 19.
Stadtherr, M. A.; Vegeais, J. A. Process Flowsheeting on Supercomputers. Inst. Chem. Eng. Symp. Ser. 1985, 92, 67.
Vegeais, J. A.; Stadtherr, M. A. Vector Processing Strategies for Chemical Process Flowsheeting. AIChE J. 1990, 36, 1687-1696.
Westerberg, A. W.; Hutchison, H. P.; Motard, R. L.; Winter, P. Process Flowsheeting; Cambridge University Press: Cambridge, 1979; pp 231-232.
Zitney, S. E. A Frontal Code for ASPEN PLUS on Advanced Architecture Computers. Presented at AIChE Annual Meeting, Chicago, Nov 12-16, 1990; paper no. 74a.
Zitney, S. E.; Stadtherr, M. A. Frontal Algorithms for Equation-Based Chemical Process Flowsheeting on Vector and Parallel Computers. Comput. Chem. Eng. 1993, in press.
Zitney, S. E.; Amarger, R. J.; Winter, P. The Impact of Supercomputing on Dynamic Simulation Using the SPEEDUP System. Presented at AIChE National Meeting, New Orleans, March 1992; paper no. 57b.

Received for review August 27, 1992
Revised manuscript received December 17, 1992
Accepted January 8, 1993