Ind. Eng. Chem. Res. 1999, 38, 4449-4457
Training Neural Networks for Pressure Swing Adsorption Processes

Narasimhan Sundaram
Praxair Inc., Tonawanda, New York 14151 (E-mail: [email protected])
A three-layer neural network is used to analyze the relationship between input and output variables for pressure swing adsorption (PSA) processes. The network is trained using a modified version of the back-propagation (BP) algorithm. The modification consists of minimizing the error corresponding to a reordered set of inputs. The inputs are arranged according to the errors (e.g., largest to smallest) they generate at the output of the network. The BP training algorithm is then implemented on the input-output measurement which gave the largest error. This operation is repeated during the entire training process. This paper demonstrates the effectiveness of reordering after single and multiple passes ("epochs") through the training set. The method is illustrated by application to a PSA cycle for separation of carbon monoxide and hydrogen. In another application, a PSA cycle for hydrogen production from natural gas is discussed. In these two applications, a single epoch using reordering is discussed. A third example pertains to nitrogen production from air by PSA/vacuum swing adsorption (VSA) cycles. In this case, multiple epochs have been considered.

Introduction

Pressure swing adsorption (PSA) is a mature technology finding widespread use in industrial gas separation and purification. Since the early air-drying applications of Skarstrom,1 PSA and the related temperature swing adsorption (TSA) process have been used in air separation, hydrogen purification, air prepurification for cryogenic air separation, and production of carbon dioxide and carbon monoxide, and are now even used on NASA's space shuttle. A recent review of this simple yet useful technology appears in Ruthven et al.,2 while earlier developments appear in Ruthven,3 Tondeur and Wankat,4 and Yang.5

While industrial practice has led most of the advances, the last decade has seen increasing use of numerical models that simulate the process and replace expensive laboratory and pilot-plant experimentation. PSA/TSA processes, which rely on the development of stable periodic steady states describing the concentration and temperature profiles across fixed adsorption beds, lend themselves readily to such numerical treatment. With the advent of powerful computers, increasingly complex models,6-9 which can incorporate a wide variety of effects such as particle diffusion models, rigorous momentum balances, complex multicomponent competitive adsorption equilibrium, kinetics, and heat of adsorption, are used to identify promising methods to conduct PSA/TSA in multiple beds and with complicated cycles.10 However, numerical techniques are sensitive to the choice of inputs for fundamental information such as sorbate-sorbent properties. In addition, even with the faster computing power that is now available, rigorous numerical simulations can take significant time to converge to the periodic steady state required for proper design evaluation. Indeed, separate methods have been developed to accelerate convergence to this periodic state.11-14

Complex relationships exist between PSA process input and output variables such as pressure ratio, purge fraction, purities, recoveries, bed lengths, cycle times, etc. at the periodic state, i.e., the equilibrium state that develops after several cycles of the transient, co- and countercurrent operation that is the basis of PSA. Equilibrium theory15-17 provides some relations. A relation for air purification PSA where mass transfer is important has also been obtained.18 These and other methods19 rely on simplifying assumptions to render tractable the complex solutions to the governing differential equations. Such methods can be useful in estimating basic PSA requirements for simple cycles and processes, as described recently by LaCava et al.20 When the process uses multiple layers or additional cycle steps, the interrelationships between dominant variables naturally become more complex.

The purpose of this paper is to demonstrate the utility of artificial neural networks in obtaining such relationships for PSA processes. Neural networks are increasingly being used21 to analyze nonlinear problems which are otherwise difficult or expensive to quantify. The method provides a correlation between different process variables, but no assumptions about any dependences are made. Certainly correlations, whether empirical or model-based, can provide useful design information that is somewhat more flexible than rigorous numerical simulation. A recent example of this is in distillation practice.22 For oxygen vacuum PSA (VPSA), neural networks have been applied recently,23 although a nonneural optimization of PSA was contemplated much earlier.24 This paper begins with a description of neural networks and then applies the analysis to three PSA processes.25-27

Neural Networks

Artificial neural networks can be used to identify complex input-output relationships. These networks consist of nodes which are organized into distinct layers. Layers between the input layer and the output layer are called
Figure 1. Block diagram of the back-propagation (BP) algorithm incorporating input reordering based on maximum error to train a three-layer neural network.
hidden layers. The three-layer network used in this paper consists of the input and output layers as well as one hidden layer. The hidden layer consists of a specified number of nodes which are linked to the nodes in the preceding layer and the succeeding layer by adjustable weights.

The variables furnished as inputs to a neural network are scaled to occupy the range [0, 1] or [-1, 1]. For positive, real-valued inputs the [0, 1] range is chosen. The initial assignment of the weights is random over the range [-1, 1]. Two operations are performed at each node of all layers other than the input layer. First, a weighted sum of the nodal inputs is computed. This sum is then converted to a number in the range [0, 1] by a sigmoidal activation function before being fed as an input to the next layer. The sigmoidal activation function28 is a nonlinear thresholding operation which maps an input lying in the range [-∞, +∞] to the range [0, 1]. The mapping is written as
output = 1/(1 + exp(-a × input + b))    (1)
where a and b are appropriately chosen constants. Large negative inputs are mapped to values near zero, while large positive values are mapped close to 1. For b = 0, the input = 0 is mapped to the output = 0.5. The constant a controls the rate of change of the output with respect to the input. The constant b is used as a threshold to assign an input other than zero (positive or negative) to the output of 0.5. There exist neural network algorithms which systematically adapt these constants to gain finer control of the learning and recall process.

If the number of nodes in the input layer is denoted by Nc, the number of nodes in the hidden layer by Nh, and the number of nodes in the output layer by No, then the total number of adjustable weights Nw in the three-layer network is given by Nw = (Nc + 1)Nh + (Nh + 1)No, where a fixed unit input node is included in both the input layer and the hidden layer. The weights associated with such nodes act as variable thresholds during the sigmoidal activation. The outputs of the network are [0, 1]-scaled versions of the actual outputs. A minimal sketch of this forward computation appears below.
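As an illustration, the following is a minimal sketch of the architecture just described, assuming NumPy; the function names and the defaults a = 1, b = 0 are illustrative choices, not taken from the paper.

```python
import numpy as np

def sigmoid(x, a=1.0, b=0.0):
    """Activation of eq 1: maps (-inf, +inf) into (0, 1); b shifts the midpoint."""
    return 1.0 / (1.0 + np.exp(-a * x + b))

def forward(x, W_h, W_o, a=1.0, b=0.0):
    """One forward pass through the three-layer network.
    x: (Nc,) scaled inputs; W_h: (Nh, Nc+1); W_o: (No, Nh+1).
    A fixed unit node is appended to the input and hidden layers,
    so Nw = (Nc+1)*Nh + (Nh+1)*No weights in total."""
    h = sigmoid(W_h @ np.append(x, 1.0), a, b)    # hidden-layer outputs
    return sigmoid(W_o @ np.append(h, 1.0), a, b) # network outputs in (0, 1)

# Random initialization of the weights over [-1, 1], as described in the text
rng = np.random.default_rng(0)
Nc, Nh, No = 3, 100, 1
W_h = rng.uniform(-1, 1, (Nh, Nc + 1))
W_o = rng.uniform(-1, 1, (No, Nh + 1))
```

With Nc = 3, Nh = 100, and No = 1, this sketch reproduces the weight count Nw = 501 quoted for case A below.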
Training the Neural Network

Training and testing constitute the two stages in the implementation of a neural network designed to relate the values of chosen input variables to the corresponding observed values of the output variables. The training of a neural network with the above architecture comprises weight adjustments based on an objective criterion. An obvious choice of the objective criterion is the sum of the squares of the differences between the observed output and the network-predicted output for all sets of input-output measurements. The back-propagation (BP) algorithm28 is used to minimize the total error at the output by adjusting the Nw weights from the output back toward the input layer. This conventional approach to training a neural network consists of minimizing the sum of the squares of the differences between the measured output and the network-generated output for all input-output sets. This minimization is performed for each input-output set during one pass (called an "epoch") through the training set in no prespecified order of the input-output sets. Typically, several epochs are required for adequate training.

The BP algorithm consists of the following steps, applied to each input-output pattern in the training set (the method is the same when there is only one input-output pattern):
(1) Compute the hidden layer inputs as a weighted combination of the input variables.
(2) Calculate the outputs from the hidden layer using a sigmoidal activation function at each hidden layer node.
(3) Perform steps 1 and 2 at each of the nodes of the output layer.
(4) Determine the error terms for the output nodes and the hidden nodes.
(5) Update the weights of the output layer.
(6) Update the weights of the hidden layer.
A sketch of one such update appears below.
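Under the same assumptions as the previous sketch, steps 1-6 for a single input-output pattern might look as follows; the delta-rule gradients shown are the standard ones for a squared-error sigmoid network (with a = 1, b = 0) rather than expressions taken from the paper.

```python
def bp_update(x, y, W_h, W_o, eta):
    """One BP weight update (steps 1-6) for a single pattern (x, y).
    Standard delta rule for a sigmoid network; sigma'(s) = o*(1 - o)."""
    xb = np.append(x, 1.0)                  # input plus fixed unit node
    h = sigmoid(W_h @ xb)                   # steps 1-2: hidden-layer outputs
    hb = np.append(h, 1.0)
    o = sigmoid(W_o @ hb)                   # step 3: network outputs
    delta_o = (o - y) * o * (1.0 - o)       # step 4: output error terms
    delta_h = (W_o[:, :-1].T @ delta_o) * h * (1.0 - h)  # hidden error terms
    W_o -= eta * np.outer(delta_o, hb)      # step 5: update output weights
    W_h -= eta * np.outer(delta_h, xb)      # step 6: update hidden weights
    return 0.5 * float((o - y) @ (o - y))   # squared error before the update
```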
Reordering of Inputs during Training

In this paper, a modified version of the BP algorithm is employed during network training. A block diagram illustrating the various stages is shown in Figure 1. In this figure, the block labeled "Max. error based input selection" represents the modification to the BP algorithm. First, the randomly initialized neural network is presented with all the inputs in the training set. The inputs referred to here have as components several individual quantities such as pressure, feed step time, flow rate, temperature, etc. The output corresponding to each input typically has a single component, such as purity or recovery. The network is then trained with the input-output pair which yields the largest square error at the output of the network, i.e., the square of the difference between the network output and the measured output. Then the converged network, i.e., the network after the minimization is achieved, is used to determine network-predicted outputs for the other inputs in the training set. These network-predicted outputs are compared with the corresponding measured outputs, and the errors are identified. The network is then trained with the input which yields the largest error. A single epoch or training cycle consists of repeating this procedure for all the remaining inputs. This procedure can also be implemented for multiple epochs. The first two examples considered in this paper show the error reduction achieved after a single epoch. The third example extends the above strategy to multiple epochs.

There are advantages to using input reordering. The training of the network can be performed with specific inputs based on a classification of the errors. For instance, training based on the maximum error criterion will yield a network sensitive to large deviations during testing, while training based on the minimum error criterion is more appropriate to track small variations. In addition, as the dimensions of the problem grow (Nc and No become large), the network can be appropriately partitioned into submodules with a selective implementation of the modified BP algorithm. This can significantly reduce the learning time of the network.

The parameters which influence the training process are the number of weights Nw which must be updated, the initial choice of the constant η0 during gradient descent for error convergence, and Rh and Ro, the adaptive thresholds of the sigmoidal nonlinearities. For a specified number of input nodes Nc and output nodes No, variation in the number of nodes in the hidden layer Nh is used to study the features of the network. In addition, the gradient descent parameter is changed to improve convergence. The threshold levels are varied to determine the sensitivity of the network. The effectiveness of the proposed training scheme is determined by the ability of the network to learn the input-output relationships with a chosen set of inputs and outputs. A minimal sketch of one reordered epoch is given below.
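The following sketch shows one epoch of this reordering strategy, reusing forward and bp_update from the earlier sketches. The inner-loop count n_fit and the halving of the step size (described for case A below) are illustrative assumptions about how the minimization on a single pattern is carried out.

```python
def reordered_epoch(X, Y, W_h, W_o, eta0=0.8, n_fit=50):
    """One epoch of the modified BP: repeatedly pick the not-yet-used
    training pattern with the largest squared prediction error under the
    current weights, and train on that single pattern until converged."""
    remaining = list(range(len(X)))
    while remaining:
        # Squared errors of all remaining patterns under the current weights
        errs = [float(np.sum((forward(X[j], W_h, W_o) - Y[j]) ** 2))
                for j in remaining]
        j = remaining.pop(int(np.argmax(errs)))  # worst-fit pattern
        eta = eta0
        for _ in range(n_fit):                   # minimize on this pattern
            bp_update(X[j], Y[j], W_h, W_o, eta)
            eta /= 2.0                           # halve the step each update
    return W_h, W_o
```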
Let Nt denote the number of measured input-output values used during training in the case of a single output node, i.e., No = 1. One measure of successful training is the absolute deviation (AD), the mean of the absolute deviations between the observed outputs Oj and the network-trained outputs Pj (after convergence of the BP algorithm), written as

AD = (1/Nt) ∑(j=1 to Nt) |Oj - Pj|    (2)
The root mean square difference (RMSD) is another effective measure of the training process. It is the square root of the mean of the squared differences between Oj and Pj:

RMSD = [(1/Nt) ∑(j=1 to Nt) (Oj - Pj)²]^(1/2)    (3)
The expressions in (2) and (3) can also be written for networks with more than one output node (No > 1). Let k denote the index on the output nodes, i.e., 1 ≤ k ≤ No, and let (Oj)k and (Pj)k be the observed and network-trained outputs, respectively, at node k. The resulting forms of AD and RMSD are

(AD)k = (1/Nt) ∑(j=1 to Nt) |(Oj)k - (Pj)k|    (4)

(RMSD)k = [(1/Nt) ∑(j=1 to Nt) ((Oj)k - (Pj)k)²]^(1/2)    (5)
In this paper, the absolute deviation and root mean square difference will be computed for single and multiple output variables for chosen training inputs. The same measures will be used to evaluate the trained neural network during the testing phase.

Testing the Neural Network

The testing of a trained neural network consists of predicting the output variables for input variables not belonging to the training set. The Nw weights and the activation settings of the trained network are used to determine the outputs for arbitrary inputs. The absolute deviation and root mean square difference are computed for suitable choices of the number Np of input-output sets used during testing. Inadequate training of the network will result in large values of the AD and RMSD during testing. It is also possible to overtrain a network with several input-output sets yielding extremely small or extremely large deviations during the initial phase of the training process. This can lead to poor performance during testing. The purpose of ordering the inputs in the training set according to their contributions to the total square error is to diminish the risk of poor training and to ensure reliable performance during the testing stage. To distinguish the error measures for the two stages, ADt and RMSDt denote the measures obtained during training, and ADp and RMSDp denote those obtained during testing.

The emphasis of this paper is on the value of selection criteria, or reordering principles, when used in conjunction with conventional BP-based training procedures. These criteria are applicable to any neural network used to understand PSA processes. Reordering is incorporated in the proposed training procedure to rank inputs according to an error criterion and to enable training strategies with input selectivity. Reordering helps classify the process input variables into ranges within which the neural network can predict process output variables. If this strategy is employed in conjunction with data acquisition, then it is possible to avoid unnecessary experimentation.

Application to PSA

In what follows, the neural network is trained in the manner described above for three separate PSA processes. These are (A) a two-bed (each with a single layer), four-step PSA cycle for carbon monoxide and hydrogen separation (Kapoor and Yang25), (B) a two-bed (each with two layers), six-step PSA cycle for production of hydrogen from natural gas (Chlendi et al.26), and (C) a single-bed, four-step PSA/VSA cycle for nitrogen production from air (Shin and Knaebel27). In the first two systems, polynomial correlations were developed by the authors to describe the input-output (I-O) relationships. These polynomial representations
Table 1. Constants in Polynomial Regression25 in Eq 6 for Case A

output variable    a0      a1     a2      a3       b1      b2      b3      c1      c2      c3
H2 purity, %       95.787  3.431   0.082  -10.086  -2.847   0.337  -4.603   0.055  -0.064   2.297
H2 recovery, %     96.837  2.471  -4.442    4.282  -0.815  -0.228  -2.979   1.385   3.296  -1.073
CO purity, %       94.759  3.405  -4.246    1.996  -1.086   0.376  -1.763   1.018   1.682   0.135
CO recovery, %     57.122  2.490  10.130  -25.090  -0.053  -2.747   8.216  -0.763  -1.612  -0.247
allow simulation of the cycles, so as to obtain the necessary inputs and outputs to train the neural network. Subsequently, they are also used to test the network, by comparing the predictions of the network against the values obtained from these polynomials for the same inputs. Case C was an experimental study. It will be perceived that the example processes chosen are complex and that their I-O relations are not obvious. An advantage of using systems A and B is that any number of input-output sets can be generated from the available polynomials. This allows flexibility in training and testing the neural method. The reason for using these polynomial representations here is principally to compare the unmodified BP algorithm with the BP algorithm based on reordering, using examples from the PSA literature. In general, suitable polynomial representations of PSA processes can be difficult to obtain, as discussed by Chlendi et al.26 For case C, experimental data were available, and therefore these data were used in training the network and comparing the algorithms.

Properly training neural networks also usually requires a large input training set. For comparison, Lewandowski et al.23 required 221 laboratory experiments to train their generic, unmodified BP neural network, which was further tested against another 73 experiments. Chlendi et al.26 performed 523 numerical experiments in the course of obtaining their polynomial correlation. Both cases A and B below were trained using one epoch, while case C was trained using both single and multiple epochs.

(A) Four-Step PSA Cycle for H2-CO Separation. PSA process variables for the production of CO in a four-step cycle (Kapoor and Yang25) were identified as the feed pressure PF, the end pressure of the third step PI, and the feed throughput F. The observed outputs in this case are the product recoveries and purities. The three process variables have been considered for typically measured input-output data. These are scaled to lie in the range [0, 1]. In case A, denoting the output variable by y and the scaled input variables by vi, i = 1, 2, 3, the polynomial relation used is given by

y = a0 + ∑(i=1 to 3) ai vi + ∑(i=1 to 3) bi vi² + ∑(i=1 to 2) ci vi vi+1 + c3 v1 v3    (6)
where v1 is the scaled PF, v2 is the scaled PI, and v3 is the scaled throughput F, as defined in the Nomenclature. Polynomial coefficients for eq 6 are given in Table 1; a sketch of eq 6 in code follows.
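As a sketch of how I-O data for training can be generated from eq 6, here with the Table 1 coefficients for CO purity and the input scalings from the Nomenclature (the sample operating point in the usage line is illustrative):

```python
# Table 1 coefficients for CO purity: a0, a1-a3, b1-b3, c1-c3
CO_PURITY = dict(a=[94.759, 3.405, -4.246, 1.996],
                 b=[-1.086, 0.376, -1.763],
                 c=[1.018, 1.682, 0.135])

def eq6(v, coef):
    """Evaluate eq 6 at scaled inputs v = (v1, v2, v3)."""
    a, b, c = coef["a"], coef["b"], coef["c"]
    y = a[0] + sum(a[i + 1] * v[i] + b[i] * v[i] ** 2 for i in range(3))
    # Cross terms: c1*v1*v2 + c2*v2*v3 + c3*v1*v3
    return y + c[0] * v[0] * v[1] + c[1] * v[1] * v[2] + c[2] * v[0] * v[2]

def scale(PF, PI, F):
    """Nomenclature scalings: V1 = (PF-300)/100, V2 = (PI-70)/30, V3 = (F-550)/235."""
    return ((PF - 300) / 100, (PI - 70) / 30, (F - 550) / 235)

# e.g., CO purity from the correlation at an illustrative operating point:
y = eq6(scale(350, 80, 600), CO_PURITY)
```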
Figure 2. Case A: measured vs trained (O) and predicted (+) estimates (scaled) for the purity of CO using the modified network, i.e., with input reordering, Nt = 60, Nh = 100, Rh = 0.7, Ro = 0.7, and η0 = 0.8, and epoch count 1.
Figure 2 displays the training of the neural network to estimate the purity of CO using Nt = 60, Nh = 100, Rh = 0.7, Ro = 0.7, and η0 = 0.8 in the modified BP algorithm after a single epoch. This translates to the use of 60 input-output measurements to train a network with 100 nodes in the hidden layer (Nw = 501) and with the external activation levels for the hidden layer and output layer set at 0.7. The initial value of the gradient descent parameter for weight updates is 0.8. This value is diminished by a factor of 2 during each iteration involving weight updates. Thus, as the error minimization progresses, this parameter approaches zero and the weight updates occur in successively smaller steps.

In Figure 2, the circles denote the network-trained outputs versus the corresponding measured outputs. Figure 2 also displays, with plus marks, the network-predicted outputs for 30 separately identified process input settings (distinct from the training inputs) versus the corresponding measured outputs. This plot also contains the unit-slope line, i.e., the line of perfect fit with points corresponding to zero error during training and/or prediction. Ideally, training and prediction should place their outcomes on this line. The ADt and RMSDt for this set of 60 inputs were 0.015688 and 0.032616, respectively, after a single epoch of training with reordered inputs. The corresponding values of ADp and RMSDp were 0.060706 and 0.070642.

The result of the training procedure for the set of 60 inputs appears in Figure 3a. The outcome of the testing process is illustrated in Figure 3b. Test cases are considered in no specific order. It should be noted that the absolute deviation of 6% and the root mean square deviation of 7% obtained for predictions are computed after a single epoch of training with reordering. These deviations are reduced after multiple epochs of training.

Figure 4 displays the inaccurate predictions (plus marks) made by even a well-trained network (circles) which uses the conventional BP algorithm after a single epoch. All other inputs are maintained the same as in Figures 2 and 3. Although the absolute deviation during training in Figure 4 is less than 1%, the corresponding value during prediction is an unacceptable 25%. This is also seen in Figure 5b, which shows wide discrepancies between network-predicted outputs and observed outputs despite the seemingly perfect training (Figure 5a). In addition, Figure 3a reveals that the training could essentially have been terminated with about 60% of the chosen set of inputs, signified by the leveling off of the purity (flat line), indicating that the weight coefficients
Figure 3. Case A: (a) measured (O) and trained (+) outputs for the CO purity using the modified network, i.e., with input reordering, Nt = 60, Nh = 100, Rh = 0.7, Ro = 0.7, and η0 = 0.8; (b) measured (O) and predicted (+) outputs of 30 test cases for CO purity with parameters from (a) and epoch count 1.
Figure 4. Case A: measured vs trained (O) and predicted (+) estimates (scaled) for CO purity using the network with Nt = 60, Nh = 100, Rh = 0.7, Ro = 0.7, and η0 = 0.8, the unmodified BP algorithm, and epoch count 1.
obtained at input no. 35 may be used for all subsequent inputs. No such conclusion can be drawn from Figure 5a, where learning continues even up to input no. 60.

It can be noted that, in the case of single-epoch training with the BP algorithm using reordered inputs, even though the outputs for the training data showed significant variation (a large change in the measured output as the input index changes, Figure 3a), the predicted output stayed within 6-7% of the measured output for the chosen set of activation levels (Figure 3b). However, in the case of single-epoch training with the conventional BP algorithm, the poor predictions in Figure 5b result from the lack of proper training of the network to wide variations, as seen in Figure 5a (a small change in the measured output as the input index changes).

The same network parameters and a single epoch of the modified BP were also used to train the network to the recovery of CO. In this case, an absolute deviation of about 2.9% was observed after the single epoch of training and about 6.3% during testing.
Figure 5. Case A: (a) measured (O) and trained (+) outputs for the purity of CO using the network with Nt = 60, Nh = 100, Rh = 0.7, Ro = 0.7, and η0 = 0.8, and the unmodified BP algorithm, i.e., without input reordering; (b) measured (O) and predicted (+) outputs of 30 test cases for CO purity with parameters from (a), the unmodified BP algorithm, and epoch count 1.
Further evidence of the advantage of input selection for efficient training was obtained in the case involving the recovery of hydrogen. Here, the absolute deviation during testing was about 5% after a single epoch of training with about half of the rearranged input set, with an absolute deviation during training of 3.2%. Likewise, a single epoch of training followed by prediction of the purity of hydrogen yielded an absolute deviation of about 4.9% after training and 5.1% during testing. For all of these outputs, training the network to inputs with reordering demonstrates reliable predictions even after a single epoch. Although this study is restricted to a network with the same number of nodes in the hidden layer and fixed activation thresholds, these can be treated as parameters which can be modified during training to improve the overall learning process. The network predictions improve (lower absolute deviations) after multiple epochs of training with reordered inputs, which is demonstrated later with the experimental data of case C.

(B) Six-Step PSA Cycle for H2 Production from Natural Gas. PSA process variables for the recovery of hydrogen in a six-step cycle (Chlendi et al.26) have been identified as the duration of the adsorption step DAP, the high pressure PH, the feed velocity VA, and the composition of CO2 as denoted by xCO2. There are two layers of sorbents, and the authors used 523 numerical simulations to obtain the I-O data needed for subsequent use in identifying coefficients of polynomials representing the process. The polynomials are intended as design tools and also as a means to obtain optimal conditions of operation. The authors suggest that a neural network may be implemented; however, they report only the results of their polynomial regression. In this case, the outputs were written in terms of the four input variables X, Y, Z, and T, defined in the Nomenclature, which represent, respectively, the scaled process inputs DAP, PH, VA, and xCO2. Although four output quantities (productivity, purity, yield, and quantity of hydrogen) were studied, only coefficients for the quantity of hydrogen produced per cycle were given.26 Therefore, only this output variable was used in training the neural network. The quantity
Table 2. Constants in Polynomial Regression26 in Eq 7 for Case B (Output Variable qH2)

a0 = 9.634    a1 = 4.796    a2 = 5.250    a3 = 4.904    a4 = -1.1823
b1 = 2.687    b2 = 2.383    b3 = -0.74    b4 = 2.398    b5 = -0.881    b6 = -0.643
c1 = 1.254    c2 = -0.601   c3 = -0.418   c4 = -0.552
d1 = -0.386
Figure 6. Case B: (a) measured (O) and trained (+) outputs for the quantity of hydrogen produced per cycle using the modified network with Nt = 100, Nh = 75, Rh = 0.9, Ro = 0.9, and η0 = 0.8; (b) measured (O) and predicted (+) outputs of 20 test cases for the recovery of hydrogen with parameters from (a) and epoch count 1.
Figure 7. Case C: (a) measured (O) and trained (+) outputs for the percent recovery of nitrogen from air using the modified network with Nt = 25, Nh = 125, Rh = 1.1, Ro = 1.1, and η0 = 0.8, and epoch count 1; (b) measured (O) and predicted (+) outputs of six test cases for the percent recovery of nitrogen with parameters from (a).
of hydrogen recovered, qH2, was written as

qH2 = a0 + ∑(i=1 to 4) ai fi + b1 f1 f2 + b2 f1 f3 + b3 f1 f4 + b4 f2 f3 + b5 f2 f4 + b6 f3 f4 + c1 f1 f2 f3 + c2 f2 f3 f4 + c3 f1 f3 f4 + c4 f1 f2 f4 + d1 f1 f2 f3 f4    (7)

where f1 = 2X - 3, f2 = 2Y - 3, f3 = 2Z - 3, and f4 = 2T - 3, and where ai, bi, ci, and di are the polynomial coefficients for eq 7 given in Table 2. A transcription of eq 7 in code follows.
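A minimal transcription of eq 7 with the Table 2 coefficients; the scaled inputs X, Y, Z, and T follow the Nomenclature definitions, and the term ordering matches eq 7.

```python
from itertools import combinations

# Table 2 coefficients, in the term order of eq 7
A = [9.634, 4.796, 5.250, 4.904, -1.1823]          # a0, a1-a4
B = [2.687, 2.383, -0.74, 2.398, -0.881, -0.643]   # b1-b6, for f1f2, f1f3, f1f4, f2f3, f2f4, f3f4
C = [1.254, -0.601, -0.418, -0.552]                # c1: f1f2f3, c2: f2f3f4, c3: f1f3f4, c4: f1f2f4
D1 = -0.386                                        # d1, quadruple term

def eq7(X, Y, Z, T):
    """Quantity of hydrogen recovered per cycle, eq 7."""
    f = [2 * X - 3, 2 * Y - 3, 2 * Z - 3, 2 * T - 3]
    q = A[0] + sum(A[i + 1] * f[i] for i in range(4))
    # Pair products: combinations(range(4), 2) yields exactly the b1-b6 order
    q += sum(b * f[i] * f[j] for b, (i, j) in zip(B, combinations(range(4), 2)))
    # Triple products, in the order given in eq 7
    for c, (i, j, k) in zip(C, [(0, 1, 2), (1, 2, 3), (0, 2, 3), (0, 1, 3)]):
        q += c * f[i] * f[j] * f[k]
    return q + D1 * f[0] * f[1] * f[2] * f[3]
```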
Figure 8. Case C: network-predicted variation in recovery and purity for changes in purge volume (a, b) and changes in purge step duration (c, d) after training with an epoch count of 50.
Figure 6 corresponds to the training of the network using 100 input sets and 75 nodes in the hidden layer (Nw = 451). The absolute deviation during prediction was only about 2%, although that during training was about 7.5%. The predictions in Figure 6b indicate the sensitivity of the network in detecting and tracking subtle variations in the measured output levels despite wide swings in the outputs of the training set (Figure 6a).

(C) Four-Step PSA/VSA Cycle for the Production of N2 from Air. This example is intended to demonstrate the effectiveness of the neural network during training and prediction of PSA processes with experimental data. These data are provided by Shin and Knaebel27 for the separation of nitrogen from air using type RS-10 molecular sieves. In this case, the performance of the neural network is observed after it is trained using a single epoch and multiple epochs. The training is conducted using both the conventional BP algorithm and the proposed reordered approach to BP. The twelve process variables relevant to their experiments are the four time durations of the cycle steps, the two flow rates, the high and low pressures, the blowdown and purge pressures, and the column geometry as specified by its length and cross-sectional diameter. These constitute the components of the inputs to the three-layer neural network. The two PSA process outputs in this case are the percent purity and percent recovery of nitrogen.

The network is trained using the modified BP with 25 distinct inputs and their corresponding experimentally observed outputs (i.e., measurements of the purity and recovery of nitrogen) as reported by Shin and Knaebel.27 Figure 7 displays the outcome after training with a single epoch using the modified BP algorithm for the percent recovery of nitrogen. The absolute deviation after training with 25 inputs is 4.8%, and the corresponding absolute deviation for the prediction of 6 test cases is 1.87%. In this case, the predicted values lie in a narrow range between 0.09 and 0.2.

Figure 8 shows trends for recovery and purity as functions of purge volume and purge step duration, obtained using the network trained for 50 epochs with the modified BP algorithm. These demonstrate the ability of the neural
Figure 9. Case C: (a) measured (O) and trained (+) outputs for the percent purity of nitrogen from air using the modified network with Nt = 25, Nh = 150, Rh = 1.4, Ro = 1.4, and η0 = 0.8, and epoch count 50; (b) measured (O) and predicted (+) outputs of five test cases for the percent purity of nitrogen with parameters from (a).
architecture to track the variation in a PSA output variable due to arbitrary settings of chosen PSA process inputs, in spite of being trained with a limited number (25) of input-output measurements. For instance, parts a and b of Figure 8 display the decrease in the recovery of nitrogen and the corresponding increase in the purity of nitrogen as the purge volume is raised from approximately 0.000017 to 0.00006 m3 at STP. While the percent recovery drops from about 9% to 6.5%, the percent purity rises from about 99.7% to 99.85%. Similar trends appear in Figure 8c,d, where, as the purge step duration is increased, the fall of the percent recovery (Figure 8c) corresponds to the rise of the percent purity (Figure 8d) of nitrogen. It may also be noted that, while the changes in the percent recovery of nitrogen are large (up to 2.5% in Figure 8a and 0.7% in Figure 8c), the corresponding transitions in the percent purity of nitrogen are considerably smaller (within 0.2% in Figure 8b and about 0.5% in Figure 8d). Figure 8 also shows, as circles, the experimental data used for the training. Note, however, that the predicted trends have different x-axis values, since they are obtained at points intermediate to the training set. While the network trained to the limited number (25) of input-output measurements follows the general trends of the variables, there are still differences between the measured and predicted values of purity and recovery. However, Figure 8 shows that the experimental and prediction curves intersect, implying that prediction and experiment agree at least at one point. This indicates that changes to the network operating parameters, such as the activation thresholds in eq 1, the training epoch count, and the number of hidden-layer nodes, may all be used to improve the sensitivity of the network.

Figure 9 displays the outcome after training with 50 epochs using the modified BP algorithm for the percent purity of nitrogen. Comparisons are made with the unmodified BP algorithm for the same number of epochs, for which, however, no figures are shown. The network parameters used in this case are Nt = 25, Nh = 150, Rh = 1.4, Ro = 1.4, and η0 = 0.8. The absolute deviation after training with 50 epochs and 25 inputs
Figure 10. Case C: (a) measured (O) and trained (+) outputs for the percent recovery of nitrogen from air using the modified network with Nt = 25, Nh = 125, Rh = 1.1, Ro = 1.1, and η0 = 0.8, and epoch count 50; (b) measured (O) and predicted (+) outputs of six test cases for the percent recovery of nitrogen with parameters from (a).
is 1.2%, and the corresponding absolute deviation for the prediction of 5 test cases is 1.8%. Figure 9a shows the measured outputs represented by circles and the network-trained outputs denoted by plus marks. The measured values of the percent purity of nitrogen are denoted by circles, and the network-predicted outputs are identified by plus marks in Figure 9b. In this case, the prediction of five test input-output measurements yielded an absolute deviation of only about 1.8%. It is reiterated that the training and prediction are based only on the experimental observations of the two output variables. In comparison, the corresponding absolute deviation using the unmodified BP after 50 epochs was 3% after training and 7% for prediction.

Figure 10 shows the outcome after training with 50 epochs using the modified BP algorithm for the percent recovery of nitrogen. The network parameters used here are Nt = 25, Nh = 125, Rh = 1.1, Ro = 1.1, and η0 = 0.8. Figure 10a shows the measured values of the percent recovery of nitrogen (O) and the network-trained outputs (+) for the 25 inputs used during training. The absolute deviation during training with the modified BP is maintained within 1.3% after 50 epochs. The corresponding deviation using the unmodified BP was 2.2% after 50 epochs. The prediction of six test outputs (Figure 10b) yields an absolute deviation of about 1.6%, compared to 7.7% from the unmodified BP algorithm.

Discussion

Selective training with reordering as described here offers the advantage of recommending new experiments based on the observation of poor performance during predictions. Case C is an experimental study with a limited number of measurements. For this case, the neural network trained with ranges of inputs corresponding to low values of measured purity (less than 92%) and high values of measured recovery (greater than 20%) yielded high deviations during prediction. To improve the training and prediction of the network, new experiments in these particular input ranges are recommended.
The analysis shows that smaller training sets may suffice for properly training a neural network to learn the process relationships, which can reduce experimentation. If the original training set consists of a very large number of input-output measurements, then training the network with all of these measurements may be impractical as well as unnecessary. It is also possible that several of these measurements are inaccurate or redundant. The inaccuracy can result from an incorrect association of the input variables with the output values due to measurement noise or human/system interface error. The proposed strategy determines ranges of input-output measurements within which the neural network can reliably predict the process output variables. Input-output measurements which lead to large prediction errors are treated as "outliers". The training and prediction of the network can be improved by performing new experiments with such inputs.

The training and testing of PSA process outputs using a three-layer neural network and a modified version of the back-propagation algorithm reveal the potential of such configurations to track the evolution of PSA processes. Laboratory and pilot-plant experimentation remain the preferred methods of obtaining useful adsorption process design information. The neural network outlined here uses and requires starting information from at least one of these sources. It cannot begin training without such inputs and usually requires substantial inputs23,26 before training is complete. The simple applications presented here can be extended to problems requiring the estimation and tracking of several outputs depending on a large number of input components. The rearrangement strategy discussed in this paper suggests a more efficient use of the input data available during training. The network is robust and sensitive to a wide range of variation in the output during the prediction stage.

Acknowledgment

The author is grateful for the extensive review of the manuscript given by Dr. M. Lockett, Praxair Corporate Fellow, the comments of Dr. J. Billingham, and the support of G. Henzler, Manager, ASU Warm End. The author also thanks Dr. S. Lerner, Director, Praxair Technology R&D, for kind permission to publish.

Nomenclature

AD = absolute deviation, eq 2
DAP = duration of the adsorption step, s
F = feed throughput, mol/s
k = index on the output nodes, 1 ≤ k ≤ No
Nc = number of nodes in the input layer
Nh = number of nodes in the hidden layer
No = number of nodes in the output layer
Np = number of measured input-output values used during testing
Nt = number of measured input-output values used during training
Nw = total number of adjustable weights
(Oj)k = observed output at node k
(Pj)k = network-trained output at node k
PF = feed pressure, Pa
PI = end pressure of the third step, Pa
PH = high pressure, Pa
RMSD = root mean square difference, eq 3
T = (xCO2 + 0.075)/0.15
V1 = (PF - 300)/100
V2 = (PI - 70)/30
V3 = (F - 550)/235
VA = feed velocity, m/s
xCO2 = composition of CO2
X = (DAP + 650)/1300
Y = (PH + 10)/20
Z = (VA + 0.0015)/0.003

Greek Symbols

Rh, Ro = adaptive thresholds of the sigmoidal nonlinearities
η0 = constant during gradient descent for error convergence

Subscripts

p = predicted
t = trained
Literature Cited

(1) Skarstrom, C. W. Method and Apparatus for Fractionating Gaseous Mixtures by Adsorption. U.S. Patent 2,944,627, 1960.
(2) Ruthven, D. M.; Farooq, S.; Knaebel, K. S. Pressure Swing Adsorption; VCH: New York, 1994.
(3) Ruthven, D. M. Principles of Adsorption and Adsorption Processes; Wiley: New York, 1984.
(4) Tondeur, D.; Wankat, P. C. Gas Purification by Pressure Swing Adsorption. Sep. Purif. Methods 1985, 14, 157.
(5) Yang, R. T. Gas Separation by Adsorption Processes; Butterworth: Boston, 1987.
(6) Farooq, S.; Ruthven, D. M. Numerical Simulation of a Kinetically Controlled Pressure Swing Adsorption Bulk Separation Process Based on a Diffusion Model. Chem. Eng. Sci. 1991, 46, 2213.
(7) Lu, Z.; Loureiro, J. M.; LeVan, M. D.; Rodrigues, A. E. Pressure Swing Adsorption Processes: Interparticle Diffusion/Convection Models. Ind. Eng. Chem. Res. 1993, 32, 2740.
(8) Kikkinides, E. S.; Yang, R. T.; Cho, S. H. Concentration and Recovery of CO2 from Flue Gas by PSA. Ind. Eng. Chem. Res. 1993, 32, 2632.
(9) Liu, Y.; Ritter, J. A. PSA-Solvent Vapor Recovery: Process Dynamics and Parametric Study. Ind. Eng. Chem. Res. 1996, 35, 2299.
(10) Malek, A.; Farooq, S. Hydrogen Purification from Refinery Fuel Gas by PSA. AIChE J. 1998, 44, 1985.
(11) Suzuki, M. Continuous Counter-current Flow Approximation for Dynamic Steady State Profiles of Pressure Swing Adsorption. AIChE Symp. Ser. 1985, 81 (242), 67.
(12) LeVan, M. D.; Croft, D. T. Determination of Periodic States of Pressure Swing Adsorption Cycles. In Adsorption Processes for Gas Separation; Recents Progres en Genie des Procedes; Meunier, F., LeVan, M. D., Eds.; Lavoisier: Cachan, France, 1991; Vol. 17, No. 5, pp 197-202.
(13) Smith, O. J.; Westerberg, A. W. Acceleration of Cyclic Steady State Convergence for Pressure Swing Adsorption. Ind. Eng. Chem. Res. 1992, 30, 1023.
(14) Croft, D. A.; LeVan, M. D. Periodic States of Adsorption Cycles-II. Solution Spaces and Multiplicity. Chem. Eng. Sci. 1994, 49, 1831.
(15) Chan, Y. N. I.; Hill, F. B.; Wong, Y. W. Equilibrium Theory of a Pressure Swing Adsorption Process. Chem. Eng. Sci. 1981, 36, 243.
(16) Knaebel, K. S.; Hill, F. B. Pressure Swing Adsorption: Development of an Equilibrium Theory for Gas Separations. Chem. Eng. Sci. 1985, 40, 2351.
(17) LeVan, M. D. Pressure Swing Adsorption: Equilibrium Theory for Purification and Enrichment. Ind. Eng. Chem. Res. 1995, 34, 2655.
(18) Sundaram, N. A Non-Iterative Solution for Periodic Steady States in Gas Purification PSA. Ind. Eng. Chem. Res. 1993, 32, 1686.
(19) White, D. H. Practical Aspects of Air Purification by PSA. AIChE Symp. Ser. 1988, 84 (264), 129.
(20) LaCava, A. I.; Shirley, A.; Ramachandran, R. How to Specify PSA Units. Chem. Eng. 1998, 6, 100.
(21) Grossberg, S. Nonlinear Neural Networks: Principles, Mechanisms, and Architectures. Neural Networks 1988, 1, 17.
(22) Lockett, M. J. Easily Predict Structured-Packing HETP. Chem. Eng. Prog. 1998, 1, 60.
(23) Lewandowski, J.; Lemcoff, N. O.; Palosaari, S. Use of Neural Networks in the Simulation and Optimization of PSA Processes. Chem. Eng. Technol. 1998, 21 (7), 593.
(24) Doshi, K. J.; Katira, C. H.; Stewart, H. A. Optimization of a Pressure Swing Cycle. AIChE Symp. Ser. 1971, 67 (117), 90.
(25) Kapoor, A.; Yang, R. T. Optimization of a Pressure Swing Adsorption Cycle. Ind. Eng. Chem. Res. 1988, 27, 204.
(26) Chlendi, M.; Tondeur, D.; Rolland, F. A Method to Obtain a Compact Representation of Process Performances from a Numerical Simulator: Example of PSA for Pure Hydrogen Production. Gas Sep. Purif. 1995, 9, 125.
(27) Shin, H.-S.; Knaebel, K. S. Pressure Swing Adsorption: An Experimental Study of Diffusion-Induced Separation. AIChE J. 1988, 34, 1409.
(28) Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. In Parallel Distributed Processing; Rumelhart, D. E., McClelland, J. L., Eds.; MIT Press: Cambridge, MA, 1988; Vol. I, p 318.
Received for review March 8, 1999
Revised manuscript received July 26, 1999
Accepted August 9, 1999

IE9901731