Modeling of the Blast Furnace Burden Distribution by Evolving Neural

A model of burden layer formation in the blast furnace is developed on the basis of layer thicknesses estimated from radar measurements of the burden ...
1 downloads 0 Views 273KB Size
2314

Ind. Eng. Chem. Res. 2003, 42, 2314-2323

PROCESS DESIGN AND CONTROL Modeling of the Blast Furnace Burden Distribution by Evolving Neural Networks J. Hinnela1 ,† H. Saxe´ n,* and F. Pettersson‡ Faculty of Chemical Engineering, A° bo Akademi University, Biskopsgatan 8, FIN-20500 A° bo, Finland

A model of burden layer formation in the blast furnace is developed on the basis of layer thicknesses estimated from radar measurements of the burden (stock) level in the furnace. The dependence between the layer thickness and charging variables is modeled by neural networks. Parsimonious networks are determined by an evolutionary algorithm, which simultaneously trains weights and network connectivity. The efficiency of the training procedure is enhanced by tackling part of the numerical optimization by linear least squares. The resulting network models are utilized in a hybrid model, which considers practical constraints of the charging process in the furnace. The hybrid model is used to evaluate the impact of altered boundary conditions in novel charging programs. 1. Introduction The blast furnace is the main process for producing molten iron for primary steelmaking in the world.1 The furnace acts as a huge shaftlike countercurrent heat exchanger and chemical reactor, where the iron-bearing burden, enriched and agglomerated to sinter or pellets, is charged with coke in alternate layers. Hot blast supplied to the lower part of the process maintains combustion of the coke and yields hot gases that reduce and smelt the iron oxides on their descent. The molten iron and the byproduct, slag, are tapped intermittently through tapholes from the lower furnace, while the partially oxidized gas leaves the process at the top. The radial distribution of the charged materials plays a key role in achieving an energy-efficient operation of the process because it affects the pressure loss and the local mass flows of solid and gas in the shaft that together influence the indirect reduction degree of the ore, the shape and level of the melting zone, and the heat losses through the furnace wall lining.2 In furnaces with bell-top charging equipment, the burden distribution can be manipulated by applying different movable armor positions for the dumps or by the dump size, the charging sequence, and the level of the burden (stock). Numerous ingenious devices have been developed for “measuring” the burden distribution,1,3-5 but it is still a difficult task to estimate it reliably in operating furnaces. High investment and maintenance costs of such sensors also speak in favor of using indirect estimation methods based on signals from standard instruments.6-8 It is interesting to note that, despite the fact that every blast furnace has at least a mechanical level measurement device (often referred to as stock* To whom correspondence should be addressed. Tel.: +358 2 215 4442. Fax: +358 2 215 4792. E-mail: [email protected]. † E-mail: [email protected]. ‡ E-mail: [email protected].

Figure 1. Schematic of the upper part of the blast furnace with the bell charging equipment and movable armors (MA) for manipulation of the radial distribution of the burden layers and the radar sensing the burden level. The distance to the level (z), the charging set point (zset), and the radial position (r) where the falling burden hit the stock line have also been depicted.

rod),9 there have been very few attempts to quantify the signals automatically.10 In the present work, the burden level is sensed locally by radar (Figure 1) that measure the distance to the stock line, z, in one or two points. The radar signal is preprocessed to provide estimates of the local burden layer thickness, ∆z, and their dependence on variables affecting the burden distribution is modeled by soft computing. A method for simultaneous training of weights and connections in layered neural networks based on an evolutionary algorithm is employed, yield-

10.1021/ie0203779 CCC: $25.00 © 2003 American Chemical Society Published on Web 04/29/2003

Ind. Eng. Chem. Res., Vol. 42, No. 11, 2003 2315

ing models with appropriate inputs and selective connectivity. Because of the sparse connections and elimination of superfluous inputs, the resulting models generalize well and their actions can be easily understood. The networks are applied in a hybrid model that considers practical constraints of the charging process in the furnace to evaluate the impact of the charging programs on the layer thicknesses. The hybrid model makes it possible to gain insight into the complex layerformation process, which can be used to optimize the charging conditions, e.g., the radial ore-to-coke distribution. 2. Modeling by Soft Computing This section presents a discussion on the problems encountered in using multilayer neural networks as modeling tools, followed by a brief description of evolutionary algorithms. Finally, a scheme is proposed in which the two techniques are fused to solve the central problem of simultaneous optimization of network size, structure, and weights in the modeling. 2.1. Neural Network Modeling. Neural networks have become popular tools for solving approximation, pattern recognition, and classification problems,11 and in networks trained by supervised learning, the weights of the connections are determined by numerical optimization.12 An important modeling step is the choice of network architecture and connectivity; overparametrization can be a severe problem because of the large number of parameters inherent in neural networks. To remedy the problem, methods for automatic selection of network complexity have been proposed, ranging from constructive algorithms that start with a small network and add new nodes or connections during the progress of training to destructive methods that start from a large network and delete (prune) unnecessary nodes or connections.13-15 However, most of these methods are based on heuristic criteria or on theory for linear systems, and their performance is very problem-dependent. When the network structure is expressed with binary variables, the optimization task can be seen as a mixedinteger nonlinear programming problem. Unfortunately, existing deterministic algorithms for the solution of such problems are restricted to either convex or small- to medium-sized problems.16,17 Therefore, stochastic optimization methods, such as evolutionary programming, genetic algorithms (GAs), or simulated annealing, have been applied.18-24 Even though the techniques are not expected to yield the globally optimal solution, they may constitute the only feasible alternative for problems with complex search spaces of high dimensionality. To exploit the attractive features of stochastic techniques and to circumvent their drawbacks, one may combine them with methods with better local properties, such as a gradient-based search.25,26 The present paper applies a scheme where the parameters and the structure of feedforward networks are determined with GA in combination with a linear least-squares procedure, which significantly reduces the computational effort and usually yields sparsely connected networks with a good generalization performance. 2.2. GAs. GAs are optimization techniques based on the concepts of natural selection and genetics27-29 that work with a group of candidate solutions, called the population, evolving through time in a way resembling the evolution of a natural population of species. Each

Figure 2. Subdivision of a simple network into two parts, where the parameters of the lower (l and w) and upper (W) parts are determined by a GA and LLSQ, respectively. The bar below the figure shows the variable representation in the chromosome, where the l’s are binary variables and the w’s are connection weights and biases.

individual, or chromosome, of the population is characterized by a sequence of genes and represents a possible solution, while the genes often correspond to the unknown variables to be determined. By selection of the fittest, reproduction and mutation, chromosomes with improved properties gradually evolve. While the solution is being evolved, the fitness of the best chromosome and the coverage of the search space30 can be monitored over the generations to illustrate and follow the progress of the optimization. 2.3. Evolution of Neural Networks. The method applied in this paper is primarily intended for training feedforward networks of the multilayer perceptron type.26 Because GAs are efficient at global sampling but have poorer local convergence properties, the network parameters are partitioned as indicated by the dashed ellipses in Figure 2 for a network with a single layer of hidden sigmoid nodes. The population in the GA consists of chromosomes that include the biases and weights, w, in the lower part of the network, while the weights to and biases of the output nodes, W, are determined by linear least squares (LLSQ). This procedure reduces the number of variables to be manipulated by the GA, while the LLSQ-optimized part guarantees that the evolved individuals are at least near-optimal. The weights and biases are represented as real values in the chromosomes, and the weights to each hidden node are grouped together to preserve well-working nodes during the evolution. The connectivity of the network is defined by binary variables, l, for every weight of a connection to a hidden node. Biases do not have corresponding binary variables in the chromosomes; they are considered if there is at least one active incoming connection to the node in question. The length of a chromosome is, thus, fixed and does not depend on whether the connections are active or not, so these weights are not affected by changes in the binary variables (but only by crossover and mutation). An example of a small network and the genetic representation of its lower part are given in Figure 2. The population is initialized by specifying the maximum network size, assigning random values to the

2316

Ind. Eng. Chem. Res., Vol. 42, No. 11, 2003

genes. It is generally desirable to start with hidden nodes operating in the transition (linear) regime of the sigmoid. Therefore, if the input variables are normalized to the interval (-1, +1), the weights may be initialized as w ∼ N(0,σ2), i.e., as normally distributed random variables with zero mean and σ standard deviation. As a measure of the fitness of a chromosome, the meansquare prediction error, F1, calculated on the differences between observed and predicted outputs, f and ˆf, respectively, for the K observations in the training set is used. To limit the complexity of the network, a penalty term, F2, is included in the objective function, yielding the minimization problem

{

min F ) F1 + F2 ) w,l

x∑ K

k)1

(fk - ˆfk)2 K

I

li ∑ i)1

+q

}

(1)

where I is the number of active connections and q is a user-defined penalty factor. To calculate ˆf for a set of lower-layer weights and biases evolved by the GA, the weights, W, including the biases, of the output nodes are determined with LLSQ by solving a system of linear equations

fˆ ) XW

(2)

where X is a matrix with a first column with ones (for the bias) followed by the outputs from the hidden nodes calculated for the K input vectors. The problem can be solved by, e.g., Householder reflections using an orthogonal-triangular factorization.31 By variation of the penalty factor, q, the size of the evolving networks can be controlled, which affects the generalization performance of the models evolved. 3. Burden Layer Thickness Problem 3.1. Problem Formulation and Data Set. The present study is based on data from the blast furnace no. 1 of Rautaruukki Steel in Raahe, Finland, a mediumsized furnace with a production rate of about 3500 tons/ day using sinter and pellets as burden and coke and heavy oil as reducing agents. The furnace has a twobell top and is equipped with movable armors with 10 possible positions (MA ) 1, ..., 10), and the radar measures the burden level 0.6 m from the furnace throat wall (cf. Figure 1). In an earlier work,32 it was demonstrated that the local burden layer thickness, estimated from stockrod signals, can be modeled by neural networks; the stockrods carry out the same task as the radar, but being mechanical devices, they have to be withdrawn before a dump of burden is charged into the furnace. This intermittent operation and the lack of precision of the massive devices were found to create problems in the burden layer thickness estimation. The radar signal, in turn, is sometimes corrupted by highfrequency noise, so it was passed through a sliding average filter before the stock level and burden layer thickness were estimated. Figure 3 shows the evolution of the signal (solid line), the filtered signal (dotted line), and estimated burden layer thicknesses, ∆z, for a time period with four dumps. In comparison with the stockrod signal, the radar was found to yield a decreased variance and practically no negative values of the layer thickness. Data from three measuring campaigns were used, during which the furnace was operated with 10-dump

Figure 3. Radar signal (solid line) and filtered value (dotted line), with asterisks denoting the time moments selected before and after the dumps for estimating the layer thickness (∆z). Table 1. Dumps, Movable Armor Positions, and Average Values of the Stock Level and Layer Thickness under the Stockrods for Period 2, as Well as Predicted Layer Thicknesses by the Network of Figure 5B, and with It Implemented in the Hybrid Model dump material MA 1 2 3 4 5 6 7 8 9 10

S C+P C S+P C S C+P C S+P C

2 10 7 2 2 2 10 7 2 2

measd measd network hybrid hybrid z/m ∆z/m ∆zˆ /m zˆ /m ∆zˆ /m 2.74 2.60 2.89 2.92 2.86 2.75 2.65 2.91 2.94 2.83

0.309 0.052 0.402 0.336 0.378 0.262 0.052 0.424 0.382 0.358

0.283 0.034 0.415 0.348 0.388 0.277 0.043 0.418 0.344 0.387

2.74 2.67 2.82 2.74 2.74 2.74 2.66 2.82 2.74 2.74

0.279 0.047 0.419 0.316 0.385 0.282 0.046 0.424 0.314 0.385

charging sequences, but the order and size of the dumps and the movable armor actions varied between, and sometimes also within, the periods. One of the charging programs has been presented in Table 1, where the first three columns report the dump number, the materials (C, coke; S, sinter; P, pellets) and the movable armor settings (MA), respectively. The armor settings applied vary between 2 and 10, where MA < 4 do not interfere with the falling parabola of the dumps, while MA g 5 affect the trajectory of all dumps by pushing them toward the furnace center (cf. Figure 1). In the earlier study with fully connected but small neural networks,32 an extensive preprocessed data set from a single period of consecutive dumps with a constant charging program was studied. Because the present study applies a training method that was explicitly devised to select an appropriate model complexity, shorter and “unfiltered” data segments from the different periods were deemed sufficient. As inputs, a set of variables potentially influencing the burden distribution was selected (cf. Figure 1), including the distance to the stock line, z, the radial position, r, of the point of intersection of the falling trajectory and the burden surface, the dump volume, V, and mass, m, the movable armor position, MA, and the angle of repose of the burden material, R. Also, lagged values of these variables (i.e., corresponding values for the previous dump) were included. The networks’ task was to predict the burden layer thickness, i.e., fk ) ∆zk and ˆfk ) ∆zˆ k in eq 1. Before the problem was tackled, the input and output variables were scaled to (-1, +1) by the transformation y˜ i ) (yi - yi,min)/(yi,max - yi,min), where sub-

Ind. Eng. Chem. Res., Vol. 42, No. 11, 2003 2317 Table 2. Main Characteristics of the Charging Programs of the Three Periods period 1

period 2

period 3

dump

material

MA

∆z/m

material

MA

∆z/m

material

MA

∆z/m

1 2 3 4 5 6 7 8 9 10

S P C S+P C S P C S+P C

4 10 2 6 2 7 10 2 5 2

0.329 0.116 0.415 0.217 0.371 0.224 0.147 0.383 0.259 0.420

S C+P C S+P C S C+P C S+P C

2 10 7 2 2 2 10 7 2 2

0.309 0.052 0.402 0.336 0.378 0.262 0.052 0.424 0.382 0.358

S P C S+P C S P C S+P C

2 10 2 2 2 2 10 2 2 7

0.258 0.105 0.413 0.404 0.449 0.295 0.132 0.394 0.413 0.461

Table 3. Mean Values of the Variables for the 300 Dumps of the Later Part of the Training Set (Period 2)a dump

material

MA

m/tons

V/m3

z/m

r/m

∆z/m

1 2 3 4 5 6 7 8 9 10

S C+P C S+P C S C+P C S+P C

2 10 7 2 2 2 10 7 2 2

20.4 7.75 5.52 27.3 6.78 20.8 7.72 5.62 27.3 6.78

11.3 5.73 11.4 14.1 14.0 11.5 5.69 11.6 14.2 14.0

2.92 2.81 3.05 3.04 2.95 2.83 2.76 3.11 3.06 2.97

3.12 2.31 2.52 3.09 3.12 3.12 2.31 2.51 3.09 3.13

0.297 0.058 0.395 0.342 0.397 0.258 0.044 0.418 0.345 0.356

a The variables are the movable armor position (MA), the mean values of the dump mass (m) and volume (V), the distance to the stock line (z), the impact point of the falling trajectory (r), and the observed layer thickness (∆z).

scripts min and max refer to the minimum and maximum values of the variables in the training set. The training set included 300 observations from each of the first two periods, while the test set included 300 observations from the third period. The main features of the three periods have been reported in Table 2. For the purpose of illustration, more detailed information about the second period is given in Table 3 (also see Table 1) that reports mean values of the central variables of the dumps. Data for dumps at very low stock levels were here excluded to avoid bias in the mean values. In this charging program, the layer thicknesses are seen to be quite equal, varying between 0.25 and 0.40 m, except for dumps 2 and 7, where the high armor setting in combination with the heavier pellets charged on top of the coke push the coke toward the furnace center and yield layers of about 0.05 m. 3.2. Neural Networks Evolved. The layer thickness prediction problem was tackled by the training method outlined in subsection 2.3, using a population size of 100, evolved for 200-300 generations with an elitism percentage of 40% and a mutation probability of 2%. The standard deviation of the initial weights was set to σ ) 3, and a maximum of 10 neurons in a single hidden layer of the network was allowed. With these parameters, a run of the evolutionary algorithm required about 1 h of computation time on a personal computer (1 GHz). Figure 4 shows the average training (×) and test (*) errors for the networks evolved versus the number of parameters. The training error, tr, decreases quite slowly and monotonically with increasing network complexity, while the test set error, te, shows larger scattering. However, if the lower boundary of te is studied, indicated by the dashed curve in the figure, one may conclude that a complexity between 15 and 30 parameters seems most appropriate; above this point the networks start to overfit the data. The best network models yielded training errors of tr ≈ 0.12 m and test errors of  te ≈ 0.14 m, so the results are slightly better

Figure 4. Average training (×) and test (*) errors of the evolved networks with respect to the number of parameters of the networks. The dashed curve estimates the lower boundary of the test error, while the circles denote the performance of the networks of Figure 5.

than those reported by Hinnela¨ and Saxe´n.32 For the sake of comparison, a linear model based on the potential inputs was determined, yielding errors of  tr ) 0.145 m and  te ) 0.172 m, again in agreement with the earlier study. However, considering the fact that the present study is based on observations from periods of radically different charging programs and conditions, the results of the nonlinear modeling are clearly better than those in the previous study, and the resulting models are much more general. The results also show that the choice of the complexity penalty in eq 1 is not especially crucial. Even though some of the networks evolved correspond to less general solutions (i.e., high te), most of the resulting models exhibit quite similar performance and have captured the overall behavior of the layer thickness. Figure 5 illustrates four promising networks evolved with different values of the penalty factor, q, and the corresponding average prediction errors. The performance indices of these networks have also been indicated in Figure 4 by circles. In summary, it was concluded from the networks evolved that the input variables required to predict the layer thickness accurately often include the distance to the stock line, zk, information about the effect of the armor setting (MAk or rk) and variables characterizing the “amount” and type of the dumped material (Vk or mk and Rk), and, in practically all models, the thickness of the previous burden layer, ∆zk-1. The last variable reflects the fact that a thin layer is followed by a thick layer, and vice versa. For high values of the complexity-penalty factor, q, the simple networks that evolved used Rk, ∆zk-1, and Vk or mk as their only inputs (cf. Figure 5d). In conclusion, the input variables selected by the method

2318

Ind. Eng. Chem. Res., Vol. 42, No. 11, 2003

Figure 7. Simplified flowsheet of the hybrid model, where the neural network (NN) is used to estimate the burden layer thickness. Figure 5. Some promising networks of different complexity evolved at penalty factors, q (cf. eq 1), reported above each figure. Mean prediction errors are reported below each network.

Figure 6. Layer thicknesses for a 40-dump segment of the test set: observed (O) and predicted by the networks illustrated in Figure 5b (s) and Figure 5c (- -).

are sound and in general agreement with the results from earlier studies and from small-scale model experiments. For the purpose of illustration, consider the network of Figure 5b. An indication of the accuracy of the model is given in column 6 of Table 1, where the network has predicted the layer thickness on the basis of the mean values of the inputs reported in Table 3. These values agree very well with the observed (average) values of ∆zk in column 5 of Table 1 with an average error of only 2 cm. Thus, the model has clearly captured the main dynamics of the layer-formation process. A qualitative comparison between this model and the slightly less complex model of Figure 5c is provided in Figure 6 that shows observed (circles) and predicted (solid and dashed lines) layer thicknesses for a 40-dump section of the test set. The difference in the prediction accuracy of the two models is seen to be small. To illustrate how the action of a sparse network can be analyzed, the behavior of the latter network model is further studied in the Appendix.

4. Hybrid Model of Burden Distribution 4.1. Background. Several of the networks evolved include the thickness of the previous burden layer, ∆zk-1, and the distance to the stock line, zk, as inputs. In practice, only a set-point value, zset, is given for the stock level in the charging program, where z > zset signals that the next dump should be charged (cf. Figure 1). However, the burden material of the next dump may not yet be ready on top of the bell at this moment because of the time required for transportation of the material to the furnace top. This is often the case after a dump that has formed a thin layer at the wall; in such cases the stock level clearly passes the set point before the next dump is charged (cf. Table 1). Thus, in offline predictions of the performance of novel charging programs, the previous layer thickness and the present stock level are unknown. The former problem can obviously be solved by feeding the network’s own predictions as input for the next dump. The latter one, in turn, can be tackled by setting up a differential equation for the descent of the stock, i.e., for the (more) deterministic part of the problem, combined with the neural model for the intermittent charging of burden layers.32 Such a hybrid model, a simplified flowsheet of which is presented in Figure 7, can also consider practical constraints of the charging process, e.g., the minimum time between the dumps, ∆td. For the furnace studied in this work, this variable was ∆td ≈ 110 s and the stock level set point was zset ) 2.50-2.70 m, while the average burden descent rate was u j s ) 0.11 m/min. 4.2. Illustration Examples. The hybrid model built on the network illustrated in Figure 5b was applied to predict the evolution of the stock level for the charging program of Table 3, using a time step of ∆t ) 1 s and a level set point of zset ) 2.70 m for all dumps except dumps 2 and 7, i.e., the center-charged dumps, for which zset ) 2.50 m was applied to (partly) avoid the loss of stock that follows after thin burden layers. Because the outputs and inputs are dependent through z, the model was evolved until it produced periodic patterns, i.e., a quasi-steady state. Figure 8 shows the observed (O) and simulated (- -) layer thicknesses for the 10 dumps of the program. The simulated thicknesses, also reported as column 8 in Table 1, are seen to agree well with the

Ind. Eng. Chem. Res., Vol. 42, No. 11, 2003 2319

Figure 8. Evolution of the burden layer thickness for the 10dump sequence reported in Table 3. Observations are marked by circles and predictions by the hybrid model by dashed lines (cf. Table 1).

Figure 9. Predictions (solid lines) by the hybrid model of the burden layer thickness for an increase in the mass of the first dump (sinter) from 20.4 to 30.4 tons (upper panel) and an increase in the movable armor position of the fifth dump (coke) from MA ) 2 to 10 (lower panel). Results of the reference case (cf. Figure 8) are shown by dashed lines.

average observed values for the period (column 5). As for the simulated stock level, the raised set point for the second and seventh dumps partly cancels the drop of stock after the thin dumps, but the predicted z shows less variation than the observed signal (columns 4 and 7 in Table 1). Despite these differences, the hybrid model has been successful in predicting the general behavior of the charging process, and these results will in what follows be used as a reference case. The model was first applied to study the effect of the ore dump size by changing the mass and volume of the first dump (cf. Table 3) from 20.4 tons and 11.3 m3 to 30.4 tons and 16.8 m3, respectively. In agreement with the findings of the sparser network analyzed in the Appendix, the local effect of the change is an 11 cm increase in the layer thickness of this dump, as depicted in the upper panel of Figure 9 by solid lines. For the purpose of comparison, the figure also shows the results of the reference case by dashed lines. The increased layer thickness of the first dump is seen to slightly decrease the layer thickness of the second dump. Next, the action of the movable armor is studied by changing the armor notch of the fifth dump (coke) from 2 to 10, keeping all other variables as in the reference case. The local effect shown in the lower panel of Figure 9 is a decrease of 18 cm in the layer thickness of the dump,

Figure 10. Evolution of the stock level at a slip (upper panel) and a corresponding simulation by the hybrid model (lower panel).

Figure 11. Predictions by the hybrid model of the burden descent rate (top panel) and the stock level (middle panel) for a stochastic and drifting descent rate. The bottom panel shows a more detailed view of the evolution of the stock level (cf. Figure 3).

in agreement with the results of the Appendix. The change also brings about an increase in the layer thickness of the sixth dump (sinter). In period 2, a slip, i.e., a sudden drop in the stock level, occurred between dumps 418 and 419. To analyze the effect of a slip by the hybrid model, a 2.5 m drop of the stock was introduced at a point where the model was in a quasi steady state at a slightly lower burden descent rate, u j s ) 0.09 m/min; Figure 10 illustrates the observed and simulated behavior of the stock level. Despite the fact that it predicts a smaller level variation around the set point, the hybrid model has captured the overall behavior well with a recovery of the stock in about 25 dumps. By contrast to earlier findings,32 the present model does not predict a (strong) relation between the stock level and the layer thickness. Finally, the effect of a drifting stock level was studied by using the following formulas for the burden descent

us(t+1) ) u j s + x(t+1) x(t+1) ) x(t) + ξ(t+1) z(t+1) ) z(t) + us(t) + ω(t+1)

(3)

where u j s is the mean descent rate and ω and ξ are white noise sequences. When t is expressed in seconds with ω ∼ N(0,10-2m) and ξ ∼ N(0,5×10-7m2/min2), the

2320

Ind. Eng. Chem. Res., Vol. 42, No. 11, 2003

5. Conclusions

Figure 12. Detailed view of the network evolved with q ) 0.086 (cf. Figure 5c). Weights have been indicated close to the corresponding connections and biases in the nodes.

results of Figure 11 were obtained; the top panel shows the evolution of us, while the middle panel shows z during a period of 10 h. The increase in the descent rate at t ≈ 4 h is seen to bring about a loss of stock because of an insufficient charging frequency. The bottom panel, which gives a more detailed illustration of the simulated stock level, shows an appearance in close agreement with the observed radar signal (cf. Figure 3).

A model has been developed for studying the formation of burden layers in the iron-making blast furnace under different charging programs and conditions. Local burden layer thickness estimates were obtained from radar measurements of the burden level in the furnace, and the dependence between the layer thickness and key variables from several different charging programs was modeled by neural networks. To tackle the problem of finding a suitable network size and architecture, an evolutionary algorithm combined with a least-squares solution was applied for training the network weights and connections simultaneously on data sets from a Finnish blast furnace. Through a study of the behavior of the networks on an independent data set, it was found that the method evolved networks with good generalization properties. The sparsity of some of the networks evolved made it possible to analyze the function of the nonlinear models by simple reasoning and numerically in reduced dimensions. In an analysis of novel charging programs, the evolved networks were used in an autonomous hybrid model that considers also practical constraints of the charging process. The hybrid model, which was found to accurately mimic the charging process, was used to evaluate the impact of altered boundary conditions in charging programs. In the forthcoming work, the model will be extended to recursive estimation of the upper-layer parameters of the

Figure 13. Contributions from the hidden nodes in the network of Figure 12 to the layer thickness, expressing the inputs and output as unscaled quantities.

Ind. Eng. Chem. Res., Vol. 42, No. 11, 2003 2321

network to make online tuning possible. Furthermore, the model will be used to estimate the radial distribution of the burden layers by linking the local layer thickness it predicts with constraints for the volume and angle of repose of the charged materials.33 This will make it possible to design and evaluate different schemes for burden distribution control. Appendix In this Appendix, the network presented in Figure 12 is studied in more detail. Because of its sparse connectivity, it is possible to interpret part of the networks’ action by simple reasoning. A numerical analysis is also provided to explain the function of this network more quantitatively. The network has four inputs, a single layer of sigmoid nodes, each nonlinearly transforming a weighed sum, a, of its inputs to a nodal output, x ) s(a) ) 1/[1 + exp(-a)]. In the analysis that follows, one should keep in mind that the external inputs and outputs are scaled to (-1, +1). In the illustrations, the contributions from the hidden nodes to the layer thickness have been calculated by keeping the (scaled) inputs to all other nodes at zero. As a reference for the study, a linear model with identical (but unscaled) inputs was entertained, yielding the approximation

∆zˆ k (m) ) 0.524 + 0.0297[Vk (m3)] - 0.0112MAk 0.00615[mk (tons)] - 0.0132[Rk (deg)] 0.230[∆zk-1 (m)] (A1) Starting the analysis of the network model, from the left, the first two hidden nodes are connected to single input variables: the first to the volume (Vk) of the dump and the second to the movable armor setting (MAk). Because of positive input and output weights (0.213 and 6.57) of the first hidden node, there is a positive correlation between Vk and ∆zˆ k, and the small input weight in combination with the small bias makes the node operate in the linear region (Figure 13a). The second node has a larger positive input weight (1.51), a small bias (-0.137), and a negative output weight (-2.82), so the effect of MAk on ∆zˆ k is negative, slightly nonlinear, but antisymmetric (Figure 13b). The third and fourth hidden nodes are both connected to two inputs. The former node has a small bias (-0.277), so its action is also antisymmetric. Of the inputs, MAk plays a more central role because its weight (-2.47) is much larger in relation to its absolute value than the weight of mk (0.854), and being negative (with a negative output weight, -1.78), the effect of MAk is opposite to (and more nonlinear than) that of hidden node number two. The mass mk acts in the opposite direction to what was observed for the volume in node number one and quite linearly (Figure 13c). Finally, the last hidden node has a small (positive) output weight, so its effect is not decisive in the network. With a large negative bias (-1.70) and larger negative input weights (-2.39 and -3.14), the node merely acts as a switch, where an output of zero dominates. Both the repose angle and the layer thickness of the previous dump reduce the present layer thickness. The former seems to be in conflict with process knowledge, but it should be noted that Rk takes on a few discrete values only, so this input may act as a codebook for making minor adjustments of the level of the output signal, which is set mainly by the other

Figure 14. Overall effect of the movable armor setting (MA) and the dump size (expressed in terms of mass, m) on the burden layer thickness for the network in Figure 12 for (a) coke and (b) sinter.

nodes. The negative correlation between ∆zk-1 and ∆zˆ k, in turn, captures the well-known alternating character of the layer thickness (Figure 13d). In summary, it is interesting to note that the above findings agree qualitatively with the results of the linear model in eq A1 in terms of the signs of the dependences, but the magnitudes of the predicted relations still differ. The analysis presented above was for the actions of the separate hidden nodes, but because some inputs are connected to two nodes, the overall picture of the relation between the inputs and output is more complicated. Moreover, some of the inputs are not independent; the mass and volume of the dump are related by mi ) FiVi, where Fi is the density of the ith material. Considering this fact, the effect of the first three hidden nodes can be lumped and depicted for a two-dimensional input space for the different burden materials. The two panels of Figure 14, illustrating the relation between the movable armor setting and the mass of the dump for coke and sinter, clearly show that the mass has an increasing effect and the movable armor setting has an overall decreasing effect on the burden layer thickness. (The relation for pellets has not been depicted because it is almost identical to that of sinter.) For instance, a 1 ton increase of the mass of a coke dump is seen to increase the layer thickness by 3.3 cm. Note that at lower values of mk the effect of the movable armor is first positive (i.e., increases the layer thickness) but changes (around MAk ) 6) into negative, which is in agreement with what can be concluded from the sche-

2322

Ind. Eng. Chem. Res., Vol. 42, No. 11, 2003

Figure 15. Overall effect of the movable armor setting (MA) and the dump size (expressed in terms of mass, m) on the burden layer thickness for the linear model of eq A1 for (a) coke and (b) sinter.

matic of Figure 1. Also, observe that the mass of the coke dump, in practice, varies much less than the mass of sinter, so the region of interest in Figure 14a is mk < 10 tons. Finally, a corresponding analysis is made to compare the findings with the results of the linear model in eq A1. After the effects of the dump volume and mass are lumped and the mean value of the repose angle and previous layer thickness is inserted, the model becomes

∆zˆ k (m) ) 0.0994 + [0.0297/Fk (m3/tons) 0.00615]mk (tons) - 0.0112MA (A2) The relations have been illustrated for coke and sinter in Figure 15, where the different scale on the layerthickness axis in the former case should be noted. Considering the ranges spanned by the mass of the dumps in the data sets, the two models are seen to agree relatively well. However, the lack of nonlinearlity in the latter model prevents it from capturing, e.g., the more detailed effect of the movable armor. Literature Cited (1) Omori, Y., Ed. (The Iron and Steel Institute of Japan). Blast Furnace Phenomena and Modelling; Elsevier: London, 1987. (2) Poveromo, J. J. Blast Furnace Burden Distribution Fundamentals. Iron Steelmaker 1995-1996, 22-23, No. 5/95-3/96.

(3) Iizuka, M.; Kajikawa, S.; Nakatani, G.; Wakimoto, K.; Matsumura, K. Development and application of measuring equipment for burden distribution in the blast furnace. Nippon Kokan Tech. Rep. Overseas 1980, 30, 13. (4) Iwamura, T.; Sakimura, H.; Maki, Y.; Kawai, T.; Asano, Y. Sensor and Signal Quantification for Blast Furnace Gas Distribution Control. Trans. Iron Steel Inst. Jpn. 1982, 22, 764. (5) Kajiwara, Y.; Inada, T.; Tanaka, T.; Jimbo, T. Analysis and Control of Burden Distribution at the Blast Furnace Top. Sumitomo Search 1990, 44, 249. (6) Nicolle, R.; Thirion, C.; Pochopien, P.; Le Scour, M. Gas Distribution Control in the Shaft. A French Experience. Proceedings of the 46th Ironmaking Conference; ISS: Warrendale, PA, 1987; p 217. (7) Itaya, H.; Aratani, F.; Kani, A.; Kiyohara, S. A simplified mathematical model for the estimation of 2-dimensional temperature distribution in the blast furnace. Rev. Metall./Cah. Inf. Tech. 1982, 79, 443. (8) Nikus, M. A set of models for on-line estimation of burden and gas distribution in the blast furnace. Doctoral Dissertation, Heat Engineering Laboratory, Åbo Akademi University, Åbo, Finland, 2001. (9) Pochwisvew, A. N. Der Hochofenbetrieb; Veb Verlag Technik: Berlin, 1954. (10) Warren, P. Recent developments in blast furnace process control within British Steel. Proceedings of the 54th Ironmaking Conference; ISS: Warrendale, PA, 1987; p 281. (11) Haykin, S. Neural NetworkssA Comprehensive Foundation; Macmillan Publishing Co.: New York, 1994. (12) Battiti, R. First- and Second-Order Methods for Learning: Between Steepest Descent and Newton’s Method. Neural Comput. 1992, 4, 141. (13) Le Chun, Y.; Denker, J. S.; Solla, S. A. Optimal Brain Damage. In Advances in Neural Information Processing Systems; Touretzky, D. S., Ed.; Morgan Kaufmann: San Mateo, 1990; Vol. 2, p 598. (14) Fogel, D. B. An Information Criterion for Optimal Neural Network Selection. IEEE Trans. Neural Networks 1991, 2, 490. (15) Engelbrecht, A. P. A new pruning heuristic based on variance analysis of sensitivity information. IEEE Trans. Neural Networks 2001, 12, 1386. (16) Duran, M. A.; Grossmann, I. E. An Outer-Approximation Algorithm for a Class of Mixed-Integer Nonlinear Programs. Math. Program. 1986, 6, 307. (17) Ryoo, H. S.; Sahinidis, N. V. Global Optimization of Nonconvex NLPs and MILPs with Applications in Process Design. Comput. Chem. Eng. 1995, 19, 551. (18) Fogel, D. B.; Fogel, L. J.; Porto, V. W. Evolving Neural Networks. Biol. Cybern. 1990, 63, 487. (19) Maniezzo, V. Genetic Evolution of the Topology and Weight Distribution of Neural Networks. IEEE Trans. Neural Networks 1994, 5, 39. (20) Reeves, C.; Steele, N. Applications of Genetic Algorithms in Artificial Neural Networks. Syst. Sci. 1993, 19, 63. (21) Angeline, P. J.; Saunders, G. M.; Pollack, J. B. An Evolutionary Algorithm That Constructs Recurrent Neural Networks. IEEE Trans. Neural Networks 1993, 5, 54. (22) Gao, F.; Li, M.; Wang, F.; Wang, B.; Yue, P. Genetic Algorithms and Evolutionary Programming Hybrid Strategy for Structure and Weight Learning for Multilayer Feedforward Neural Networks. Ind. Eng. Chem. Res. 1999, 38, 4330. (23) Schmitz, G. P. J. Combinatorial evolution of feedforward neural network models for chemical processes. Ph.D. Dissertation, University of Stellenbosch, Stellenbosch, Republic of South Africa, 1999. (24) Yen, G.; Lu, H. Hierarchical genetic algorithm for nearoptimal feedforward neural network design. Int. J. Neural Syst. 2002, 12, 69. (25) O ¨ stermark, R. A flexible Genetic Hybrid Algorithm for nonlinear mixed-integer programming problems. Evol. Opt. 2002, 1, 41. (26) Pettersson, F.; Saxe´n, H. A hybrid genetic algorithm for parametric and structural learning in layered neural networks. 2003, submitted for publication. (27) Holland, J. H. Adaptation in Natural and Artificial Systems; The University of Michigan Press: Ann Arbor, MI, 1975. (28) Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning; Addison-Wesley: Reading, MA, 1989.

Ind. Eng. Chem. Res., Vol. 42, No. 11, 2003 2323 (29) Michalewicz, Genetic Algorithms + Data Structures ) Evolution Programs; AI Series; Springer-Verlag: New York, 1994. (30) Wehrens, R.; Pretch, E.; Buydens, L. M. C. Quality Criteria of Genetic Algorithms for Structure Optimization. J. Chem. Inf. Comput. Sci. 1998, 38, 151. (31) Golub, G. Numerical methods for solving linear least squares problems. Numer. Math. 1965, 7, 206. (32) Hinnela¨, J.; Saxe´n, H. Neural Network Model of Burden Layer Formation Dynamics in the Blast Furnace. Iron Steel Inst. Jpn. Int. 2001, 41, 142.

(33) Saxe´n, H.; Hinnela¨, J. Model for Burden Distribution Tracking in the Blast Furnace. Miner. Process. Extract. Metall. Rev. 2003, in press.

Received for review May 20, 2002 Revised manuscript received March 4, 2003 Accepted March 26, 2003 IE0203779