Article pubs.acs.org/IECR

Data-Derived Analysis and Inference for an Industrial Deethanizer

Francesco Corona,*,† Michela Mulas,‡ Roberto Baratti,§ and Jose A. Romagnoli⊥

†Department of Information and Computer Science and ‡Department of Civil and Environmental Engineering, Aalto University, Finland
§Dipartimento di Ingegneria Meccanica, Chimica e dei Materiali, University of Cagliari, Italy
⊥Department of Chemical Engineering, Louisiana State University, Baton Rouge, Louisiana, United States

ABSTRACT: This paper presents an application of data-derived approaches for analyzing and monitoring industrial processes. The discussed methods are used in visualizing process measurements, extracting operational information, and designing estimation models for primary process variables otherwise difficult to measure in real-time. Emphasis is given to the modeling of the data with two classical machine learning paradigms: the self-organizing map (SOM) and the multi-layer perceptron (MLP). The effectiveness of the proposed approach is validated on an industrial deethanizer, where the goal is to identify operational modes and most sensitive variables for this full-scale unit, as well as design an inferential model for a critical process variable, the bottom ethane concentration. The study led to the definition of a fully automated monitoring tool to be implemented online in the plant’s distributed control system. The results confirmed the potential of the data-derived approach, and based on the analysis, the existing control configuration of the unit could be redefined toward more consistent operations. Because it is general and modular by design, the tool can be easily used for other processes.



INTRODUCTION

Modern process industries are evolving toward an efficient and safe operation of production plants, optimization of operating and management costs, and production of high-quality products. A prime requirement for achieving these objectives is the availability of efficient monitoring tools for supervising the processes, as well as assisting the design of advanced control strategies. The availability of easily accessible displays and intuitive knowledge of the processes, together with the availability of real-time measurements of important primary variables, is thus invaluable, with immediate implications for profitability, management planning, environmental responsibility, and safety. The primary variables are, typically, compositions and conversions or other indicators related to product quality, process performance, and economic interest. The conventional strategy to approach the monitoring problem relies upon field instrumentation, the traditional hardware sensors and off-line laboratory analysis. Such systems are often associated with expensive capital and maintenance costs, and they are characterized by time-delayed responses unacceptable for real-time monitoring and control. Fortunately, the availability of modern automation systems enables us to systematically acquire a large number of plant signals that would allow an accurate supervision of the processes in terms of secondary variables easily measurable online. Typically, the secondary variables are pressure, temperature, and flow-rate measurements that may be used for characterizing the conditions of the processing units. Because the primary variables are necessarily related to some of the secondary variables, the availability of vast quantities of such data offers the opportunity to develop data-derived process models that can reconstruct such a relationship and extract information from it.

This paper discusses the development and direct application of a methodology to extract, visualize, and model the information encoded in industrial process data. The approach is based on two classical machine learning methods, the self-organizing map (SOM¹) and the multi-layer perceptron (MLP²), widely used in process systems engineering.³ The main contributions in the application of the self-organizing map to industrial process data were developed by Alhoniemi⁴ and Laine,⁵ whereas the use of the multi-layer perceptron dates back to the seminal works of Hoskins and Himmelblau⁶ and Qin and McAvoy.⁷ In this work, we discuss their application in the analysis of a full-scale industrial deethanizer, where the goal is to monitor online the column in terms of its primary variables and to identify significant operational modes before proposing alternative control strategies. The selected application arises from specific industrial needs with direct economic and process implications, while retaining a variety of behaviors that allows a thorough and yet simple presentation of the methodology. The SOM was previously used as a framework for identifying the operational modes of the deethanizer under study and to present the extracted process information on intelligible displays.⁸ The present study extends and complements that analysis by including an MLP as a regression model for estimating a hard-to-measure primary variable (the ethane concentration at the column bottom), starting from a set of easy-to-measure secondary variables. The study leads to the design of a fully automated monitoring tool to be implemented online in the plant’s distributed control system. The resulting tool is general and modular by design, and thus, it can be easily applied to other processes with higher complexity and dimensionality.

© 2012 American Chemical Society

Received: December 6, 2011
Revised: September 5, 2012
Accepted: September 5, 2012
Published: September 5, 2012

dx.doi.org/10.1021/ie202854b | Ind. Eng. Chem. Res. 2012, 51, 13732−13742

Industrial & Engineering Chemistry Research




METHODS

The data-derived approach to modeling industrial processes seeks to construct a representation of a system from a set of measurements that quantify its behavior, without any explicit knowledge of the underlying phenomena. The modeling task is daunting and remains a major concern because of the inherent characteristics of the data: redundancy, possible insignificance, and disturbances. Efficient and robust methods are thus needed (i) to extract and display the information existing in the observations and (ii) to reconstruct the relationships existing between the variables. This section briefly overviews the two machine learning methods considered here for the purpose: the self-organizing map and the multi-layer perceptron. In the following, we assume the availability of a set of N process measurements (or observations) $v_i$ relative to p process variables; that is, $v_i \in \mathbb{R}^p$. If there are d secondary and s primary variables, then $\{v_i\}_{i=1}^{N} = \{(x_i, y_i)\}_{i=1}^{N}$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}^s$, and p = d + s.

SELF-ORGANIZING MAP

The self-organizing map, SOM,¹ is an adaptive algorithm that performs two tasks: a reduction of the data dimensionality by projection and a reduction of the amount of observations by clustering. The SOM nonlinearly projects vast quantities of high-dimensional data onto a low-dimensional array of few prototypes in a fashion that aims at preserving the topology of the observations. When a bidimensional array of prototypes is chosen, the main advantage of the map is the wealth of SOM-based data visualization tools.

SOM Algorithm and Its Properties. The basic self-organizing map consists of a low-dimensional and regular array of K nodes, where a prototype vector $m_k \in \mathbb{R}^p$ is associated with each node k. Each prototype acts as an adaptive model vector for the N observations $v_i \in \mathbb{R}^p$. During the learning of the map, the observations are mapped onto the array of nodes and the prototypes adapted according to

$$m_k(t+1) = m_k(t) + \alpha(t)\, h_{k,c(v_i)} \left( v_i(t) - m_k(t) \right) \tag{1}$$

In the learning rule in eq 1, t denotes the discrete-time coordinate of the mapping steps and $\alpha(t) \in (0,1)$ is a monotonically decreasing learning rate. The scalar multiplier $h_{k,c(v_i)}$ denotes a neighborhood kernel function centered at the Best Matching Unit (BMU). The BMU is the model vector $m_c(t)$ that, at time t, best matches the observation vector $v_i$. The matching is based on a competitive criterion on the Euclidean metric $\delta(m_k(t), v_i(t))$, for all the k nodes. At each step t, the BMU is thus the $m_k(t)$ that is closest to $v_i(t)$; that is,

$$c(t) = \arg\min_{k}\, \delta\left(m_k(t), v_i(t)\right)^2, \quad \forall k \text{ and } \forall i \tag{2}$$

A Gaussian function centered at $m_c(t)$ is often chosen for the kernel:

$$h_{k,c(v_i)} = \exp\left( -\frac{\lVert r_k - r_c \rVert^2}{2\sigma^2(t)} \right) \tag{3}$$

where the vectors $r_k$ and $r_c$ represent the geometric location of the nodes on the array and $\sigma(t)$ denotes the monotonically decreasing width of the kernel. The SOM is computed recursively for each observation. As $\alpha(t)\, h_{k,c(v_i)}$ tends to zero with t, the set of prototype vectors $\{m_k\}_{k=1}^{K}$ is adaptively updated to represent similar observations in $\{v_i\}_{i=1}^{N}$. The resulting model learns the data manifold in the original space in such a way that the relevant topological properties of the observations are preserved on the projection space. Thus, the SOM is to be understood as an ordered image of the original high-dimensional data modeled onto a downsampled low-dimensional space.

SOM-Based Methods for Data Exploration. In the typical case of projections onto 2D maps, the SOM offers excellent techniques for data exploration. In that sense, the approach to data analysis with the SOM is mainly visual and it is based on bidimensional displays specifically designed for this task. The data visualization techniques based on the SOM assume that the prototype vectors are representative models for groups of similar observations, and projecting the data onto the bidimensional array allows for an efficient display of the dominant relationships existing between them. For instance, the displays make it possible to identify the shape of the data distribution, cluster borders, and projection directions. The visualization techniques considered here are (i) the component planes and (ii) the distance matrices, originally designed by Kaski,⁹ Himberg,¹⁰ and Vesanto:¹¹
• A component plane shows on the SOM the coordinates of the prototype vectors along a specific direction in the data space; that is, each component plane is associated with one original variable. The component planes are used to display and analyze the variables’ distributions.
• A distance matrix visualizes on the SOM the average distance between each prototype vector and its adjacent neighbors. A widely used distance matrix for the SOM is the unified distance matrix, or U-matrix.¹² The U-matrix is used to analyze the data clustering structure.

The SOM is also used to start a preliminary identification of the relationships between variables. From the map, dependencies are searched by looking for similar patterns in identical positions in the component planes. The process is aided by ordering the planes in such a way that planes corresponding to related variables are positioned near each other.¹¹

Measure of Topological Relevance on the SOM. The topology preserving properties of the SOM can be used for the identification of a subset of secondary variables $\{x_i\}_{i=1}^{N}$ that is relevant for estimating a primary variable $\{y_i\}_{i=1}^{N}$. The relevance of the secondary variables can in fact be quantified starting from the assumed continuity of the unknown functionality y = f(x) to be reconstructed. The measure of topological relevance on the self-organizing map, MTR on SOM,¹³,¹⁴ assesses the significance of an input variable $x^{(j)}$ to the output y by calculating a distance between their respective local topologies; that is, $\mathrm{MTR}(x^{(j)}, y) = \lVert U_{x^{(j)}} - U_y \rVert_F$. The topology of the component variables is recovered from the corresponding U-matrices calculated independently for each direction of the data space; that is, $U_{x^{(j)}}$ (with j = 1, ..., d) for the input variables, and $U_y$ for the output. The calculation of the component U-matrices is identical to the calculation of the full U-matrix, except for the set of prototypes’ dimensions considered for measuring distances. The Frobenius metric $\lVert \cdot \rVert_F$ measures the Euclidean closeness between two matrices and, hence, their similarity; the closer to 0 the measure, the more relevant the input for reconstructing the output. The metric does not make any assumptions on the input−output functionality, and it is thus capable of detecting both linear and nonlinear dependencies. To clearly represent relevances the way they are perceived, the measure $\mathrm{MTR}(\cdot,\cdot) \geq 0$ is preferably inverted and rescaled so that larger values indicate stronger relevances (e.g., $\mathrm{MTR}(\cdot,\cdot) \to \mathrm{MTR}(\cdot,\cdot) \in [0,1]$). Variable selection is then simply performed by retaining only the highest ranking inputs, although, given the generality of the relevance criterion, any other subset selection method could be used.¹⁵

Figure 1. Deethanizer flowsheet, including part of the instrumentation.
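The SOM learning rule (eqs 1 to 3) and the measure of topological relevance described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the rectangular grid with 4-neighbor adjacency, the linear decay schedules, the random initialization, the standardization of the variables, and all function names (`train_som`, `component_umatrix`, `mtr_on_som`) are assumptions made here for illustration, and the data are synthetic.

```python
import numpy as np

def train_som(V, rows, cols, n_steps=3000, alpha0=0.5, seed=0):
    """Online SOM on a rectangular grid, following the learning rule of eqs 1-3."""
    rng = np.random.default_rng(seed)
    # Grid locations r_k of the K = rows * cols nodes (row-major order).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Assumption: prototypes start from randomly drawn observations.
    M = V[rng.integers(0, len(V), size=rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_steps):
        frac = t / n_steps
        alpha = alpha0 * (1.0 - frac)        # decreasing learning rate in (0, 1)
        sigma = sigma0 * (1.0 - frac) + 0.5  # decreasing kernel width
        v = V[rng.integers(0, len(V))]       # one observation per step
        c = np.argmin(np.sum((M - v) ** 2, axis=1))                              # BMU, eq 2
        h = np.exp(-np.sum((grid - grid[c]) ** 2, axis=1) / (2.0 * sigma ** 2))  # kernel, eq 3
        M += alpha * h[:, None] * (v - M)                                        # update, eq 1
    return M

def component_umatrix(M, rows, cols, dims):
    """Average distance of each prototype to its grid neighbors, using only `dims`."""
    P = M[:, dims].reshape(rows, cols, -1)
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            U[r, c] = np.mean([np.linalg.norm(P[r, c] - P[i, j]) for i, j in nbrs])
    return U

def mtr_on_som(M, rows, cols, input_dims, output_dim):
    """Frobenius distance between each input U-matrix and the output U-matrix."""
    U_y = component_umatrix(M, rows, cols, [output_dim])
    dist = np.array([np.linalg.norm(component_umatrix(M, rows, cols, [j]) - U_y)
                     for j in input_dims])
    # Inverted and rescaled to [0, 1]: larger values mean stronger relevance.
    relevance = 1.0 - dist / dist.max() if dist.max() > 0 else np.ones_like(dist)
    return dist, relevance

# Synthetic example: the output depends on x0 only; x1 is unrelated.
rng = np.random.default_rng(1)
x0 = rng.uniform(-1.0, 1.0, 400)
x1 = rng.uniform(-1.0, 1.0, 400)
y = 3.0 * x0 + 1.0
V = np.column_stack([x0, x1, y])
V = (V - V.mean(axis=0)) / V.std(axis=0)  # assumption: variables are standardized
M = train_som(V, rows=6, cols=6)
dist, relevance = mtr_on_som(M, 6, 6, input_dims=[0, 1], output_dim=2)
print(dist, relevance)
```

In this toy case the input that determines the output obtains the smaller distance and therefore the larger rescaled relevance. Passing all columns as `dims` to `component_umatrix` gives a full U-matrix under the same simplifying assumptions.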

MULTI-LAYER PERCEPTRON

The multi-layer perceptron, MLP,² is a universal function approximator consisting of a number of processing units placed in layers. Each unit receives, transforms, and distributes information about the unknown input−output functionality. For a data set $\{(x_i, y_i)\}_{i=1}^{N}$, MLPs with one hidden layer estimate the output as a linear combination of the transformed inputs,

$$y_i \approx f(x_i) \approx \sum_{k_H=0}^{K_H} w_{k_H}\, \varphi(l_{k_H}) \tag{4}$$

where the input $l_{k_H}$ to the $k_H$th hidden unit ($k_H = 1, ..., K_H$) is given as

$$l_{k_H} = \sum_{k_L=0}^{K_L} w_{(k_H,k_L)}\, X_i^{(k_L)} \tag{5}$$

In eqs 4 and 5, $w_{(k_H,k_L)}$ and $w_{k_H}$ denote the weights associated to the input and the output layer, respectively. The operator φ(·) transforms the linear combination of the inputs performed in the input layer with $k_L = 1, ..., K_L$ units (eq 5), and the resulting linear combination of transformed inputs to the hidden layer estimates the output (eq 4). Denoting with $k_L = 0$ and $k_H = 0$ the bias units for the input and output layers, $X_i^{(0)} = 1$ and $\varphi(l_0) = 1$ indicate the corresponding constant bias signals, respectively. Nonlinearity is introduced using strictly increasing smooth operators φ(·) that also exhibit a regular balance between linear and nonlinear behavior. A commonly used transfer function is the logistic function of slope α:

$$\varphi(l_{k_H}) = \frac{1}{1 + \exp(-\alpha\, l_{k_H})} \tag{6}$$

The connection weights, $w_{(k_H,k_L)}$ and $w_{k_H}$, parametrize the model and they are typically learned by minimizing a prediction error. In the case with a fixed number of hidden layers and a transfer function with fixed slope, the MLP model has only one hyper-parameter to be defined; that is, the number of hidden units, which can be selected using standard resampling methods.

APPLICATION

The potential of the methods overviewed in the previous section is illustrated on a set of measurements from a full-scale industrial process. The problem consists of modeling and analyzing the operational behavior of an industrial gas fractionation plant, starting from process measurements acquired in real-time. The plant consists of three sequential separation units (a deethanizer, a depropanizer, and a debutanizer) producing propane, butane, and gasoline. For compactness, the presentation is here restricted to the deethanizer, Figure 1. The deethanizer separates ethane from a feed stream of light naphtha. The operational objective of the column is to produce as much ethane as possible; that is, operations should minimize propane’s concentration from the top while satisfying a constraint on the amount of ethane from the bottom. Such a constraint is quantified by the maximum amount of ethane lost from the column bottom; the operation threshold is set to be smaller than 2.0%. The ethane loss can be used to define three operational modes of the deethanizer:
• a normal status (operations of the column where the concentration of ethane is within allowable bounds (the 1.8−2.0% range))


• a high status (operations of the column where the concentration of ethane exceeds the allowable upper bound (above 2.0%))
• a low status (operations of the column where the concentration of ethane is below the allowable lower bound (below 1.8%))

where the lower bound (1.8%) has been suggested by the plant’s management. The two abnormal conditions have important economic implications. In fact, a process operated at high status is delivering a product out of specification, whereas for low status operations, the product is within the specifications, but an unnecessary operational cost is incurred. According to the plant’s management, normal operations are rarely met; hence, the goal is to investigate under which conditions such modes are experienced, identify the most sensitive process variables, and suggest an alternative control scheme.

DATA AND PROBLEM DESCRIPTION

To analyze the behavior of the unit, a set of process data has been collected from the plant’s distributed control system. The measurements correspond to 3 weeks of continuous operation in the winter and 3 weeks in the summer. The data are available as 3-min averages for 27 process variables (Table 1), thus allowing a macroscopic characterization of the unit.

Table 1. Deethanizer Process Variables

tag      variable            tag      variable
FIC-1    inlet flow-rate     FIC-4    vapor flow-rate
TI-1     inlet temp          LIC-1    reboiler level
TI-2     inlet temp          FIC-5    bottom flow-rate
TIC-1    inlet temp          FI-1     distillate flow-rate
TI-7     top temp            PIC-1    distillate pressure
TIC-2    enriching temp      TI-10    reflux temp
TI-6     enriching temp      FIC-3    bypass flow-rate
TI-3     stripping temp      TI-9     condensed temp
TI-4     stripping temp      LIC-2    top drum level
TI-5     stripping temp      PIC-2    blowdown pressure
TI-8     bottom temp         LIC-3    bottom drum level
FIC-2    reflux flow-rate    AI-1     ethane conc
TIC-3    vapor temp          AI-2     butane conc
DP       delta pressure      status   operating mode

The column has a number of control loops that supervise the process. Briefly, the temperature TIC-2 and the vapor temperature TIC-3 out of the reboiler are controlled by manipulating the reflux flow-rate FIC-2 and the steam flow-rate FIC-4 to the reboiler, respectively, with both loops cascaded with the corresponding flow-rates. The distillate pressure PIC-1 is controlled by the distillate flow-rate FI-1, and the level in the reboiler LIC-1, by the bottom flow-rate FIC-5.

The original set of process variables has been expanded by incorporating an additional indicator specifically calculated to represent the operational status. A status variable has been defined to take values +1, −1, and 0 according to the operational status of the process: value 0 is assigned to normal operations, and values +1 and −1 to high and low operations, respectively. Notice that calculating the status variable requires an online analysis of the ethane concentration from the bottom of the column; such a variable (AI-1) is presently measured by a continuous-flow gas-chromatograph every 18 min and returned with a delay of over 90 min. Clearly, the delay and low sampling frequency associated with the analytical measurements of ethane can pose severe limitations to any real-time analysis of the column’s status. In this study, we perform the analysis of the operation modes of the deethanizer when the analytical measurements of ethane are replaced by real-time estimates. For this purpose, a soft-sensor based on the multi-layer perceptron has been developed to infer the bottom ethane concentration from other easy-to-measure process variables. In that sense, the availability of an inferential model allows the development of a fully automated tool to be implemented in the plant’s distributed control system (DCS). Moreover, the existing instrumentation setup available for the unit would benefit from a backup (soft) measurement for such an important variable.

On the basis of these considerations, the proposed approach to data-derived analysis and inference for the deethanizer can be summarized as follows. The analysis, Figure 2, starts with the design of an MLP-based estimation model, and it is then followed by a SOM-based exploratory phase where the following occurs:
1. The process measurements are initially used to calibrate a SOM; on the map, the data are quantized and projected.
2. The calibrated map is then used to visualize and analyze the data in terms of dependencies and clustering structure.
The exploration phase is followed by an extrapolation phase, where both the calibrated MLP and SOM models are used as reference for new unseen data.

Figure 2. MLP inference and SOM analysis.

INFERENCE AND ANALYSIS

The approach is discussed on the process data available for the deethanizer. Specifically, the winter observations are used for learning the MLP and the SOM models and the summer observations are for validation and extrapolation purposes. The choice of winter data for learning and summer data for validation is only dictated by the chronological order of the events.

Soft-Sensing the Ethane Concentration. As a first step, an inferential model capable of estimating the concentration of ethane from the bottom of the column (the primary and output variable) starting from other easily accessible process measurements (the secondary and input variables) has been developed. The estimates have been obtained from a subset of input variables selected after ranking them according to their significance for the output with the measure of topological relevance on the SOM. For this purpose, a preliminary SOM has been calibrated on the winter data using all the measured and calculated process variables (28). On the map, we visually analyzed the component planes to obtain an initial understanding of the relationships existing between process variables.

The analysis has been performed using a standard SOM-based visualization technique conventionally used to identify similarities between variables.¹⁶ The technique orders the component planes (one for each process variable) of a trained SOM in such a way that component planes (and, thus, variables) that show high similarity are placed near each other. Similarity is here quantified in terms of simple linear correlation. From Figure 3, it is easy to notice how variables corresponding to the same section of the column (feed, top, and bottom) are similar and close to each other and, among them, those most correlated to the bottom ethane concentration are those corresponding to the stripping section of the column, followed by the feed and top operating conditions.

Figure 3. SOM visualization: the ordered component planes.

Apart from confirming that the SOM is capable of recovering most of the predominant features and structures encoded in the data, the visual analysis did not permit any quantitative characterization of the relevance of the process variables with respect to the problem of estimating the bottom ethane concentration. This information has been retrieved by calculating the MTR on SOM for each possible input−output pair. After ranking the inputs according to their relevance for the output (Figure 4), we can see that the most significant variables are the temperatures in the bottom section of the column (as expected), with TI-5 and TI-8 the closest to the column outlet and also the most important ones. Then, the most related inputs are the inlet temperatures (which depend on the upstream operations and are related to the inlet ethane concentration), with the temperature after the heat-exchangers being less relevant because of the stabilizing effect on the column feed. As expected, other very relevant inputs are associated with the manipulated variables.

Figure 4. Relevance of the secondary variables to the ethane concentration. The variables are ranked according to the MTR on SOM.

On the basis of these considerations and after removing those variables with low signal-to-noise ratio, significant delay, and those encoding redundant information, only a small subset of secondary variables has been retained and used as inputs to learn the MLP model on the winter observations. Although the retained variables cannot be explicitly reported here for confidentiality reasons, the selection can be easily understood by following the guidelines discussed by Baratti et al.¹⁷ It is, however, worth noticing that, given the computational unfeasibility of an exhaustive search through all the possible subsets of input variables, such a variable selection scheme has been demonstrated to be necessary and useful: in fact, the calibration and cross-validation of $2^{26} - 1$ MLPs would be required but is practically impossible. The learned MLP directly encodes the input−output relationships without an explicit dynamic model, and information about the dynamics is directly recovered from the input variables used. The MLP-based soft-sensor uses logistic transfer functions, and the parameters of the model (the number of hidden nodes and the connection weights) are calibrated and optimized by using a standard cross-validation scheme and the Levenberg−Marquardt method.

The performance assessment of the MLP-based soft-sensor as an alternative monitoring system is presented for one significant behavior in the summer. The reported testing period spans a window of about 16 h when the unit was subjected to an abrupt feed change (in flow-rate and possibly composition) and was run under critical operating conditions. Following the temporal evolution from Figure 5, the diagrams show a process that is initially operated at the lower bound of the normality conditions and, as the process has moved further in time, an abrupt variation in the feed flow-rate (FIC-1) occurred. The variation triggered the action on the steam to reboiler flow-rate (FIC-4), as well as the reflux flow-rate (FIC-2) to control the top temperature TIC-2. The events initiated a sequence of oscillations in the ethane concentration that is efficiently recovered by the soft-sensor.

Figure 5. MLP estimates: testing results for approximately 16 h of operation in the summer. The measurements are indicated with dots, and the MLP estimates are shown with a red line.

Figure 6 depicts the complete testing period of three weeks of continuous summer operation. The estimates on the full set of independent testing observations have an accuracy in terms of root mean square error (RMSE) that is smaller than 0.16%. Because the RMSE is expressed in the same units as the original variable, it is possible to conclude that the designed MLP model has great potential as a real-time monitoring system with an accuracy that is comparable to the analytical method for measuring the ethane concentration. In a previous study,¹⁸ we performed a thorough comparison with alternative prediction models and the MLP demonstrated the capability to outperform all the other tested algorithms. In the following, the MLP-based soft-sensor is thus used to completely substitute the gas-chromatograph analysis and calculate a real-time estimate of the bottom ethane.

Figure 6. MLP estimates: testing results for the three weeks of operation in the summer. The analytical measurements are indicated with dots, and the MLP estimates are shown with a red line.

Monitoring the State of the Unit. In order to analyze under which conditions the different operating modes of the deethanizer are experienced, a bidimensional self-organizing map consisting of 70 × 24 prototype vectors organized on a hexagonal array has been calibrated on the winter data. Such a configuration, known as a big map, leads to a dense partitioning of the sample space that favors preservation of continuity and allows for the definition of high-resolution displays for visualization. The SOM was first initialized along the space spanned by the two largest eigenvectors of the covariance matrix of the observations and, as usual, the ratio between the corresponding eigenvalues was used to determine the number of nodes along the two dimensions of the map. The map is then calibrated on the analytical measurements of ethane and subsequently validated against the corresponding MLP estimates. On the SOM, the clustering structure of the data has been analyzed and the operating conditions of the unit have been visualized using the U-matrix, Figure 7a.

Figure 7. SOM visualization: the U-matrix (a), the clustered SOM (b), and the component plane of the status variable (c). In part b, the three clusters are dyed in red, green, and blue, corresponding to high (+1), normal (0), and low (−1) operational modes (values of the status variable in part c), respectively.
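The map configuration described above (grid shape guided by the ratio of the two largest eigenvalues, prototypes initialized on the plane of the two leading eigenvectors) can be sketched as follows. The text only states that the eigenvalue ratio was used; taking the side-length ratio as the square root of that ratio is a common heuristic and is an assumption here, as are the function names `map_shape` and `linear_init`, the target of 1680 nodes (70 × 24), and the synthetic data.

```python
import numpy as np

def map_shape(V, n_units):
    """Choose (rows, cols) with rows * cols close to n_units and rows / cols
    close to sqrt(lambda_1 / lambda_2) (assumed heuristic)."""
    eigvals = np.linalg.eigvalsh(np.cov(V, rowvar=False))  # ascending order
    ratio = np.sqrt(eigvals[-1] / eigvals[-2])
    cols = max(1, int(round(np.sqrt(n_units / ratio))))
    rows = max(1, int(round(n_units / cols)))
    return rows, cols

def linear_init(V, rows, cols):
    """Place the prototypes on a regular lattice in the plane spanned by the
    two leading eigenvectors of the covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(V, rowvar=False))
    e1 = eigvecs[:, -1] * np.sqrt(eigvals[-1])  # leading direction, scaled by its std
    e2 = eigvecs[:, -2] * np.sqrt(eigvals[-2])  # second direction
    a = np.linspace(-1.0, 1.0, rows)
    b = np.linspace(-1.0, 1.0, cols)
    M = V.mean(axis=0) + a[:, None, None] * e1 + b[None, :, None] * e2  # (rows, cols, p)
    return M.reshape(rows * cols, -1)

# Synthetic data with one dominant direction (illustrative only).
rng = np.random.default_rng(0)
V = rng.normal(size=(5000, 3)) * np.array([3.0, 1.0, 0.2])
rows, cols = map_shape(V, n_units=1680)
M0 = linear_init(V, rows, cols)
print(rows, cols, M0.shape)
```

The longer side of the map is aligned with the direction of largest variance, so the lattice roughly follows the shape of the data cloud before training starts.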
On the matrix, areas with homogeneous coloring corresponding to small within-cluster distances are clusters, whereas cluster borders are areas with homogeneous coloring corresponding to large between-cluster distances: in the figure, distances are depicted with dark blue colors shading toward dark red as the proximity between the nodes decreases. The visualization permits clear recognition of the presence of three well-separated clusters. Using the K-means algorithm¹⁹ on the SOM’s prototypes and coupling it with the Davies−Bouldin clustering index¹⁹ as a measure of cluster validity, the optimal number K of taxonomies has been estimated; as expected, optimality is found for K = 3. After coloring the map’s nodes according to their cluster membership and comparing the dyed SOM with the component plane of the status variable (Figure 7b and c), it is straightforward to associate the three taxonomies to the three main operational modes of the deethanizer. Specifically, the panels show the clusters on the SOM as distinct regions dyed in red, green, and blue with a coloring scheme that assigns those colors to the operational modes (high, normal, and low status, respectively). A virtually identical structure is also retrieved from the component plane of the status variable (for values +1, 0, and −1, respectively). Though apparently less evident, the same structure is also observable in the component planes of the ethane concentration (Figure 8a) and a set of temperatures (Figure 8c−e) in the stripping section of the column. More importantly, looking for similar patterns in similar positions in these component planes allows visualizing the expected dependence between the ethane composition and such temperature indicators; in fact, near identical but reversed planes highlight their inherent inverse correspondence. Information about this dependence can be further enhanced by applying the coloring scheme resulting from the clustering directly to the original observations in the time domain. For this purpose, the time points of all the process variables have been dyed using the cluster color of the corresponding best matching unit on the SOM, in Figure 9 for the aforementioned variables. The figure shows the strong inverse correspondence between the temperature TI-4 and the bottom ethane concentration; the correspondence is observed as clear banded regions, with such a structure getting disrupted as temperature indicators closer to the feed and the bottom section are considered.
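The clustering step described above (K-means on the prototypes, the Davies−Bouldin index to choose K, and observations inheriting the cluster of their best matching unit) can be sketched in plain NumPy. This is an illustration under stated assumptions, not the authors' code: the prototypes are synthetic stand-ins for a trained map, the deterministic farthest-point seeding replaces whatever initialization was actually used, and all function names are hypothetical.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)

def kmeans(X, k, n_iter=100):
    # Assumption: deterministic farthest-point seeding, starting from X[0].
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(X, centers), centers

def davies_bouldin(X, labels, centers):
    """Davies-Bouldin index: lower values indicate compact, well-separated clusters."""
    k = len(centers)
    scatter = np.array([np.linalg.norm(X[labels == j] - centers[j], axis=1).mean()
                        if np.any(labels == j) else 0.0 for j in range(k)])
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(centers[i] - centers[j])
                 for j in range(k) if j != i) for i in range(k)]
    return float(np.mean(worst))

# Synthetic stand-in for SOM prototypes: three well-separated groups.
rng = np.random.default_rng(0)
prototypes = np.vstack([rng.normal(loc=c, scale=0.5, size=(60, 2))
                        for c in ((0, 0), (10, 0), (0, 10))])

scores = {k: davies_bouldin(prototypes, *kmeans(prototypes, k)) for k in range(2, 7)}
best_k = min(scores, key=scores.get)
node_labels, _ = kmeans(prototypes, best_k)

# Each observation is dyed with the cluster of its best matching unit.
obs = np.array([[0.2, -0.1], [9.7, 0.3], [0.1, 10.2]])
obs_labels = node_labels[assign(obs, prototypes)]
print(best_k, obs_labels)
```

Clustering the prototypes rather than the raw observations keeps the cost low, since the map has far fewer nodes than there are measurements, and the cluster label of a node can then be reused as a color for every time point mapped to it.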
Figure 8. SOM visualization: The component planes for the ethane concentration and a selection of temperature indicators in the enriching section. The coloring of the component planes dyes in red high values of the variables and fades toward blue as the value decreases.

Figure 9. SOM visualization: the colored time series (three winter weeks) for a selection of process variables. The series are dyed in red, green, and blue corresponding to low, normal, and high operational modes (according to the SOM clustering).

When the analytical measurements of the ethane are substituted with the estimates obtained with the MLP-based soft-sensor, the clear band of normal operation is faithfully retrieved (Figure 9g). So far we have restricted the SOM-based analysis to the measurements observed in winter, but the calibrated map can be used directly as a reference model for unseen observations. On the basis of the sensor estimates, extrapolation with the self-organizing map is performed by initially projecting the summer data onto the map calibrated on the winter measurements, with the mapping based on the nearest neighbor criterion between the new data vectors and the prototype vectors of the map. In this regard, novelty detection using the SOM is based on finding the best matching unit on the calibrated (winter) map for the new and unseen summer data.
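The projection of unseen data onto a calibrated map can be sketched as below. The nearest-neighbor search follows the description in the text; using the distance to the best matching unit (the quantization error) as a novelty indicator, and taking the largest calibration error as the threshold, are assumptions made only for this illustration. All arrays are hypothetical toy values.

```python
import numpy as np

def project(data, prototypes):
    """Return the BMU index and the distance to it for each row of data."""
    dist = np.linalg.norm(data[:, None, :] - prototypes[None, :, :], axis=2)
    bmu = dist.argmin(axis=1)
    qe = dist[np.arange(len(data)), bmu]  # quantization error per sample
    return bmu, qe

# Hypothetical prototypes of a map calibrated on "winter" data
prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
winter = np.array([[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [1.0, 1.1]])
summer = np.array([[0.9, 0.0], [3.0, 3.0]])

# Assumed threshold: the largest quantization error seen in calibration
_, qe_winter = project(winter, prototypes)
threshold = qe_winter.max()

bmu_summer, qe_summer = project(summer, prototypes)
is_novel = qe_summer > threshold  # True where the map represents the point poorly
print(bmu_summer.tolist(), is_novel.tolist())
```

In this toy case the first summer point falls close to an existing prototype, whereas the second lies far from every node and is flagged as novel.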

Figure 10. SOM visualization: trajectories for a selection of observations (approximately 12 h of continuous operation in summer), as displayed on the winter SOM.

Figure 11. SOM visualization: status transitions on the time domain (approximately 16 h of continuous operation) for a selection of process variables. The summer observations are dyed according to the color of the corresponding BMUs on the winter map.

To present the results in extrapolation, we illustrate another SOM visualization technique. The approach allows following operational changes in the process and provides a simple display for identifying the reasons for specific behaviors. For the task, the SOM (Figure 7b) has been enhanced by the inclusion of the point trajectories followed by the process in summer. The trajectory gives an intuitive indication of the current status of the process and shows how it was reached. In Figure 10, the evolution of the process is depicted for a small time window of about 12 h of continuous operation. The trajectory passes through all the BMUs, and it is represented as a red line connecting the visited nodes (marked with yellow dots thickening with the visit count).

The temporal evolution corresponds to the beginning of the time window reported when assessing the extrapolation results of the MLP-based soft-sensor (Figure 5). As Figures 10 and 11 show, the process is initially operated between the green and blue regions (normal and low status). As new BMUs are visited and added to the trajectory, the column eventually leaves this near-normality condition and crosses the boundary toward the high region (in red). In the same fashion, the process variables in the time domain change coloring to match the visited modes and thus make it possible to appreciate, as already pointed out, that this abrupt change in the operation was mainly due to a variation in the feed conditions and the subsequent control actions.
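The trajectory display can be sketched in a few lines. This is a hedged illustration, assuming a rectangular map whose nodes are stored in row-major order (maps in practice are often hexagonal); the prototypes and observations are hypothetical.

```python
import numpy as np
from collections import Counter

def bmu_trajectory(data, prototypes, grid_shape):
    """Map time-ordered observations to (row, col) coordinates on the SOM grid."""
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)
    rows, cols = np.unravel_index(bmu, grid_shape)
    return list(zip(rows.tolist(), cols.tolist()))

# Hypothetical 2 x 2 map, prototypes listed in row-major node order
grid_shape = (2, 2)
prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Hypothetical time-ordered observations
observations = np.array([[0.1, 0.0], [0.2, 0.1], [0.9, 0.1], [1.0, 0.9]])

path = bmu_trajectory(observations, prototypes, grid_shape)  # line to draw
visits = Counter(path)                                       # marker size per node
print(path, dict(visits))
```

Connecting the entries of `path` in order gives the trajectory line, while `visits` provides the per-node counts that could set the size of the markers.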

Figure 12. Deethanizer flowsheet, including the alternative control strategy.

We conclude by pointing out that the winter and summer data differ because of the seasonal variations that affect the operating conditions of the unit. These differences are due to the limited cooling capacity in summer and to the variations in the inlet concentration. However, it has been verified that the proposed solution also works when the learning and validation sets are swapped.

Alternative Control Strategy. As previously stated, the objective of the column is to produce as much ethane as possible while satisfying a constraint on the amount of ethane in the bottom stream which, according to the plant’s management, should be within the 1.8−2.0% range. The present control configuration tries to achieve this objective by controlling the vapor temperature (TIC-3) through manipulation of the steam flow rate to the reboiler (FIC-4). Comparing the temporal evolution of the controlled temperature TIC-3 (Figure 9f) with that of the estimated bottom ethane (Figure 9g), only a weak correspondence between these two variables can be observed. On the other hand, the temperature TI-4 (Figure 9c) in the stripping section has a much stronger correspondence with the ethane concentration, as shown by the clear separation of the three operating statuses. Thus, such a variable should be used as the controlled variable.20 On the basis of this analysis, we propose an alternative control configuration that pairs TI-4 and FIC-4, with a set-point that lies in the 51−53 °C range. Such a range corresponds to the boundaries of the normal status (the green band in Figure 9c and g). A further improvement that could provide a more consistent operation in terms of ethane impurity in the bottom, while maximizing the amount of recovered propane and satisfying quality restrictions, would be the cascade composition−temperature control21 depicted in Figure 12. With the suggested strategy, the ethane is maintained at normality by a primary concentration controller cascaded with a secondary temperature controller.
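The structure of such a cascade can be sketched as two discrete-time PI controllers in series. This is only an illustrative sketch: the PI form, the gains, the bias values, the sampling time, and the clamping of the temperature set-point to the 51−53 °C band are all assumptions introduced here, not the plant's configuration or tuning, and no anti-windup is included.

```python
from dataclasses import dataclass

@dataclass
class PI:
    """Discrete-time PI controller in positional form (hypothetical tuning)."""
    kp: float
    ki: float
    dt: float
    bias: float = 0.0
    integral: float = 0.0

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        return self.bias + self.kp * error + self.ki * self.integral

# Primary (outer) loop: estimated bottom ethane [%] -> TI-4 set-point [deg C].
# Negative gains reflect the inverse temperature/ethane correspondence:
# too much ethane in the bottom calls for a higher temperature.
composition = PI(kp=-5.0, ki=-0.5, dt=1.0, bias=52.0)
# Secondary (inner) loop: TI-4 [deg C] -> steam flow to the reboiler (FIC-4),
# in arbitrary units.
temperature = PI(kp=2.0, ki=0.2, dt=1.0, bias=50.0)

def cascade_step(ethane_sp, ethane_est, ti4):
    """One control interval: the outer output becomes the inner set-point."""
    t_sp = composition.step(ethane_sp, ethane_est)
    t_sp = min(max(t_sp, 51.0), 53.0)  # assumed clamp to the normal band
    steam = temperature.step(t_sp, ti4)
    return t_sp, steam

# Hypothetical snapshot: ethane estimate above its set-point, TI-4 at 51.5 deg C
t_sp, steam = cascade_step(ethane_sp=1.9, ethane_est=2.1, ti4=51.5)
print(t_sp, steam)
```

In this arrangement the soft-sensor estimate would feed the primary loop, while the faster temperature loop acts on the steam flow between composition updates.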



CONCLUSION

The implementation and application of a methodology to model, visualize, and analyze the information encoded in typical industrial process data is presented. The approach is based on two classical machine learning methods widely used in process systems engineering, the self-organizing map and the multi-layer perceptron. The SOM is used to develop a framework for identifying the operational modes of the process and for presenting the information on intelligible displays. The capabilities of this approach are further enhanced by including an MLP as a regression model for estimating a key primary variable otherwise difficult to measure in real-time. The incorporation of the MLP-based soft-sensor allows us to convert an otherwise offline analysis strategy into a fully automated monitoring tool to be implemented online within the plant’s distributed control system. A full-scale industrial deethanizer is used as a case study to illustrate the main features of the proposed real-time environment. Overall, the application of the approach to the industrial unit allowed us to identify significant operational modes of the system, to develop a quality estimator for real-time applications, and to develop an efficient monitoring tool to follow operational changes in the plant. On the basis of the analysis, the existing control configuration of the unit could be redefined toward more consistent operations. Because the developed monitoring tool is inherently modular and general in its definition, it can be easily exported and used in developing analogous systems for monitoring different processes.

AUTHOR INFORMATION

Corresponding Author

*E-mail: francesco.corona@aalto.fi.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

J. A. Romagnoli and F. Corona gratefully thank the Regione Sardegna and the University of Cagliari for support through the program Visiting Professor 2011.



REFERENCES

(1) Kohonen, T. Self-Organizing Maps, 3rd ed.; Springer: New York, USA, 2001.
(2) Haykin, S. Neural Networks and Learning Machines, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, 2009.
(3) Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven soft sensors in process industry. Comput. Chem. Eng. 2009, 33, 795−814.
(4) Alhoniemi, E. Unsupervised pattern recognition methods for exploratory analysis of industrial process data. Ph.D. thesis, Helsinki University of Technology, Finland, 2002.
(5) Laine, S. Using visualization, variable selection and feature extraction to learn from industrial data. Ph.D. thesis, Helsinki University of Technology, Finland, 2003.
(6) Hoskins, J. C.; Himmelblau, D. M. Artificial neural network models of knowledge representation in process engineering. Comput. Chem. Eng. 1988, 12, 881−890.
(7) Qin, S. J.; McAvoy, T. J. Nonlinear PLS modeling using neural networks. Comput. Chem. Eng. 1992, 16, 379−391.
(8) Corona, F.; Mulas, M.; Baratti, R.; Romagnoli, J. On the topological analysis of industrial process data using the SOM. Comput. Chem. Eng. 2009, 34, 2022−2032.
(9) Kaski, S. Data explorations using self-organizing maps. Ph.D. thesis, Helsinki University of Technology, Finland, 1997.
(10) Himberg, J. From insight to innovations: Data mining, visualization and user interfaces. Ph.D. thesis, Helsinki University of Technology, Finland, 2001.
(11) Vesanto, J. Data exploration process based on the Self-Organizing Map. Ph.D. thesis, Helsinki University of Technology, Finland, 2002.
(12) Ultsch, A. Self-Organizing Neural Networks for Visualization and Classification. In Information and Classification; Springer: Berlin, Germany, 2003; pp 307−313.
(13) Corona, F.; Reinikainen, S.-P.; Aaljoki, K.; Perkkiö, A.; Liitiäinen, E.; Baratti, R.; Lendasse, A.; Simula, O. Wavelength selection using the measure of topological relevance on the Self-Organizing Map. J. Chemometr. 2008, 22, 610−620.
(14) Corona, F.; Liitiäinen, E.; Lendasse, A.; Sassu, L.; Melis, S.; Baratti, R. A SOM-based approach to estimating product properties from spectroscopic measurements. Neurocomputing 2009, 73, 71−79.
(15) Guyon, I.; Eliseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157−1182.
(16) Vesanto, J. SOM-based data visualization methods. Intell. Data Anal. 1999, 3, 111−126.
(17) Baratti, R.; Vacca, G.; Servida, A. Neural network modelling of distillation columns. Hydrocarb. Process. 1995, 74, 35−38.
(18) Zhu, Z.; Corona, F.; Lendasse, A.; Baratti, R.; Romagnoli, J. A. Local linear regression for soft-sensor design with application to an industrial deethanizer. In Proceedings of the 18th IFAC World Congress, Milano, Italy, September 2, 2011; pp 2839−2844.
(19) Dubes, R. C. Cluster Analysis and Related Issues. In Handbook of Pattern Recognition & Computer Vision; Chen, C. H.; Pau, L. F.; Wang, P. S. P., Eds.; World Scientific Publishing Co., Inc.: River Edge, NJ, USA, 1993; pp 332.
(20) Luyben, W. L. Evaluation criteria for selecting temperature control trays in distillation columns. J. Process Contr. 2006, 16, 115−134.
(21) Skogestad, S. Dynamics and control of distillation columns: A critical survey. Model. Ident. Control 1997, 18, 177−217.

dx.doi.org/10.1021/ie202854b | Ind. Eng. Chem. Res. 2012, 51, 13732−13742