
Nonlinear Process Monitoring Using Supervised Locally Linear Embedding Projection

Kenneth S. McClure,*,†,‡ R. Bhushan Gopaluni,† Terrance Chmelyk,‡ Devin Marshman,‡ and Sirish L. Shah§

† Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
‡ Spartan Controls, Burnaby, British Columbia V5A 4X5, Canada
§ Department of Chemical Engineering, University of Alberta, Edmonton, Alberta T6G 2M7, Canada

ABSTRACT: The chemical and mineral processing industries need a nonlinear process monitoring method to improve the stability and economy of their processes. Techniques that are currently available to these industries are often too computationally intensive for an industrial control system, or they are too complex to commission. In this paper, we propose using supervised locally linear embedding for projection (SLLEP) as a new nonlinear process monitoring technique to solve these issues. In addition, we suggest using a commonly available tool in modern industrial control systems, a model predictive controller, to solve the quadratic program of SLLEP in real-time and with minimal commissioning effort. As a case study, we demonstrate that process monitoring with SLLEP can detect and diagnose the early onset of a semiautogenous grinding (SAG) mill overload. A SAG mill overload is a highly nonlinear operating situation, and we show that principal component analysis, the best-in-class technique currently used by the industry for monitoring an overload, is unable to detect the early onset of an overload.

Special Issue: David Himmelblau and Gary Powers Memorial. Received: May 15, 2013. Revised: August 31, 2013. Accepted: September 9, 2013. Published: September 9, 2013. Ind. Eng. Chem. Res. 2014, 53, 5205−5216. dx.doi.org/10.1021/ie401556r

1. INTRODUCTION

Intelligent monitoring in the chemical and mineral processing industries is extremely important, with modern unit processes containing hundreds, sometimes even thousands, of measurements. Process monitoring leads to improved stability and productivity by enabling operators to monitor significantly fewer variables than normally required to extract key information about a complex process. Without an appropriate monitoring technique, operators are expected to monitor hundreds of measurements individually, and these measurements are usually highly correlated and subject to considerable noise. This problem, where there is too much data and not enough information, is known as the "curse of dimensionality", and many process monitoring tools are available to rectify it.

For complex multivariate chemical and mineral processes, statistical process monitoring schemes can be successfully applied if sufficient operating data are available.1 Moreover, statistical techniques, unlike model-based techniques, do not require a strong fundamental understanding of a process or very accurate models to be effective.2,3 Many statistical techniques are available for extracting process information. One approach that is particularly powerful is principal component analysis (PCA) in combination with the Hotelling T2 and Q test statistics.4 PCA utilizes singular value decomposition of the covariance matrix of a high-dimensional training data set to find a small set of uncorrelated variables, called principal components (PCs), containing the most relevant information.5 In addition, the T2 and Q test statistics monitor the PCs and are easily calculated for new observations in a digital control system (DCS).

In the chemical and mineral processing industries, PCA is one of the leading multivariate process monitoring techniques. However, PCA is a linear technique that assumes sampled data are multivariate normal, yet most processes in these industries are inherently nonlinear. As a consequence, the performance of PCA tends to be unreliable for these processes. For example, if PCA is used on nonlinear problems, then the minor PCs that are usually discarded contain important information that is lost.6 To address this issue, PCA has been extended to nonlinear processes through the use of kernel transformations that map data to a high-dimensional space before extracting features from the data.7 This nonlinear extension is known as kernel PCA. Unfortunately, kernel PCA yields a feature space that is difficult to interpret, is computationally expensive, and requires the storage of a very large n × n kernel matrix in the DCS, where n is the number of samples in the training data set.8,9 Owing to these drawbacks, kernel PCA has not found widespread success in the chemical and mineral processing industries.

As an alternative, a feature extraction technique known as locally linear embedding (LLE), originally proposed by Roweis and Saul10 for visualization of nonlinear multivariate data, has been adapted by De Ridder and Duin11 and Li and Zhang12 for use in process monitoring in a technique known as supervised LLE for projection (SLLEP). SLLEP is capable of unraveling a high-dimensional data manifold into a low-dimensional space on the basis of the assumption that each data point and its neighbors lie on a locally linear patch of the manifold. The advantage of SLLEP over kernel PCA is that it is less computationally expensive, the low-dimensional feature space is easy to interpret, and only an n × D data matrix needs to be stored in the DCS, where D is the dimensionality of the high-dimensional observation space and D ≪ n.

Although the computational requirements of SLLEP are low enough that it can be used in a modern DCS, SLLEP has not been applied in the chemical or mineral processing industries. In this work, we demonstrate a simple method to commission SLLEP in a modern DCS using existing control tools commonly available in these industries to solve the quadratic program (QP) in real-time. We apply SLLEP to detect the early onset of a semiautogenous grinding (SAG) mill overload and show that SLLEP with linear discriminant classification outperforms the best-in-class technique in the industry, PCA with T2 and Q test statistics,13 for monitoring a SAG mill overload. Furthermore, we discuss a simple method to interpret the classification output of the SLLEP feature space to estimate the severity of a process abnormality and provide operators with an information-rich univariate measurement. Finally, the three primary mechanisms of a SAG mill overload are collectively reviewed and illustrated for the first time in the literature.

This paper is organized as follows. We introduce SLLEP with two classification techniques as a nonlinear process monitoring strategy in Section 2, and in Section 2.1.3, we demonstrate how to easily commission SLLEP in a modern DCS. Additionally, Section 2.2 proposes a simple method to interpret the output of these classifiers to estimate the degree to which process behavior is nominal or abnormal. Finally, in Section 3, we apply SLLEP with linear discriminant classification on a SAG mill to detect the early onset of an overload, and we compare these results to the best-in-class technique currently used by the industry.

2. NONLINEAR PROCESS MONITORING

Process monitoring is a tool used to determine whether process behavior is nominal or abnormal. In other words, process monitoring can determine if a fault is present in a process or if disturbances have caused a process to shift to an undesired operating region. For example, a SAG mill is a complex nonlinear process that benefits from monitoring of potential overload conditions. A SAG mill overload is an undesirable operating situation in which the grinding efficiency of the mill is poor. Correspondingly, the onset of an overload is difficult to detect and cannot be monitored with any single measurement alone due to its complexity.

The two main steps of statistical pattern recognition for process monitoring are shown in Figure 1. The first step is to extract the most important features from a measured high-dimensional data set known as the observation space. This feature-extraction step reduces the sparsity and dimensionality of the data set, thus allowing for more accurate training of a classifier in the second step. The second and final step is to classify the low-dimensional feature space into regions of nominal and abnormal operating conditions. These classified operating regions are known as the decision space. In this step, it is possible to include in the decision space both categorical class information and an estimate of the degree to which a process is nominal or abnormal. Generally, in this strategy, a single process monitoring technique is developed for each unique process abnormality in order to be sensitive to, and to properly diagnose, a specific process abnormality.

Figure 1. Major steps of statistical pattern recognition for process monitoring.

SLLEP is presented in this work as a method to extract low-dimensional nonlinear features from high-dimensional nonlinear process data of a chemical or mineral process. We propose classifying the information-rich nonlinear features using either linear discriminant analysis (LDA) or quadratic discriminant analysis (QDA) to arrive at a decision about whether data are nominal or abnormal. Owing to the simplicity of these classifiers, they can easily be commissioned in a DCS, and no extra steps are involved to estimate the degree to which a process is nominal or abnormal. If a new observation is determined to be abnormal, then operators are alerted about the abnormality and its probable cause. Alternatively, McClure14 discusses an approach that uses this univariate estimate in linear feedback predictive control to improve overall controller performance when a nonlinear process has shifted to an undesired operating region.

Because operators are the primary personnel responsible for monitoring chemical and mineral processes, it is ideal for the feature space of SLLEP to be at most three-dimensional (3D) for easier interpretation. SLLEP has been shown in image analysis applications to easily map very high-dimensional data to three or fewer dimensions.15,16 A feature space of three or fewer dimensions allows operators to visually interpret features and classification boundaries. Otherwise, the process monitoring output will not be well understood, and operators may refuse to adopt the information from the process monitoring technique. Moreover, QDA can give erroneous classification results in a high-dimensional space.17

In Section 3, we compare SLLEP with LDA classification to the best-in-class monitoring technique in the industry, PCA using T2 and Q test statistics, to detect and diagnose a SAG mill overload. Kernel PCA is not used as a benchmark for comparison to SLLEP because this technique is not employed by the industry due to the limitations of modern DCS in storing the kernel mapping matrix.

2.1. Locally Linear Embedding. LLE is a feature extraction method that nonlinearly maps locally linear patches of a high-dimensional data set into a low-dimensional feature space that preserves the critical information in the data set. Furthermore, if LLE is trained with supervised learning, the low-dimensional feature space can be classified using a simple classifier.

Before training the LLE model, we first determine which process variables should be monitored. To detect and diagnose a specific abnormal operating situation, we monitor only the process measurements that contain relevant information on, or are correlated with, the abnormality. To properly map nominal and abnormal process conditions, LLE should be trained with a large number of data that cover a wide range of operation, sampled only as frequently as necessary, and including as extensive coverage as possible of the abnormality. Although storage of a large training data set in a DCS requires a lot of memory, the size of this data set can be substantially reduced by intelligently

selecting a subset of the original training data using the technique outlined by De Ridder and Duin.11

Once a training data matrix, X ∈ ℝ^(n×D), consisting of n observations and D measured process variables, has been obtained, we must standardize X to zero mean and unit variance or scale it from 0 to 1. As a general rule for the chemical and mineral processing industries, we recommend scaling all measurements in X from 0 to 1 to mitigate the effects of changing process conditions. In other words, processes in these industries are subject to a wide variety of operating conditions, and measurements can be scaled so they are less sensitive to large changes in their mean values.

In the simplest formulation of LLE discussed by Roweis and Saul,10 there are three fundamental steps performed offline of the DCS once X has been obtained: the first step is to select the nearest neighbors of each data point in X; the second step is to reconstruct each data point from its neighbors; the final step is to map the data to a low-dimensional feature space. Figure 2 reviews the three steps of the LLE procedure. In the basic form of LLE, there is only one free parameter to determine in offline training: the number of neighbors, K. Selection of K is discussed in Section 2.1.5.

Figure 2. Three steps of the LLE algorithm.10

Step 1: Select neighbors. The first step is to identify the K nearest neighbors of each observation in X, where X has been standardized to zero mean and unit variance or scaled from 0 to 1. To find the neighbors of the ith observation in X, xi ∈ ℝ^(D×1), the Euclidean distance between xi and every other observation is computed, and the K observations with the shortest distances are the neighbors. A neighbor of xi is denoted as xη. The Euclidean distance between xi and another observation in X, xj, is calculated from the expression

∥xi − xj∥2   (1)

where ∥·∥2 refers to the l2 norm. The K nearest neighbors characterize the geometry of the local patches through linear coefficients used to reconstruct each data point from its neighbors.

Step 2: Reconstruct each data point from its neighbors. The second step is to reconstruct each data point from its neighbors by minimizing the cost function

minimize over W:  ε(W) = ∑_{i=1}^{n} ∥xi − ∑_{j=1}^{n} Wij xj∥2²   (2)

subject to  Wij = 0  if xj ∉ {xη1, xη2, ..., xηK}   (3)

∑_{j=1}^{n} Wij = 1   (4)

Here, the weight Wij ∈ ℝ gives the magnitude that the jth observation in X contributes to the ith reconstruction. The weights Wij are constrained to zero in eq 3 if the corresponding observations do not belong to the set of nearest neighbors for the ith reconstruction, and ε(W) sums the squared distances between all data points and their reconstructions. Finding the optimal weights that minimize the reconstruction error in eq 2 subject to the constraints in eqs 3 and 4 is a constrained least-squares problem.10

An illustration of the locally linear reconstruction of xi is shown in Figure 3. The figure shows that a nonlinear curve must otherwise be drawn between xi and its two neighbors; to develop a linear reconstruction, the data point xi must be reconstructed onto a line between the two neighbors xη1 and xη2. The weights Wij are the degrees of freedom used to minimize the reconstruction errors.

Figure 3. Locally linear reconstruction of xi.15

Step 3: Map to the embedded coordinates. The third and final step is to map each high-dimensional observation, xi, to a low-dimensional feature vector, yi ∈ ℝ^(d×1). Each yi represents the d-dimensional global internal coordinates on the low-dimensional manifold. We recommend initially choosing d = 3 to inspect the feature data in 3D space and identify which coordinates show that the nominal and abnormal process behavior are separable; only these "fault-specific" coordinates are then needed from LLE. yi is found by minimizing the embedding cost function

minimize over y:  Φ(y) = ∑_{i=1}^{n} ∥yi − ∑_{j=1}^{n} Wij yj∥2²   (5)

subject to  ∑_{i=1}^{n} yi = 0   (6)

(1/n) ∑_{i=1}^{n} yi yiᵀ = I   (7)

where I is the d × d identity matrix. Equation 6 enforces a zero-mean constraint on the embedding coordinates, and eq 7 avoids degenerate solutions by constraining the embedding coordinates to have unit covariance. In practice, it is desirable to select d ≤ 3 for visual interpretation of the feature space. Alternatively, De Ridder and Duin11 propose a method to automatically determine d by ensuring the average variance retained by the locally linear neighborhoods exceeds a desired threshold. The embedding cost defined in eq 5 is a QP; subject to the constraints in eqs 6 and 7 that make the problem well posed, it can be reformulated as a sparse n × n eigenvalue problem, as detailed by Roweis and Saul.10
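To make the three steps concrete, the following is a minimal numpy sketch of eqs 1−7. It is an illustration, not the authors' DCS implementation; the small regularization of the local Gram matrix and the dense eigendecomposition are simplifying assumptions.

import numpy as np

def lle(X, K, d):
    # Basic LLE after Roweis and Saul: Steps 1-3 (eqs 1-7).
    n = X.shape[0]
    # Step 1: K nearest neighbors by Euclidean distance (eq 1).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :K]
    # Step 2: reconstruction weights, one constrained least-squares
    # problem per data point (eqs 2-4).
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                # neighbors centered on x_i
        G = Z @ Z.T                          # local K x K Gram matrix
        G += 1e-3 * np.trace(G) * np.eye(K)  # regularize in case K > D
        w = np.linalg.solve(G, np.ones(K))
        W[i, nbrs[i]] = w / w.sum()          # sum-to-one constraint (eq 4)
    # Step 3: the embedding minimizing eq 5 subject to eqs 6 and 7 is given
    # by the bottom nonconstant eigenvectors of M = (I - W)^T (I - W).
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    return vecs[:, 1:d + 1]                  # discard the constant eigenvector

For realistic data set sizes, M is sparse, and a partial eigendecomposition would be preferable to the dense solver shown here.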

2.1.1. Supervised LLE Algorithm. An adaptation of LLE is discussed by De Ridder and Duin11 that improves the method when it is used to extract features within a data set for subsequent classification. The technique is called supervised LLE (SLLE), and it includes class information in the algorithm by artificially adding a distance between samples from different classes. In process monitoring, there are usually two classes: nominal data and abnormal data. To perform SLLE, only Step 1 of the original algorithm is changed: before selecting the K nearest neighbors of xi, an additional distance, R′ ∈ ℝ, is added between data points that belong to different classes. Consequently, the distance in eq 1 of Step 1 is rewritten with R′ as

∥xi − xj∥2 + R′   (8)

Here, the distance R′ that incorporates class information is calculated from the expression

R′ = R + γ max(R)(1 − δ(xi, xj))   (9)

where R = ∥xi − xj∥2 is the Euclidean distance between two data points, and max(R) = max_{ij} ∥xi − xj∥2 is the distance between the two observations that are farthest apart in X. For every combination of xi and xj that belong to separate classes, δ(xi, xj) = 0; otherwise, δ(xi, xj) = 1. Lastly, γ controls the amount of class information incorporated into R′, where 0 ≤ γ ≤ 1. In practice, γ is often chosen to be at most 0.2 because large γ values cause data from the same class to coalesce onto a single point,11 resulting in poor mapping of new data to the feature space with LLE projection.12 Comparatively, γ can be very small if there are outliers in the data that cause max(R) to be large. Owing to the ability of SLLE to separate data belonging to different classes, we are able to use a simple classifier, such as LDA or QDA, when classifying the feature space into regions of nominal and abnormal process operation.

2.1.2. LLE Projection for Mapping. Because LLE is a nonparametric method, there is no equation to map new observations into the feature space discovered by SLLE during offline training. As a result, Li and Zhang12 adapted SLLE to project a new observation, xnew ∈ ℝ^(D×1), to the feature space as ynew ∈ ℝ^(d×1) for subsequent classification. This technique is called LLE projection, and it requires the entire training data set, X, and the entire embedded coordinate matrix, Y ∈ ℝ^(n×d), to be stored in the DCS. Here, Y is the matrix containing all embedded coordinates yi determined from eq 5. There are three steps computed at each execution in the DCS for LLE projection: the first step is to find the nearest neighbors of the new observation; the second step is to compute a mapping matrix; the third step is to project the new observation into the feature space.

Step 1: Select neighbors. For a new high-dimensional sample, xnew ∈ ℝ^(D×1), find its K neighbors, Xη, in the training data set X by a Euclidean distance measurement, where

Xη = [xη1, xη2, ..., xηK] ∈ ℝ^(D×K)   (10)

Step 2: Compute a mapping matrix. Find a mapping matrix, A, that satisfies

Yη = AXη   (11)

in the least-squares sense, where

A = [a1, a2, ..., ad]ᵀ ∈ ℝ^(d×D)   (12)

Yη = [yη1, yη2, ..., yηK] ∈ ℝ^(d×K)   (13)

and Yη is the embedded coordinate matrix of the feature data corresponding to the high-dimensional observation data matrix Xη. Each basis vector of A is obtained by solving the linear least-squares regression problem

aj = argmin_a ∑_{i=1}^{K} ∥aᵀxηi − y^j_ηi∥2²   (14)

where 1 ≤ j ≤ d and y^j_ηi is the jth element of yηi. The analytical solution for the case of Xηᵀ having full column rank is

aj = (Xηᵀ)⁺ y^j_η = (XηXηᵀ)⁻¹ Xη y^j_η   (15)

where (Xηᵀ)⁺ is the Moore−Penrose pseudoinverse of Xηᵀ, and y^j_η is defined as

y^j_η = [y^j_η1, y^j_η2, ..., y^j_ηK]ᵀ   (16)

If Xηᵀ is column rank deficient, such as in the case where K < D, then eq 14 is ill-posed and there are infinitely many solutions for aj. In this case, we refer to the ridge regression discussed by Li and Zhang12 to solve for A.

Step 3: Project the observation into the feature space. Calculate the embedding coordinate, ynew, for xnew:

ynew = Axnew   (17)
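The supervised distance of eqs 8 and 9 and the three projection steps of eqs 10−17 reduce to a few lines of linear algebra. The following numpy sketch is illustrative only; np.linalg.pinv stands in for the ridge-regression fallback of Li and Zhang12 when Xηᵀ is rank deficient.

import numpy as np

def supervised_distance(X, labels, gamma):
    # Pairwise distances augmented with class information (eqs 8 and 9).
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    different = (labels[:, None] != labels[None, :]).astype(float)
    return R + gamma * R.max() * different   # R' = R + gamma*max(R)*(1 - delta)

def lle_project(x_new, X, Y, K):
    # Map a new observation to the feature space by LLE projection (eqs 10-17).
    idx = np.argsort(np.linalg.norm(X - x_new, axis=1))[:K]  # Step 1 (eq 10)
    X_eta = X[idx].T                       # D x K neighbor matrix
    Y_eta = Y[idx].T                       # d x K embedded coordinates
    A = Y_eta @ np.linalg.pinv(X_eta)      # Step 2: least-squares A (eqs 11-16)
    return A @ x_new                       # Step 3 (eq 17)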

2.1.3. Solving the QP in an Industrial DCS. In Step 2 of LLE projection, the solution for A is obtained by solving the QP in eq 14. Because the analytical solution given in eq 15 requires the computation of a matrix inverse, it is difficult to program and execute in an industrial DCS. In many industrial controllers, matrix inversion is not built in and must be programmed using elementary row operations, often using Gauss−Jordan elimination.18 A new method is proposed in this work to solve for A. Because model predictive control (MPC) is a pre-existing module found in most modern DCS, it can be used as a QP solver for eq 14. The analytical solution for the QP of a standard MPC controller, using the dynamic matrix control algorithm, is discussed by Camacho and Bordons19 and given as

u = (GᵀRG + Q)⁻¹GᵀR(w − f)   (18)

where u is an array of future control signals, w is an array of reference signals, f is an array containing the free response of the system, G is the dynamic gain matrix, and R and Q are the penalty-on-error and penalty-on-move diagonal weight matrices, respectively. Here, the analytical solution for the QP of MPC parallels the analytical solution for the basis mapping vector given in eq 15. The MPC solution is equivalent if Gᵀ = Xη, R = I, Q = 0, and y^j_η = w − f. To use MPC to solve eq 14, the manipulated variables (MVs) are treated as the elements of A, and the controller determines the values of these elements by minimizing its cost function. Furthermore, the controlled variables are treated as the predicted elements of Yη, whose target set-points are the corresponding values in Yη. Finally, the input−output model gains of the


MPC are the values of Xη pertaining to the elements of the input−output relation in eq 11, and it is not necessary to assume dynamics in these models. Additionally, any size of the prediction and control horizons is allowable; however, it is best to reduce these horizons to their minimum because the time to steady state is one iteration for a system without dynamics.

2.1.4. Example of LLE Projection Using MPC. MPC was used to find the projection of a data point in an S-shaped curve, shown in Figure 4. In this figure, n = 800 random samples, or observations, of a 3D S-shaped manifold are shown in Figure 4a. In this plot, one observation is randomly selected, and its K = 12 neighbors are identified. Next, LLE was performed to map the 3D data to two-dimensional (2D) space, and the results are shown in Figure 4b. The black box in Figure 4b depicts the inset containing the observation and its neighbors. The contents of this inset are shown in Figure 4c. Finally, in Figure 4d, MPC was used to perform linear regression on the neighbors of the observation, and the linear equation was used to project the coordinates of the data point in 3D space to the mapped 2D feature space. The plot shows that a discrepancy exists between the coordinates of the data point mapped by LLE and its projection from the linear equation. This is due to the nonlinearity of the S curve, and this error decreases as the number of samples the LLE model is trained with increases.

Figure 4. LLE mapping of an S-shaped curve to show the accuracy of LLE projection, where (a) is a 3D S-shaped manifold, (b) is a 2D mapping of the 3D data with an inset illustrated in (c) of a data point and its neighbors, and (d) is the data point estimated from the neighbors in the LLE projection.

The results from MPC for the mapping shown in Figure 4 are comparable to the analytical solution of eq 14. In fact, MPC discovers the exact same solution as the analytical solution from the Moore−Penrose pseudoinverse in exactly one iteration. To have the MPC converge on the solution immediately, the models in MPC, as described by the coordinates of the K neighbors in 3D space, were treated as gain-only models with no dynamics. In addition, the penalty-on-move term of the MPC was set to zero for all six of the MVs describing the linear mapping matrix A.
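The equivalence described above is easy to check numerically. The sketch below, using hypothetical random data in place of the S-curve example, verifies that the MPC analytical solution of eq 18 with Gᵀ = Xη, R = I, and Q = 0 reproduces the pseudoinverse solution of eq 15.

import numpy as np

rng = np.random.default_rng(0)
K, D = 12, 3
X_eta = rng.normal(size=(D, K))   # hypothetical neighbor coordinates (D x K)
y_j = rng.normal(size=K)          # hypothetical jth embedded coordinates

# LLE projection analytical solution (eq 15).
a_direct = np.linalg.pinv(X_eta.T) @ y_j

# MPC analytical solution (eq 18) with G^T = X_eta, R = I, Q = 0, w - f = y_j.
G = X_eta.T
R = np.eye(K)
Q = np.zeros((D, D))
a_mpc = np.linalg.solve(G.T @ R @ G + Q, G.T @ R @ y_j)

assert np.allclose(a_direct, a_mpc)  # both recover the same basis vector a_j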

2.1.5. Selection of K and γ. The selection of K and γ significantly affects the ability of LDA or QDA to classify the feature space discovered by SLLEP. K determines the number of neighbors around a particular data point that are considered to be within a locally linear region of the observation space. If K is too small, then SLLEP does not have enough information about the global properties within the data set; if K is too large, then the mapping loses its nonlinear character and behaves like PCA. γ determines how much class information is included in training SLLEP. A large γ will excessively separate the classes and lead to overgeneralization of the feature space, and erroneous results may occur when projecting new observations into the feature space. Furthermore, the classifier loses its ability to extract meaningful information on the degree to which process behavior is nominal or abnormal because data from the same class converge on a single point in the feature space. On the other hand, a small γ incorporates little class information when training SLLEP. Consequently, it is less likely that LDA or QDA can classify the feature space due to the nonlinearity of the features.

Because there are two free parameters to manipulate when training SLLEP, cross-validation should be performed over an extensive grid search of K and γ to optimize the selection of the parameters. Cross-validation finds the best parameter selection by evaluating the unbiased prediction accuracy of LDA or QDA.17 Once the parameters have been chosen and the SLLEP model has been trained, the model should then be tested on a test set of data before use in the industry.
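A grid search of this kind might look like the following sketch. Here, train_sllep is a hypothetical helper standing in for the SLLE training of Section 2.1.1, and X and labels are assumed to be the scaled training matrix and its class labels; this is an illustration, not the procedure used in this work.

import numpy as np
from itertools import product
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

best_params, best_acc = None, -np.inf
for K, gamma in product(range(10, 41, 2), [0.001, 0.005, 0.01, 0.05, 0.1, 0.2]):
    Y = train_sllep(X, labels, K, gamma)   # hypothetical SLLE embedding helper
    acc = cross_val_score(LinearDiscriminantAnalysis(), Y, labels, cv=5).mean()
    if acc > best_acc:
        best_params, best_acc = (K, gamma), acc

A stricter procedure would re-train the embedding within each fold and map the held-out observations with LLE projection, at a higher computational cost.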

2.2. Classification. LDA and QDA are simple classification methods that find a linear or quadratic combination of features that best separates two or more classes. In this work, we focus on the classification of two classes and leave the multiclass extension open to future research. As an alternative to multiclass classification when three or more classes are to be identified, one can construct an additional model that is particular to another abnormal situation relative to nominal operation.

The main underlying assumption in LDA and QDA is that the data to be classified are multivariate normal.17 Although this is not always the case, LDA often remains an effective classifier if this assumption is not severely violated.17 However, QDA may lead to erroneous results when classifying a feature space of more than two dimensions if this assumption is violated.17 Because data mapped to the feature space by SLLEP are not necessarily multivariate normal, γ can be tuned to sufficiently separate the classes in the feature space and increase the effectiveness of LDA.11 Comparatively, QDA is a quadratic classifier, and it can classify a 2D feature space better than LDA without an overtuned γ.

LDA and QDA classifiers are trained in a supervised manner from the known class labels of the feature data. Therefore, we must divide the feature data into two groups: Y1 and Y2. Here, Y1 contains the nominal operating data, and Y2 contains the specific abnormal operating data that represent the abnormal operating range we are interested in detecting. The remaining steps of LDA and QDA differ from each other and are discussed separately.

2.2.1. Linear Discriminant Analysis. To construct a linear classifier, LDA assumes the covariance matrices of Y1 and Y2 are equivalent. Because the true covariance of the data is unknown, we use a pooled estimate from the sample covariance matrices, Spooled, calculated from the expression17

Spooled = [(n1 − 1)/((n1 − 1) + (n2 − 1))] S1 + [(n2 − 1)/((n1 − 1) + (n2 − 1))] S2   (19)

where S1 and S2 are the sample covariance matrices calculated from Y1 and Y2, respectively, n1 is the number of observations


in Y1, and n2 is the number of observations in Y2. The formula to find a sample covariance matrix is

S = (1/(n − 1)) YᵀY   (20)

The next step is to find the mean vectors of Y1 and Y2:

y̅1 = (1/n1) Y1ᵀ1  and  y̅2 = (1/n2) Y2ᵀ1   (21)

where 1 is a column vector of ones with as many elements as there are observations in Y1 or Y2. The allocation rule of LDA for a given observation in the feature space, yi, is

class = π1 if b̂ᵀyi − m̂ ≥ C;  class = π2 if b̂ᵀyi − m̂ < C   (22)

where π1 is the class of nominal operating data, π2 is the class of abnormal operating data, and

b̂ = (y̅1 − y̅2)ᵀ Spooled⁻¹   (23)

m̂ = (1/2)(y̅1 − y̅2)ᵀ Spooled⁻¹ (y̅1 + y̅2)   (24)

C = ln(c(1|2)/c(2|1))   (25)

Here, b̂ is the sample coefficient vector, m̂ is the estimated midpoint between the two classes, and C accounts for the cost of misclassification. In eq 25, c(1|2) is the cost associated with classifying an observation as π1 when it actually belongs to π2, and vice versa for c(2|1). For example, if a SAG mill is overloaded and we classify a new observation as belonging to the nonoverloaded class, then the mill overload is not corrected, and the consequences are more severe than if the mill were not overloaded but was classified as overloaded. Accordingly, c(1|2) and c(2|1) can be adjusted to give the desired classification accuracy. Not only does the LDA allocation rule dictate the class of an observation, it also gives the equation of the linear classification boundary as the solutions to f(yi) = 0, where

f(yi) = b̂ᵀyi − m̂ − C   (26)

and f(yi) yields a value that is proportional to the distance of an observation from the linear classification boundary. This value is the closest distance between an observation and the classification line, scaled by a constant, arbitrary value. In other words, a magnitude of f(yi) greater or less than zero provides an estimate of the degree to which a process is nominal or abnormal, respectively.

2.2.2. Quadratic Discriminant Analysis. QDA differs from LDA in that there is no assumption that the covariance matrices of Y1 and Y2 are equivalent. Without this assumption, the classification regions are defined by quadratic functions of yi. The allocation rule of QDA is

f(yi) = −(1/2) yiᵀ(S1⁻¹ − S2⁻¹) yi + (y̅1ᵀS1⁻¹ − y̅2ᵀS2⁻¹) yi − k − C   (27)

class = π1 if f(yi) ≥ 0;  class = π2 if f(yi) < 0   (28)

where

k = (1/2) ln(|S1|/|S2|) + (1/2)(y̅1ᵀS1⁻¹y̅1 − y̅2ᵀS2⁻¹y̅2)   (29)

Here, |S| denotes the determinant of S. The QDA allocation rule not only dictates the class of an observation but also gives the equation of the quadratic classification boundary as the solutions to f(yi) = 0. The value of f(yi) indicates how much the quadratic classification boundary would need to be increased or decreased in size so that the ith observation lies on this boundary. In other words, f(yi) is a meaningful estimate of the distance of an observation from the classification boundary that also accounts for the boundary's curvature. As a result, the magnitude of f(yi) can be used to estimate the degree to which new data are nominal or abnormal.
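A minimal numpy sketch of the LDA discriminant of eqs 19−26 is shown below; it assumes Y1 and Y2 are arrays of feature data with one observation per row and is an illustration rather than the DCS implementation used in this work.

import numpy as np

def lda_score(Y1, Y2, y, cost_ratio=1.0):
    # LDA discriminant f(y) of eqs 19-26; f(y) >= 0 allocates y to pi_1.
    n1, n2 = len(Y1), len(Y2)
    S1 = np.cov(Y1, rowvar=False)                 # sample covariances
    S2 = np.cov(Y2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)  # eq 19
    m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)     # mean vectors (eq 21)
    b = np.linalg.solve(S_pooled, m1 - m2)        # coefficient vector (eq 23)
    m_hat = 0.5 * b @ (m1 + m2)                   # midpoint (eq 24)
    C = np.log(cost_ratio)                        # cost term (eq 25), c(1|2)/c(2|1)
    return b @ y - m_hat - C                      # eq 26

The sign of the returned value gives the class, and its magnitude gives the severity estimate discussed above.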

3. CASE STUDY: SAG MILL OVERLOAD

The case study is of a SAG mill at a mineral processing facility located in British Columbia (BC), Canada. Grinding is one of the most fundamental unit operations in a mineral processing facility, and optimal operation of grinding mills is important because they use nearly half of all the energy consumed in these facilities.20 Furthermore, the product particle size distribution from grinding has a large effect on the recovery of valuable minerals in downstream processing.21 In this case study, we demonstrate that SLLEP with LDA classification can successfully detect the early onset of a SAG mill overload, thus allowing operators to reduce the feed and correct the overload before grinding performance severely deteriorates. We also compare SLLEP-LDA with PCA, a technique that has traditionally been adopted for SAG mill process monitoring. In this study, we found that PCA is unable to detect a SAG mill overload due to the nonlinearity of the process.

3.1. Overview of the SAG Process. In a mineral processing facility, ore size must be reduced in order to liberate valuable minerals from waste rock for downstream processing. Accordingly, a SAG mill is often used in these facilities as a means to grind ore. SAG mills are unique among grinding processes because they utilize the ore itself to reduce material size and can typically grind ore from 200 mm down to 0.1 mm in one unit.22 Semiautogenous grinding implies that a steel ball charge, usually between 4% and 15% of the total mill volume, is added to the ore charge to enhance grinding efficiency. In these mills, the total charge volume usually occupies between 30% and 35% of the mill interior.23 The main objective of a SAG mill is to maximize throughput; subobjectives are to minimize variations in output particle size, power consumption, and steel ball and liner consumption.24,25 The main components of a typical grate-discharge trunnion-supported SAG mill, such as the one operated at the facility in BC, are described below and shown in Figure 5.


Figure 5. Schematic of a typical grate-discharge SAG mill.26

1. Feed trunnion: provides a support for the mill and an entrance for ore and water to enter the mill.
2. Mill shell: main chamber where ore is lifted by the lifter bars and tumbles down to grind the ore as the mill rotates.
3. Grate: a screen allowing the finished product in the form of slurry to pass into the discharge chamber.
4. Pulp lifter: lifts the ore slurry that passes through the grate to the discharge trunnion.
5. Discharge trunnion: provides a support for the mill and an exit for ore slurry.

A diagram of the SAG mill at the facility in BC showing the main inputs, outputs, and measurements is given in Figure 6.

Figure 6. Main inputs, outputs, and measurements of the SAG mill at the facility in BC.

3.1.1. SAG Overload. SAG mills are often operated as close to their upper throughput constraint as possible to maximize their economic performance. As a result of high feed rates, the operation of the mill can easily become unstable from overloading. Preventing overloading is nontrivial because the properties of the ore that affect grindability, such as ore size, hardness, and composition, are highly variable, and ore hardness is not feasible to measure online. Moreover, the internal state of the mill cannot be directly measured, and estimates from mill power and bearing pressures are seldom accurate.27 Consequently, it is difficult for operators to detect mill overloading, and once it has been detected, usually from an increasing bearing pressure and constant or decreasing mill power, the mill has typically been overloaded for a long period. Once an overload is detected, operators must make sudden and large changes to decrease the feed ore rate and let the mill slowly grind down the charge.

Overloading occurs as a result of one, or a combination, of three cases: volumetric overloading, freewheeling, or slurry pooling. Additionally, freewheeling or slurry pooling eventually leads to volumetric overloading. These three mechanisms of an overload are illustrated in Figure 7. In this figure, τ represents torque; x represents the length of a torque arm; subscript c refers to the charge; subscript p refers to the slurry pool; and c.g. denotes center of gravity. The following section discusses these mechanisms of an overload for SAG mills where motor power is regulated to maintain a constant rotational speed.

Figure 7. Three possible mechanisms of an overload: (a) volumetric overloading, (b) freewheeling, and (c) slurry pooling.

3.1.2. Three Mechanisms of Overloading. The first mechanism of overloading occurs when the volume occupied by the charge increases such that the total distance that ore can fall under gravity, known as cataracting, decreases and causes the grinding efficiency of the mill to degrade.28 Figure 7a illustrates the reduction of the maximum potential energy (PEmax) of cataracting ore resulting from a loss in free-fall distance when the mill is overloaded. The figure also shows that the center of gravity shifts inside the mill, as illustrated by the length of xc. Consequently, it is possible to measure the load distribution by using the pressure measurements from the bearing units supporting the mill or by monitoring whether the motor power decreases as a result of a reduced countertorque provided by the charge. However, the density, volume, and distribution of the charge in the mill are highly variable and


may not always lead to a significant enough change in τc to affect mill power noticeably.

The second mechanism of mill overloading occurs when the charge becomes trapped between the lifter bars due to centripetal force when the viscosity of the charge is high.29 This phenomenon is known as freewheeling because the inertia of the trapped charge combines with the kinetic energy of the mill and effectively reduces the motor load. In addition, the motor load decreases significantly because the remaining charge in the mill is poorly lifted due to obstruction of the lifter bars. Therefore, the remaining charge becomes partially static, resulting in poor comminution and an accumulation of ore in the mill. In this situation, increasing the feedwater or decreasing the ore feed rate is needed to decrease the viscosity of the charge and recover from the freewheel effect. The freewheel effect is illustrated in Figure 7b.

The third mechanism of overloading is shown in Figure 7c and occurs when an excess of slurry builds up inside the mill from poor pulp lifter design, resulting in the formation of a slurry pool.30 The slurry pool drastically dampens the impact energy of the falling rock and reduces grinding efficiency. Slurry is held up within the solid charge as the material is lifted with the rotation of the mill.26 As the slurry content increases, the slurry fills the interstitial space of the solids charge in the direction of the toe, or bottom, of the solids charge. When all the interstitial space is occupied by slurry, a pool forms at the toe of the charge.31 As the slurry pool forms inside the mill, grinding efficiency decreases, and the mill will eventually overload unless the feed ore rate is reduced.

3.2. Nonlinear Process Monitoring with SLLEP-LDA. 3.2.1. Candidate Measurements for Model. To monitor a SAG mill overload, 10 measured and inferred candidate variables are considered in the process monitoring model. These 10 process variables are listed in Table 1, and there are 3 different candidate models based upon different combinations of these measurements. We will later show that model 3 is the best model for SLLEP-LDA process monitoring because of its ability to extract information from high-dimensional nonlinear data; however, PCA had the best results with the variables in model 1, and PCA resulted in large prediction errors using the other models.

Table 1. Summary of the Variables Retained in Each Candidate Model

variable                 model 1   model 2   model 3
bearing pressure         √         √         √
motor power              √         √         √
sound level              √         √         √
center of gravity        √         √         √
difference of slopes     √         √         √
recycle                            √         √
discharge to pump box              √         √
specific energy                              √
vibration                                    √
size >2 in.                                  √

Data from two overload events were made available from the mineral processing facility in BC. An overload occurred on February 29th, 2012, and these data are used as part of the training data set to build the process monitoring model. Another overload occurred on May 27th, 2012, and these data are used as the test data set to validate the process monitoring model. Data from both days are sampled every 2 min, and each measurement is filtered using a first-order filter with a time constant between 1 and 10 min. Moreover, all measurements have been scaled from 0 to 1 based upon their scale limits in the DCS and adjusted so that the variables have relatively equal standard deviations. The candidate variables for the portion of the training data set with the overload are shown in Figure 8.

Figure 8. Candidate process variables of the training data set for process monitoring.

In Figure 8, we see that changes in bearing pressure and motor power are positively correlated with each other until the overload occurs around sample 163. This positive correlation breaks down as the overload worsens, and near sample 250, bearing pressure and motor power become negatively correlated, indicating severe overloading. Sound level is determined as the difference between the raw measurement and its 0.4 day moving average; based on operator experience, a mill that is unusually quieter than normal is an overloaded mill. The center of gravity variable shown in the figure is a measurement of the load distribution to the side of the mill where the charge is not lifted. Difference of slopes is a dimensionless variable calculated from the slope of the scaled total bearing pressure (BP) measurement minus the slope of the scaled total motor power (MP) measurement:

Difference of Slopes = ΔBP/Δt − ΔMP/Δt   (30)

where the slopes are determined from a first-order finite difference calculation given by

ΔBP/Δt = (BP[k] − BP[k − 1])/Δt   (31)

ΔMP/Δt = (MP[k] − MP[k − 1])/Δt   (32)


Here, k is the sample instance, and Δt is the time between sample instances, chosen to be 1 min; 1 min is reasonable because the time constant of the process is estimated to be around 10 min.

The other variables shown in Figure 8 are additional possible overload indicators. We expect the recycle rate and discharge to pump box measurements to decrease during severe overloading. Specific energy is the total motor power divided by the estimated mass in the mill derived from the four bearing pressure cells. Vibration is a nonlinear variable measuring the magnitude of vibrations on one of the bearing cells. Finally, size >2 in. is a measure of the feed rate of ore with a particle size greater than 2 in.; this is the only input variable of the training data set, and around sample 170, the feed was reduced in an attempt to offset overloading. A more detailed discussion of these measurements is available in McClure.14
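As an illustration of eqs 30−32, the difference-of-slopes feature reduces to a few lines; the sketch below assumes the BP and MP series have already been scaled and filtered as described above.

import numpy as np

def difference_of_slopes(bp, mp, dt=1.0):
    # Eq 30 from first-order finite differences (eqs 31 and 32)
    # of the scaled bearing pressure and motor power series.
    dbp = np.diff(bp) / dt
    dmp = np.diff(mp) / dt
    return dbp - dmp   # drifts positive when BP rises while MP stalls or falls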

Figure 9 shows the test data set from May 27th for validating the model. In this data set, there are three consecutive overload events. This data set reflects process operation at conditions very different from the training data set; hence, the test set provides a realistic and challenging test for the final process monitoring model.

Figure 9. Candidate process variables for a segment of the test data set for process monitoring.

3.2.2. Process Monitoring Results. SLLEP-LDA was trained on the training data set shown in Figure 8, with the addition of more operating data from February and May, for a total training set of 1919 data points. QDA classification was found to be unnecessary in this case because LDA had better classification results. In LDA classification, we assumed equal costs of misclassification in order to compromise between the false alarm rate and early detection of an overload; in other words, C in eq 25 is zero. The process monitoring results for all three models are summarized in Table 2.

Table 2. SLLEP-LDA Classification Results of the Candidate Models

model   train sens.   train spec.   test sens.   test spec.   metric
1       91.1%         93.7%         70.0%        99.1%        84.6%
2       96.2%         93.7%         80.0%        98.1%        89.0%
3       98.9%         89.0%         90.8%        93.5%        92.2%

Table 2 summarizes the classification results for the SLLEP-LDA process monitoring method using sensitivity, specificity, and a cumulative performance metric of the test statistics. Sensitivity is a measure of the proportion of true overload operating data that are correctly identified by the classifier. Sensitivity is defined by the following expression:

sensitivity = (# of true overload)/(# of true overload + # of false nominal)   (33)

where a true overload corresponds to data we labeled as overloaded before training the classifier that are correctly classified, and a false nominal corresponds to data classified as nominal by the classifier when we had instead labeled those data as overloaded. The other metric used to evaluate the performance of the classifier is specificity. Specificity is a measure of the proportion of true nominal operating data that are correctly identified by the classifier. Specificity is defined by the following expression:

specificity = (# of true nominal)/(# of true nominal + # of false overload)   (34)

where a true nominal corresponds to data we defined as nominal before training the classifier that are correctly classified, and a false overload corresponds to data classified as overloaded by the classifier when we had instead labeled those data as nominal. The column labeled "metric" is the average of the sensitivity and specificity results for the test case.
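For reference, eqs 33 and 34 can be computed from boolean prediction and label sequences as in the short helper below (an illustrative sketch, not the evaluation code used in this study).

def sensitivity_specificity(pred_overload, true_overload):
    # Sensitivity (eq 33) and specificity (eq 34) from paired booleans.
    pairs = list(zip(pred_overload, true_overload))
    tp = sum(p and t for p, t in pairs)          # true overloads
    fn = sum(not p and t for p, t in pairs)      # false nominals
    tn = sum(not p and not t for p, t in pairs)  # true nominals
    fp = sum(p and not t for p, t in pairs)      # false overloads
    return tp / (tp + fn), tn / (tn + fp)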

From Table 2, the "metric" column shows that model 3 provides the best detection results for the SAG overload. We also see that specificities are higher for the test set than for the training set. Usually the converse is true; however, the test set is unique because the steel ball charge in the mill was overestimated; thus, the mill overloaded at a lower overall bearing pressure and throughput than expected. Consequently, it was less likely that the classifier misclassified nominal test data as overloaded because some measurements appeared normal.

In model 3, K is 26 and γ is 0.0012. The feature space and classification boundary for model 3 on the training data set are shown in Figure 10. Here, the equation of the classification boundary is

y = 0.49x + 0.57   (35)

where x corresponds to the first embedded coordinate, and y corresponds to the second embedded coordinate of the feature space. In Figure 10, we see that the SLLEP-LDA classifier successfully separates the nominal and overload operating data. Although there are misclassifications, these are the result of having to manually add class labels to the training data. This labeling process is subjective, and only the most obvious overload data were labeled; the data were assigned a binary category even though the distinction between nominal and overload situations is fuzzy. We studied these misclassifications in the time trends of the data and confirmed that these labeling issues are the only cause of the misclassifications. The


classification results for model 3 on the test data set are shown in Figure 11.

Figure 10. LDA classification on the training data set mapped into 2D space by SLLE.

Figure 11. LDA and QDA classification on the test data set mapped into 2D space by SLLEP.

The classification results shown in Figure 11 illustrate that the process monitoring technique generalizes well to new data. In this figure, there are few misclassifications, and data that were misclassified were likely incorrectly labeled when manually categorizing the test data. In the training and test results, we see that a linear classification boundary is able to separate the nominal and overload data. Furthermore, a quadratic classification boundary is unnecessary, and it increases the risk of overgeneralizing the feature space.

Figure 12 shows the process monitoring output results on the test data set with the bearing pressure and motor power trends included to help navigate between this figure and Figure 9. On the basis of eq 22 with zero cost, an SLLEP output value less than zero indicates the SAG mill is overloaded. In the next section, we show that the Q and T2 statistics are unable to detect the early onset of the overload.

Figure 12. Output from process monitoring on the test set for SLLEP-LDA.

3.3. Process Monitoring with PCA. PCA, using the Q and T2 test statistics, was also trained as a process monitoring method for comparison to SLLEP-LDA. Ko and Shang13 introduced this technique for SAG monitoring and demonstrated its ability to detect process abnormalities in SAG mill operation. However, that study did not identify which abnormalities were overload events. Likewise, this technique has been used in the industry to detect a SAG overload with inconsistent success. This limited success is likely because a SAG overload is nonlinear, and it is challenging to select process measurements with linear characteristics between nominal and overloaded operation for PCA to be successful.

In PCA, the classifiers traditionally used by the industry are the T2α and Qα upper control limits on the T2 and Q test statistics. T2 is a measure of the variance in the data that can be explained by the PC model. Conversely, Q is a measure of the variance in the data that cannot be explained by the PC model.
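For comparison with the SLLEP sketches above, a minimal numpy version of a PCA monitor with T2 and Q statistics is given below. This is a generic textbook formulation (the control limits T2α and Qα are omitted), not the implementation used in this study.

import numpy as np

def fit_pca_monitor(X_nominal, n_pc):
    # Train on nominal data only; return a scorer giving (T2, Q) for a new x.
    mu, sd = X_nominal.mean(axis=0), X_nominal.std(axis=0)
    Xs = (X_nominal - mu) / sd
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_pc].T                       # retained PC loadings
    lam = s[:n_pc] ** 2 / (len(Xs) - 1)   # variances of the retained scores

    def score(x):
        xs = (x - mu) / sd
        t = P.T @ xs                      # projection onto the PC model
        T2 = np.sum(t ** 2 / lam)         # Hotelling T2: explained variation
        r = xs - P @ t                    # residual off the PC model
        return T2, r @ r                  # Q is the squared prediction error

    return score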

To train PCA, training data from only the nominal operating range were used. This range was selected from the data shown in Figure 8, up until the process became overloaded. Additionally, nominal data from earlier in the day on February 29th, 2012 were added, for a total of 400 training observations. We found with this data set that there was enough excitation in the measurements to develop a clear correlation structure for the nominal range of operation. Although more data were available for training, we found them unnecessary to include, and it was difficult to precisely identify a linear range of nominal operating data containing enough excitation to train PCA.

We found that model 1 had the best process monitoring results; PCA was able to reasonably estimate new observations using the variables of this model but not the other two models. In models 2 and 3, PCA poorly estimates new observations because the additional variables of these models cannot be accurately reconstructed from a linear equation. The output results of the T2 and Q test statistics on the test data set are shown in Figure 13.

Figure 13. Q and T2 results on the test data set for PCA.

From Figure 13, we observe that neither test statistic is sensitive enough to detect the overload situation; conversely, the test statistics are sensitive to the severe corrective control actions following an overload that temporarily underload the mill. During an underload, the nominal correlation structure of the data changes drastically and is easily detected by PCA. However, detecting a mill underload is trivial for operators. The


results in this figure show how challenging it is for a linear technique to monitor a nonlinear process. Alternatively, it is possible to improve PCA in this application by defining multiple PCA models for several approximately local operating regions. However, operators would then have to monitor multiple variables or plots, and defining and maintaining the different operating regions for PCA is challenging.

Kernel PCA was not considered because it is not typically commissioned in the chemical or mineral processing industries; kernel PCA requires more controller resources than are typically allowable. In addition, PCA and SLLEP can provide a low-dimensional visual interpretation of the feature space that is easy to interpret, whereas the feature space from kernel PCA is difficult to interpret or relate to the original data, and this interpretability is an important attribute for operators.

3.4. Applicability of SLLEP. PCA should be tested as a process monitoring method before training SLLEP to decide whether PCA is reliable and provides early detection of a specific process abnormality. PCA is more desirable than SLLEP because PCA is simple to commission and requires minimal controller resources. However, if the results of PCA are unsatisfactory, it is possible that the process is nonlinear and that important PCs containing information on the process abnormality have been discarded from the model.

SLLEP should be used only when PCA is found to give unsatisfactory process monitoring results. SLLEP requires more commissioning effort and controller resources than PCA. However, we have shown that SLLEP is effective at monitoring variables with nonlinear characteristics. In addition, it is difficult and time-consuming to select the most appropriate variables to monitor with PCA, and training data must be carefully selected to avoid data that lie outside of the linear nominal operating range. SLLEP does not have these restrictions on variable or data set selection. Moreover, if process variables relating to a specific abnormality are selected, then the feature space of SLLEP can be classified into regions that are unique to that abnormality; PCA does not have this diagnostic capability with the T2 and Q test statistics.

SLLEP should not be used if process conditions are expected to change drastically. In this case, there will not be enough historical data available to properly map the feature space from SLLEP during offline training. As a consequence, new observations will be poorly mapped to the feature space because the neighbors of a new observation in the training data set will not lie on a locally linear patch of the manifold.

4. CONCLUSION

In this paper, we presented a new process monitoring method for the chemical and mineral processing industries. We demonstrated that SLLEP can extract nonlinear features from a set of 10 measured variables. Furthermore, we showed that a simple linear or quadratic classifier can classify the feature space into regions of nominal and abnormal operation. Ultimately, operators only need to monitor one variable, the distance of a new observation from the classification boundary that divides data into nominal and abnormal operation, to determine if a process is in a specific abnormal operating region. We also discussed a method to incorporate an estimate of the severity of the process abnormality in this measurement to enhance the information given to the operator.

The results of this work help to overcome the challenge of commissioning SLLEP in an industrial DCS. We proposed using MPC, a commonly available tool in modern DCS, to solve the quadratic problem in SLLEP in real-time with minimal programming effort. Additionally, the proposed process monitoring method was tested on a SAG mill overload case study and compared to PCA, the current best-in-class technique in the industry for monitoring an overload. Due to the nonlinearity of the measurements, PCA was unsuccessful at monitoring the overload; however, SLLEP with LDA classification successfully differentiated the overload from nominal and underloaded operating conditions.

AUTHOR INFORMATION

Corresponding Author
*K. S. McClure. E-mail: [email protected].

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work is financially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Spartan Controls, Burnaby, Canada.

REFERENCES

(1) Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N.; Yin, K. A review of process fault detection and diagnosis: Part III: Process history based methods. Comput. Chem. Eng. 2003, 27, 327.
(2) Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N.; Yin, K. A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies. Comput. Chem. Eng. 2003, 27, 313.
(3) Ohran, B. J.; Liu, J.; de la Peña, D. M.; Christofides, P. D.; Davis, J. F. Data-based fault detection and isolation using feedback control: Output feedback and optimality. Chem. Eng. Sci. 2009, 64, 2370.
(4) Kresta, J. V.; MacGregor, J. F.; Marlin, T. E. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 1991, 69, 35.
(5) Cheng, H.; Nikus, M.; Jämsä-Jounela, S.-L. Evaluation of PCA methods with improved fault isolation capabilities on a paper machine simulator. Chemom. Intell. Lab. Syst. 2008, 92, 186.
(6) Dong, D.; McAvoy, T. J. Nonlinear principal component analysis based on principal curves and neural networks. Comput. Chem. Eng. 1996, 20, 65.
(7) Schölkopf, B.; Smola, A.; Müller, K. R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 1998, 10, 1299.
(8) Lee, J.-M.; Yoo, C.; Choi, S. W.; Vanrolleghem, P. A.; Lee, I.-B. Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59, 223.
(9) Choi, S. W.; Lee, C.; Lee, J.-M.; Park, J. H.; Lee, I.-B. Fault detection and identification of nonlinear processes based on kernel PCA. Chemom. Intell. Lab. Syst. 2005, 75, 55.
(10) Roweis, S. T.; Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323.
(11) De Ridder, D.; Duin, R. P. W. Locally linear embedding for classification. Technical Report PH-2002-01; Delft University of Technology: Delft, The Netherlands, 2002.
(12) Li, B.; Zhang, Y. Supervised locally linear embedding projection (SLLEP) for machinery fault diagnosis. Mech. Syst. Signal Process. 2011, 25, 3125.
(13) Ko, Y.-D.; Shang, H. SAG mill system diagnosis using multivariate process variable analysis. Can. J. Chem. Eng. 2011, 89, 1492.
(14) McClure, K. S. Algorithms for nonlinear process monitoring and controller performance recovery with an application to semiautogenous grinding. M.S. Thesis, University of British Columbia, 2013.
(15) Saul, L. K.; Roweis, S. T. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 2003, 4, 119.
(16) Wang, J. Geometric Structure of High-Dimensional Data and Dimensionality Reduction; Springer: Berlin, 2011.
(17) Johnson, R. A.; Wichern, D. W. Applied Multivariate Statistical Analysis, 6th ed.; Pearson: Upper Saddle River, NJ, 2007.
(18) Friedberg, S. H.; Insel, A. J.; Spence, L. E. Linear Algebra, 4th ed.; Pearson: Upper Saddle River, NJ, 2002.
(19) Camacho, E.; Bordons, C. Model Predictive Control, 2nd ed.; Springer: London, 2007.
(20) Yang, J.; Li, S.; Chen, X. S.; Li, Q. Disturbance rejection of ball mill grinding circuits using DOB and MPC. Powder Technol. 2010, 198, 219.
(21) Chen, X. S.; Li, Q.; Fei, S. Constrained model predictive control in ball mill grinding process. Powder Technol. 2008, 186, 31.
(22) Wills, B. A.; Napier-Munn, T. J. Mineral Processing Technology, 7th ed.; Elsevier: Oxford, 2006.
(23) Gupta, A.; Yan, D. Mineral Processing Design and Operations; Elsevier: Amsterdam, 2006.
(24) Craig, I. K.; MacLeod, I. M. Specification framework for robust control of a run-of-mine ore milling circuit. Control Eng. Pract. 1995, 3, 621.
(25) Coetzee, L. C. Robust nonlinear model predictive control of a closed run-of-mine ore milling circuit. Ph.D. Thesis, University of Pretoria, 2009.
(26) Latchireddi, S.; Morrell, S. Slurry flow in mills: Grate-only discharge mechanism (Part-1). Miner. Eng. 2003, 16, 625.
(27) Wei, D.; Craig, I. K. Grinding mill circuits - A survey of control and economic concerns. Int. J. Miner. Process. 2009, 90, 56.
(28) Morrell, S. The prediction of power draw in wet tumbling mills. Ph.D. Thesis, University of Queensland, 1993.
(29) Dean, D. R. Memo to Jim Dunstan, Rosebud Joint Venture, Nevada, 1999; available at http://www.nbmg.unr.edu/scans/4010/60001735.pdf.
(30) Mokken, A. H.; Blendulf, G. K. I.; Young, G. J. C. A study of the arrangements for pulp discharge on pebble mills, and their influence on mill performance. J. South African IMM 1975, 257.
(31) Cleary, P. W.; Sinnott, M.; Morrison, R. Prediction of slurry transport in SAG mills using SPH fluid flow in a dynamic DEM based porous media. Miner. Eng. 2006, 19, 1517.