Improved Nonlinear Quality Estimation for Multiphase Batch


Article

Improved Nonlinear Quality Estimation for Multiphase Batch Processes based on Relevance Vector Machine with Neighborhood Component Variable Selection Jinlin Zhu, and Furong Gao Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.7b03590 • Publication Date (Web): 15 Dec 2017 Downloaded from http://pubs.acs.org on December 24, 2017


Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Industrial & Engineering Chemistry Research

Improved Nonlinear Quality Estimation for Multiphase Batch Processes based on Relevance Vector Machine with Neighborhood Component Variable Selection

Jinlin Zhu(a), Furong Gao(a,b)*

a. Department of Chemical and Biomolecular Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR
b. Fok Ying Tung Graduate School, Hong Kong University of Science and Technology

Abstract

Batch processes usually involve a succession of operating phases to produce high-value-added, market-oriented products. During manufacturing, different batch phases can be governed by different nonlinear correlations between the product quality variables and the measured process variables. Moreover, the influence of process variables on final quality changes as the operating phases evolve. To make reasonable real-time quality estimates, it is therefore necessary to identify both the influential quality-relevant variables and the key time slices. In the present work, a new multiphase dimension reduction method, phase-based neighborhood component variable selection (pNCVS), is first used to filter out insignificant variables in each phase. Afterwards, a phase-based relevance vector machine (pRVM) is developed for each nonlinear phase to extract the influential time slices. The proposed two-step analysis flowchart extracts quality-relevant information along both the variable-wise and time-wise dimensions, which can enhance the effectiveness and efficiency of statistical regression. The feasibility of the proposed method is demonstrated on a numerical example and on the fed-batch penicillin fermentation process.

Keywords: Nonlinear quality prediction, Variable selection, Neighborhood component variable selection, Relevance vector machine, Multiphase batch processes



Corresponding author: E-mail address: [email protected]

ACS Paragon Plus Environment


1. Introduction

In batch manufacturing processes, quality estimation (or prediction) has received considerable attention because it helps maintain consistently high product quality and provides empirical guidance for production improvement and fault detection 1-4. Because batch processes are highly complex, first-principles models are difficult to construct 5. As flexible and practical alternatives, data-based multivariate statistical methods have become increasingly popular over the past few decades 6-8. The multiway partial least squares (MPLS) method is regarded as a pioneering approach and has been applied successfully in industrial processes 9. Nevertheless, MPLS has two critical limitations. First, it does not account for the inherent multiphase character of batch processes 10. Second, its performance may deteriorate when the process exhibits nonlinear correlations. To improve modeling and prediction performance, it is therefore desirable to divide the batch process into several operating phases and build a nonlinear statistical inferential model for each phase.

Several research efforts in the literature have addressed this issue. For instance, Yu developed an adaptive kernel PLS for each phase to accommodate process nonlinearities 11. Liu et al. proposed recursive least-squares support vector regression combined with a just-in-time scheme to capture the nonlinear variations of batch data 12. To further account for prediction uncertainty, probabilistic methods have been adopted. For example, Gaussian process regression has been applied, using Bayesian inference to make predictions from different local domains 13. In reference 14, the authors used Gaussian mixture regression for multiphase quality prediction and validated its feasibility on several numerical examples.

However, most previous works have organized the batch data directly in a specified two-way manner for statistical analysis, disregarding the fact that the effects of variables on product quality can vary over the course of operation. In other words, the dominant variables that essentially affect production can differ from phase to phase, while insignificant variables carry regression-irrelevant information and can be neglected when analyzing certain operating regions. In fact, incorporating insignificant variables contributes little to the empirical inferential model and may even degrade prediction performance 15. Therefore, quality-related variables should be properly selected for statistical modeling and real-time application. Generally speaking, variable selection brings three main advantages. First, practical modeling efficiency can be greatly improved, especially for large-scale, high-dimensional datasets. Second, the quality monitoring performance of the inferential model may improve, since insignificant or irrelevant redundancies are eliminated and generalization is better ensured. Third, the refined interpretation of the data-driven statistical model is enhanced, which can guide production quality control and improvement. Consequently, variable selection is one of the most critical concerns in developing sound statistical inferential models.

In previous studies, several types of variable selection techniques have been developed for traditional regression methods, especially PLS 16. For instance, genetic-algorithm-incorporated PLS (GA-PLS) has been proposed to identify relevant spectroscopic regions 17. The GA-based method optimizes PLS performance through operators such as mutation, crossover and selection, and the final optimal variables are those selected most often during the optimization. In reference 18, uninformative variable elimination based PLS was proposed; it was later improved by incorporating a Monte Carlo mechanism (MCUVE-PLS) for optimal variable selection and model calibration 19. MCUVE-PLS can be regarded as a Monte Carlo random variable search, in which variable importance is graded by so-called reliability indexes calculated from the statistics of the PLS regression coefficients. In reference 20, the variable importance in projection of PLS (PLS-VIP) was presented, in which variable influence is calculated from the weighted explained sum of squares: relevant variables show VIP values larger than 1, while irrelevant variables commonly have VIP values below 0.5 21. Recently, a comprehensive investigation and comparison of these variable selection mechanisms has been carried out, and their merits and limitations have been discussed 22. Nevertheless, all of these variable selection methods are driven by linear PLS regression, so the interpretation of the selected variables cannot be guaranteed in nonlinear situations. Since most process data are nonlinear in nature, nonlinear feature selection solutions must be considered.

For the nonlinear feature selection problem, a commonly used methodology is the mutual information (MI) strategy. In theory, MI is a reasonable way to explore the degree of dependency between the quality and measurement variables 23. However, its practical realization is highly nontrivial, because most collected data entries belong to continuous variables, and the MI strategy, originally designed for discrete variables, is not readily applicable 24. From this perspective, it is necessary to develop a more generally applicable nonlinear variable selection method.


To deal with the nonlinear variable selection issue in industrial multiphase batch processes, this work proposes a new phase-based nonlinear variable selection strategy. Motivated by the fact that the nearest neighbor method is among the simplest and most effective techniques for nonlinear discrimination, a novel phase-based neighborhood component variable selection (pNCVS) method is constructed for batch production quality estimation. The proposed method defines a variable-weighted distance measure chosen so that nearest-neighbor estimation of each quality response performs as well as possible. Based on this distance metric, stochastic neighbor assignments are defined through softmax transformations of neighborhood closeness. A differentiable cost function is then derived by multiplying the stochastic neighbor probabilities by the associated leave-one-out (LOO) regression errors. Finally, the overall loss function is formed from the expected LOO prediction error plus a regularization term and is minimized globally. In this way, the unknown parameters, i.e., the distance metric weights, can be solved by common optimization methods. Through analysis and discussion, we show that the induced distance metric weights indicate the influence of variables on quality-relevant estimation in each time slice. Consequently, an efficient phase-based variable selection algorithm is established to accumulate the time-slice variable selection results and obtain the quality-relevant variables for each phase.

Once the influential variables have been determined, the variable-wise information has been extracted. However, batch production is influenced not only by the variables themselves but also by the dynamic evolution along the time axis, during which the influential variables exert their critical effects. In most multiway-based methods, the batch-wise data stacks are repetitive series from various manufacturing runs, and the batch-wise variations are usually unfolded into a two-way representation, which in turn may introduce numerous redundancies into the statistical analysis. In other words, different time regions have varying impacts on product quality, so it is desirable to focus not only on the effects of the variables but also on the time slices at which they make a difference to the final product. From this perspective, we further propose a phase-based relevance vector machine (pRVM) to search for and index the significant time slices in each phase. The original RVM is well known for sparse probabilistic inference and has been widely applied to sparse sequential representations 25-27. However, previous studies have rarely considered both variable-wise and time-wise extraction of quality-relevant information. The proposed two-step strategy extracts nonlinear quality-relevant information along both coordinates to improve both model interpretability and prediction performance, which is also its most significant difference from previous studies. Moreover, the reduced variable dimensionality provides informative subspaces for lower-order computation in both offline modeling and online application, which is particularly beneficial for analyzing the potentially large process datasets accumulated over many repetitive batches.

The rest of this work is organized as follows. In section 2, the proposed two-step phase-based analysis method and its working flowchart are described in detail, including the development of single-slice NCVS, the phase-based variable selection algorithm, and the statistical construction of pRVM for quality prediction. In section 3, a numerical example is first given to evaluate the proposed variable selection technique, followed by an industrial application to the fed-batch penicillin fermentation process. Finally, conclusions are drawn in the last section.

2. Methodology

In this section, batch data pretreatment is discussed first. Neighborhood component variable selection is then proposed for phase-based nonlinear feature selection of batch data. Based on the selected variables, the relevance vector machine, a time-wise sparse nonlinear method, is established for online quality estimation.

2.1 Batch data pretreatment

Conventionally, batch process measurement data are stacked in a three-way matrix $\mathbf{X}(I \times J_x \times K)$, where $I$ is the number of batches, $J_x$ is the number of process variables, and $K$ is the duration (number of sampling instants) of each batch. For computational convenience, the three-way matrix can be unfolded batch-wise into a two-dimensional data matrix $\mathbf{X}(I \times KJ_x)$, which decomposes into time slices as

$$\mathbf{X}(I \times KJ_x) = \left[\mathbf{X}_1(I \times J_x), \mathbf{X}_2(I \times J_x), \ldots, \mathbf{X}_K(I \times J_x)\right] \tag{1}$$
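As a minimal sketch of Eq. (1), assuming the three-way data are held in a NumPy array ordered as (batch, variable, time) to mirror $\mathbf{X}(I \times J_x \times K)$, the batch-wise unfolding is a single transpose and reshape. The function name `batch_wise_unfold` is hypothetical.

```python
import numpy as np

def batch_wise_unfold(X3):
    """Unfold a three-way array X3 of shape (I, J_x, K) into X(I x K*J_x).

    Columns are ordered time slice by time slice, so columns
    k*J_x ... (k+1)*J_x - 1 hold the slice X_{k+1}(I x J_x) of Eq. (1).
    """
    I, Jx, K = X3.shape
    # Move the time axis in front of the variable axis, then flatten (K, J_x).
    return X3.transpose(0, 2, 1).reshape(I, K * Jx)
```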

For simplicity, we further assume in this work that the batch process has already been divided into $S$ phases, so that the above decomposition can be rewritten as

$$\mathbf{X}(I \times KJ_x) = \left[\mathbf{X}_1(I \times K_1 J_x), \mathbf{X}_2(I \times K_2 J_x), \ldots, \mathbf{X}_S(I \times K_S J_x)\right] \tag{2}$$

where $K_s$ is the number of time slices in phase $s$ and $\sum_{s=1}^{S} K_s = K$. Batch-wise unfolding is commonly used for data normalization: each variable at each sampling instant is centered and scaled across batches, which removes the mean trajectory so that every data entry $x_{ijk}$ reflects batch-to-batch variation. Specifically, the normalized data entry $\tilde{x}_{ijk}$ is obtained as

$$\tilde{x}_{ijk} = \frac{x_{ijk} - \bar{x}_{jk}}{\sigma_{jk}} \tag{3}$$

$$\bar{x}_{jk} = \frac{1}{I}\sum_{i=1}^{I} x_{ijk}, \qquad \sigma_{jk} = \sqrt{\frac{1}{I-1}\sum_{i=1}^{I}\left(x_{ijk} - \bar{x}_{jk}\right)^{2}} \tag{4}$$
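The phase split of Eq. (2) and the normalization of Eqs. (3)-(4) can be sketched as follows, assuming the batch-wise unfolded matrix has its columns ordered time slice by time slice and that the phase lengths $K_1, \ldots, K_S$ come from a prior phase-division step. The function names are illustrative, and the guard against a zero standard deviation is an added assumption not stated in the text. The means and standard deviations are returned because, in practice, online samples must be scaled with the training statistics.

```python
import numpy as np

def split_phases(X_unf, Jx, phase_lengths):
    """Split X(I x K*J_x) into phase blocks X_s(I x K_s*J_x), as in Eq. (2)."""
    if sum(phase_lengths) * Jx != X_unf.shape[1]:
        raise ValueError("phase lengths must add up to K")
    bounds = np.cumsum([0] + list(phase_lengths)) * Jx
    return [X_unf[:, a:b] for a, b in zip(bounds[:-1], bounds[1:])]

def normalize_batch_wise(X_unf):
    """Autoscale each column (variable j at time k) across the I batches, Eqs. (3)-(4)."""
    mean = X_unf.mean(axis=0)
    std = X_unf.std(axis=0, ddof=1)     # 1/(I-1) normalization, as in Eq. (4)
    std = np.where(std > 0, std, 1.0)   # hypothetical guard for constant columns
    return (X_unf - mean) / std, mean, std
```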


After normalization, the normalized matrix $\tilde{\mathbf{X}}(I \times KJ_x)$ is obtained; in this work, this formulation is used for nonlinear variable selection. To construct the inferential model, the data matrix is reorganized into the variable-wise unfolded form. According to the phase division information, the time-slice decomposition can be given as

$$\mathbf{X}(KI \times J_x) = \begin{bmatrix} \mathbf{X}_1(K_1 I \times J_x) \\ \mathbf{X}_2(K_2 I \times J_x) \\ \vdots \\ \mathbf{X}_S(K_S I \times J_x) \end{bmatrix} \tag{5}$$

Analogously, the three-way quality data $\mathbf{Y}(I \times J_y \times K)$ can be unfolded batch-wise and reorganized variable-wise, respectively, as

$$\mathbf{Y}(I \times KJ_y) = \left[\mathbf{Y}_1(I \times K_1 J_y), \mathbf{Y}_2(I \times K_2 J_y), \ldots, \mathbf{Y}_S(I \times K_S J_y)\right] \tag{6}$$

$$\mathbf{Y}(KI \times J_y) = \begin{bmatrix} \mathbf{Y}_1(K_1 I \times J_y) \\ \mathbf{Y}_2(K_2 I \times J_y) \\ \vdots \\ \mathbf{Y}_S(K_S I \times J_y) \end{bmatrix} \tag{7}$$
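Under the same array-layout assumption, the variable-wise reorganization of Eqs. (5) and (7) reorders the batch-wise unfolded matrix so that each row is one batch at one sampling instant. A minimal sketch follows (the function name is hypothetical); because rows are grouped time slice by time slice, each phase block $\mathbf{X}_s(K_s I \times J_x)$ is a contiguous range of rows, and the same function applies to $\mathbf{Y}$ with $J_y$ in place of $J_x$.

```python
import numpy as np

def variable_wise_reorganize(X_unf, Jx):
    """Turn a batch-wise unfolded X(I x K*J_x) into X(K*I x J_x), Eq. (5).

    Rows k*I ... (k+1)*I - 1 hold the I batches at time slice k, so
    consecutive phases occupy consecutive row blocks.
    """
    I, KJ = X_unf.shape
    K = KJ // Jx
    # (I, K, J_x) -> (K, I, J_x) -> (K*I, J_x)
    return X_unf.reshape(I, K, Jx).transpose(1, 0, 2).reshape(K * I, Jx)
```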

2.2 Phase-based neighborhood component variable selection

In this part, the nonlinear neighborhood component variable selection method is developed for batch data with multiple phases. Consider the pairs of measurement and quality time slices

$$\left\{\mathbf{X}_{s,k}(I \times J_x), \mathbf{Y}_{s,k}(I \times J_y)\right\}$$

where $s = 1, 2, \ldots, S$ and $k = 1, 2, \ldots, K_s$. Since each quality variable may be influenced by a different set of relevant variables, variable selection is preferably carried out separately for each output quality variable $y_q$, $q = 1, 2, \ldots, J_y$. Specifically, the objective is to find a weight vector $\mathbf{w}_{s,q}$ such that the following transformed distance metric improves the performance of nearest-neighbor regression:

$$d_{\mathbf{w}_{s,q}}\left(\mathbf{x}_{s,k,i}, \mathbf{x}_{s,k,j}\right) = \sum_{l=1}^{J_x} \left(w_{s,q}^{l}\right)^{2} \left| x_{s,k,i}^{l} - x_{s,k,j}^{l} \right| \tag{8}$$

where $w_{s,q}^{l}$ is the $l$th element of $\mathbf{w}_{s,q}$, and $x_{s,k,i}^{l}$ ($x_{s,k,j}^{l}$) is the $l$th element of sample $\mathbf{x}_{s,k,i}$ ($\mathbf{x}_{s,k,j}$).
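Equation (8) is a weighted L1 distance in which the absolute difference of each variable is scaled by its squared weight. The sketch below computes all pairwise distances within one time slice at once using NumPy broadcasting; the function name is hypothetical, and its $O(n^2 J_x)$ memory footprint is acceptable only for moderate slice sizes.

```python
import numpy as np

def weighted_l1_distances(X, w):
    """Pairwise distances d_w(x_i, x_j) = sum_l w_l^2 |x_il - x_jl| of Eq. (8).

    X: (n, J_x) samples of one time slice; w: (J_x,) variable weights.
    Returns an (n, n) symmetric matrix with zeros on the diagonal.
    """
    absdiff = np.abs(X[:, None, :] - X[None, :, :])   # (n, n, J_x)
    return absdiff @ (w ** 2)                         # weighted sum over variables
```

Setting a weight to zero removes that variable from the metric entirely, which is why the learned weights can be read as measures of variable relevance.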


One may expect that the simplest way to ensure prediction performance is to minimize the overall LOO prediction error of the nearest-neighbor rule over all data entries in the current phase time slice. Unfortunately, previous studies show that the prediction error in that case is a discontinuous function of the weight vector 28, 29. As an alternative, it is desirable to define a properly transformed, probabilistic neighbor assignment so as to obtain a differentiable cost function. To this end, a softmax transformation is used, and the probability that data entry $\mathbf{x}_{s,k,i}$ selects $\mathbf{x}_{s,k,j}$ as its stochastic nearest neighbor is defined as

$$p_{s,k}^{i,j} = P\left(\mathrm{Ref}(\mathbf{x}_{s,k,i}) = \mathbf{x}_{s,k,j} \mid D_{-i}\right) = \frac{\exp\left(-d_{\mathbf{w}_{s,q}}(\mathbf{x}_{s,k,i}, \mathbf{x}_{s,k,j})\right)}{\sum_{h=1, h \neq i}^{K_s} \exp\left(-d_{\mathbf{w}_{s,q}}(\mathbf{x}_{s,k,i}, \mathbf{x}_{s,k,h})\right)}, \qquad p_{s,k}^{i,i} = 0 \tag{9}$$

where $\mathrm{Ref}(\mathbf{x}_{s,k,i})$ denotes the stochastic nearest neighbor used as the regression reference, and $D_{-i}$ denotes the remaining dataset without $\mathbf{x}_{s,k,i}$. Note that the stochastic regression follows two rules: the reference $\mathrm{Ref}(\mathbf{x}_{s,k,i})$ is drawn randomly from the dataset according to the associated selection probabilities, and the estimated quality response $\hat{y}_{s,k,i}$ for $\mathbf{x}_{s,k,i}$ equals the quality value of that reference. Under this framework, the loss $l_{s,k,i}$, which represents the degree of disagreement between $\hat{y}_{s,k,i}$ and the true value $y_{s,k,i}$, is defined as

$$l_{s,k,i} = E\left[l(y_{s,k,i}, \hat{y}_{s,k,i}) \mid D_{-i}\right] = \sum_{h=1, h \neq i}^{K_s} p_{s,k}^{i,h}\, l(y_{s,k,i}, y_{s,k,h}) = \sum_{h=1, h \neq i}^{K_s} p_{s,k}^{i,h} \left| y_{s,k,i} - y_{s,k,h} \right| = \sum_{h=1, h \neq i}^{K_s} p_{s,k}^{i,h} \delta_{s,k}^{i,h} \tag{10}$$
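Equations (9) and (10) can be evaluated for all samples of a slice together: each row of the distance matrix passes through a softmax that excludes the sample itself, and the expected absolute loss is the probability-weighted sum of $|y_i - y_h|$. The sketch below assumes the distance matrix of Eq. (8) is already available and uses hypothetical function names; subtracting the row maximum before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged.

```python
import numpy as np

def neighbor_probabilities(D):
    """Stochastic nearest-neighbor probabilities p^{i,j} of Eq. (9).

    D: (n, n) matrix of weighted distances d_w(x_i, x_j).
    Row i is a softmax of -D[i, :] over all j != i; the diagonal is forced to 0.
    """
    logits = -D.astype(float)
    np.fill_diagonal(logits, -np.inf)                 # p^{i,i} = 0
    logits -= logits.max(axis=1, keepdims=True)       # numerical stabilization
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def expected_loo_loss(P, y):
    """Expected absolute LOO loss l_i = sum_h p^{i,h} |y_i - y_h| of Eq. (10)."""
    delta = np.abs(y[:, None] - y[None, :])           # delta^{i,h}
    return (P * delta).sum(axis=1)
```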

Notice that, for simplicity, the loss function is defined by the absolute deviation, $\delta_{s,k}^{i,h} = |y_{s,k,i} - y_{s,k,h}|$. To alleviate potential overfitting, the above loss is combined with a regularization term to form the objective function

$$f(\mathbf{w}_{s,q}) = \frac{1}{K_s}\sum_{h=1}^{K_s} l_{s,k,h} + \lambda \sum_{l=1}^{J_x} \left(w_{s,q}^{l}\right)^{2} \tag{11}$$

where the regularization parameter $\lambda > 0$ can be tuned via cross-validation. To minimize the above objective function, its derivative with respect to $w_{s,q}^{l}$ is taken as

$$\frac{\partial f(\mathbf{w}_{s,q})}{\partial w_{s,q}^{l}} = \frac{2 w_{s,q}^{l}}{K_s}\left\{\sum_{i=1}^{K_s}\left[p_{s,k}^{i} \sum_{h=1, h \neq i}^{K_s} p_{s,k}^{i,h} \left|x_{s,k,i}^{l} - x_{s,k,h}^{l}\right| - \sum_{h=1, h \neq i}^{K_s} p_{s,k}^{i,h} \delta_{s,k}^{i,h} \left|x_{s,k,i}^{l} - x_{s,k,h}^{l}\right|\right] + K_s \lambda\right\} \tag{12}$$

where $p_{s,k}^{i} = \sum_{h=1}^{K_s} p_{s,k}^{i,h} \delta_{s,k}^{i,h}$, which is the expected loss $l_{s,k,i}$ of Eq. (10). Based on this derivative, the weight vector can be obtained with common optimization methods such as gradient descent. Details of the gradient descent algorithm can be found in optimization studies such as 30 and are omitted here for brevity. Once the approximate optimization has been completed, the variable weight vector $\mathbf{w}_{s,q}$ for the $q$th quality variable in phase $s$ is obtained.

Before moving on, some remarks are worthwhile. In fact, the weight vector rescales the pairwise sample distances so that samples lying close to each other in the quality space are also close to each other in the measurement space. In addition, the soft probabilistic assignment acts as a bridge between the two spaces under this global rescaling. The obtained weight values indicate the direction of rescaling that improves prediction performance. In this sense, each nonzero weight reflects the degree of influence of the corresponding variable on quality prediction, and variables can therefore be selected from those with large weights. Moreover, the variable selection procedure is not expected to suffer from nonlinear process characteristics, since the embedded nearest-neighbor scheme is an inherently nonlinear technique. This advantage makes the proposed method generally applicable to various industrial process systems.

The above variable selection can be performed for each quality variable on every time slice of the phase concerned, so as to identify the quality-relevant variables for developing the phase-based inferential model. Within each time slice, variable $x_s^{l}$ is selected if its weight satisfies $w_{s,q}^{l} \geq \varepsilon_w$, where $\varepsilon_w = 10^{-3}$. This threshold discards insignificant variables whose small nonzero weights may be caused by process noise. Further, denote the variable selection indicator vector for quality variable $y_q$ in time slice $k$ of phase $s$ as

$$\Xi_{s,k}^{q} = \left[\xi_{s,k,1}^{q}, \xi_{s,k,2}^{q}, \ldots, \xi_{s,k,J_x}^{q}\right]$$

where $\xi_{s,k,j}^{q} = 1$ if the $j$th variable has been selected and $0$ otherwise. To evaluate the degree of influence of each variable on quality in the current phase, the selection indicator vectors can be accumulated and normalized as

$$\boldsymbol{\tau}_{s,q} = \frac{1}{K_s}\sum_{k=1}^{K_s} \Xi_{s,k}^{q} \tag{13}$$

Here τ s , q = τ s1, q ,τ s2, q ,...,τ sJ, q  reflects the influences or selection probabilities which are actually x

the normalized frequencies of selection in the concerned phase. Once again, a threshold ετ can be defined so as to further filter out those variables with very low selection probabilities. Practically, the threshold value can be experimentally decided based on the prediction performance. In this way, variables are selected by checking those ones with highly selected frequencies for each phase. Specifically, for each quality variable q, variables show important J impact during phase s can be represented as x*s ,q =  x1s , q , xs2,q ,..., xs ,sq,q  where J s , q is the selected

variable number. Those selected variables can be readily incorporated for statistical inferential model establishment. As a sketch, the entire pNCVS method flowchart is outlined in Algorithm 1.
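The optimization of Eqs. (11)-(12) can be sketched for one time slice and one quality variable as below. This is an illustration under stated assumptions, not the authors' implementation: the gradient uses the identity $p_{s,k}^{i} = l_{s,k,i}$, the regularizer enters with the positive sign implied by Eq. (11), and plain fixed-step gradient descent from an all-ones start stands in for the unspecified optimizer of 30. The step size, iteration count, default $\lambda$, and function names are hypothetical; absolute weights are returned because only $(w_{s,q}^{l})^2$ enters Eq. (8).

```python
import numpy as np

def ncvs_objective_and_grad(w, X, y, lam):
    """Objective f(w) of Eq. (11) and its gradient, Eq. (12), for one time slice.

    X: (n, J_x) normalized measurements, y: (n,) one quality variable, w: (J_x,).
    """
    n = X.shape[0]
    absdiff = np.abs(X[:, None, :] - X[None, :, :])       # |x_il - x_hl|, (n, n, J_x)
    logits = -(absdiff @ (w ** 2))                        # -d_w(x_i, x_h), Eq. (8)
    np.fill_diagonal(logits, -np.inf)                     # p^{i,i} = 0
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)                     # p^{i,h}, Eq. (9)
    delta = np.abs(y[:, None] - y[None, :])               # delta^{i,h} = |y_i - y_h|
    loss = (P * delta).sum(axis=1)                        # l_i = p^i, Eq. (10)
    f = loss.mean() + lam * np.sum(w ** 2)
    # sum_i p^i sum_h p^{ih} |x_il - x_hl|
    term1 = np.einsum("i,ih,ihl->l", loss, P, absdiff)
    # sum_i sum_h p^{ih} delta^{ih} |x_il - x_hl|
    term2 = np.einsum("ih,ih,ihl->l", P, delta, absdiff)
    grad = 2.0 * w / n * (term1 - term2 + n * lam)
    return f, grad

def fit_ncvs_weights(X, y, lam=0.05, lr=0.1, n_iter=300):
    """Fixed-step gradient descent on f(w); returns |w| since only w**2 matters."""
    w = np.ones(X.shape[1])
    for _ in range(n_iter):
        _, grad = ncvs_objective_and_grad(w, X, y, lam)
        w = w - lr * grad
    return np.abs(w)
```

A finite-difference check of the gradient against the objective is a cheap way to confirm the derivative before tuning the step size.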


Algorithm 1 (Phase-based neighborhood component variable selection)
Input: phase measurement input and quality output time-slice pairs $\{\mathbf{X}_{s,k}(I \times J_x), \mathbf{Y}_{s,k}(I \times J_y)\}_{k=1}^{K_s}$, regularization parameter $\lambda$, and threshold value $\varepsilon_\tau$.
Output: selected variable set $\mathbf{x}_{s,q}^{*}$ for quality variable $y_q$, $q = 1, 2, \ldots, J_y$.
Start
  For each quality variable
    Step 1: For each measurement sample, calculate the probabilistic softmax transformation by Eq. (9) with all possible neighbors;
    Step 2: Apply the gradient descent method to estimate the weight vector $\mathbf{w}_{s,q}$;
    Step 3: Calculate the variable selection indicator vector $\Xi_{s,k}^{q}$ for each time slice in the phase;
    Step 4: Calculate the influence indicator vector $\boldsymbol{\tau}_{s,q}$ with Eq. (13) and compare it with the threshold $\varepsilon_\tau$ to determine the selected variable set $\mathbf{x}_{s,q}^{*}$.
  End for
End
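Steps 3 and 4 of Algorithm 1 reduce to thresholding and averaging once the per-slice weights are known. The sketch below assumes the weights estimated on the $K_s$ time slices of one phase are stacked row-wise in an array; it applies $\varepsilon_w = 10^{-3}$ to obtain the indicator vectors $\Xi_{s,k}^{q}$, averages them into $\boldsymbol{\tau}_{s,q}$ as in Eq. (13), and keeps the variables with $\tau_{s,q}^{j} \geq \varepsilon_\tau$. The default $\varepsilon_\tau = 0.5$ is only a placeholder, since the text tunes this threshold on prediction performance, and comparing $|w|$ rather than $w$ is an assumption justified by the squared weights in Eq. (8).

```python
import numpy as np

def phase_variable_selection(W_slices, eps_w=1e-3, eps_tau=0.5):
    """Steps 3-4 of Algorithm 1 for one phase s and one quality variable q.

    W_slices: (K_s, J_x) array; row k holds the NCVS weights estimated on
              time slice k (assumed to be computed beforehand).
    Returns (tau, selected): tau of Eq. (13) and the indices of variables
    whose selection frequency reaches eps_tau (a hypothetical default).
    """
    Xi = (np.abs(W_slices) >= eps_w).astype(float)   # indicator vectors Xi^q_{s,k}
    tau = Xi.mean(axis=0)                            # Eq. (13): (1/K_s) * sum_k Xi
    selected = np.flatnonzero(tau >= eps_tau)
    return tau, selected
```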

2.3 Phase-based RVM for time-wise modeling

Once the critical variables have been determined for each phase, the relevant time regions (slices) of the various operating phases also need to be investigated. In this work, the sparse modeling method called the relevance vector machine (RVM) is employed for time-wise statistical modeling as well as for determining quality-relevant time slices. The RVM adopts the functional form of the support vector machine (SVM) within a probabilistic, sparse Bayesian framework and is especially suitable for nonlinear regression. Let $\mathbf{X}_{s,q}^{*}$ denote the data collected on the selected variables $\mathbf{x}_{s,q}^{*}$. The nonlinear relation between the measurement input and quality output pairs $\{\mathbf{x}_{s,q,i}^{*}, y_{s,q,i}\}_{i=1}^{K_s}$ can then be expressed as 25

$$\mathbf{y}_{s,q} = f\left(\mathbf{x}_{s,q}^{*}, \boldsymbol{\omega}_{s,q}\right) + \mathbf{e}_{s,q} \tag{14}$$

where $f(\cdot)$ denotes the nonlinear function, $\mathbf{y}_{s,q} = \{y_{s,q,i}\}_{i=1}^{K_s}$ denotes the quality set, and $\mathbf{e}_{s,q} \sim N(\mathbf{0}, \sigma_{s,q}^{2}\mathbf{I})$ is Gaussian noise. Following the SVM framework, the nonlinear function can be expressed as a linear weighted sum of $K_s$ basis functions 25:

$$f\left(\mathbf{x}_{s,q}^{*}, \boldsymbol{\omega}_{s,q}\right) = \sum_{i=1}^{K_s} \omega_{s,q,i}\, \phi_{s,q}\left(\mathbf{x}_{s,q}^{*}, \mathbf{x}_{s,q,i}^{*}\right) + \omega_{s,q,0} = \boldsymbol{\omega}_{s,q}^{T} \Phi\left(\mathbf{x}_{s,q}^{*}\right) \tag{15}$$

where ω s , q = ωs , q ,1 , ωs , q ,2 ,..., ωs , q , K  is the weighted parameter, φs , q ( x*s , q , x*s , q ,i ) is the kernel s

function

which

also

(

defines

)

Φ ( x *s , q ) =  Φ ( x *s , q ,1 ) ,..., Φ x *s , q , K s   

one

basis

is

function

the

(

)

for

design

each

sample,

matrix

with

Φ ( x*s , q ,i ) = 1, φs , q ( x*s , q ,i , x*s , q ,1 ) ,..., φs , q x*s , q ,i , x*s , q ,Ks  .   T

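Under this framework, the likelihood of the quality data is given by

$$p\left(\mathbf{y}_{s,q} \mid \boldsymbol{\omega}_{s,q}, \sigma_{s,q}^{2}\right) = \left(2\pi\sigma_{s,q}^{2}\right)^{-K_s/2} \exp\left(-\frac{1}{2\sigma_{s,q}^{2}} \left\| \mathbf{y}_{s,q} - \Phi\left(\mathbf{x}_{s,q}^{*}\right)^{T} \boldsymbol{\omega}_{s,q} \right\|^{2}\right) \tag{16}$$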

Since direct maximum likelihood estimation, e.g., via expectation maximization, may cause severe overfitting of $\boldsymbol{\omega}_{s,q}$ and $\sigma_{s,q}^{2}$, a Bayesian constraint is adopted instead by defining conjugate prior distributions over the weight and variance parameters 25:

$$p\left(\boldsymbol{\omega}_{s,q} \mid \boldsymbol{\alpha}_{s,q}\right) = \prod_{i=0}^{K_s} N\left(\omega_{s,q,i} \mid 0, \alpha_{s,q,i}^{-1}\right) \tag{17}$$

$$p\left(\boldsymbol{\alpha}_{s,q}\right) = \prod_{i=0}^{K_s} \mathrm{Gamma}\left(\alpha_{s,q,i} \mid a, b\right) \tag{18}$$

$$p\left(\beta_{s,q}\right) = \mathrm{Gamma}\left(\beta_{s,q} \mid c, d\right) \tag{19}$$

where α s , q is the hyperparameter, β s , q = σ s−,2q , Gamma (.) is the Gamma distribution, and the implicated parameters are commonly assigned with small values as a = b = c = d = 10−4 . Using the Bayes rule, one can easily induce the posterior of the weight parameter as

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

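$$ p(\boldsymbol{\omega}_{s,q} \mid \mathbf{y}_{s,q}, \boldsymbol{\alpha}_{s,q}, \sigma_{s,q}^{2}) = (2\pi)^{-(K_s+1)/2} \left|\boldsymbol{\Sigma}_{s,q}\right|^{-1/2} \exp\left\{ -\frac{1}{2} \left(\boldsymbol{\omega}_{s,q} - \boldsymbol{\mu}_{s,q}\right)^{\mathsf{T}} \boldsymbol{\Sigma}_{s,q}^{-1} \left(\boldsymbol{\omega}_{s,q} - \boldsymbol{\mu}_{s,q}\right) \right\} \tag{20} $$

which is also a Gaussian distribution, with mean and covariance

$$ \boldsymbol{\mu}_{s,q} = \sigma_{s,q}^{-2} \boldsymbol{\Sigma}_{s,q} \boldsymbol{\Phi}(\mathbf{x}^{*}_{s,q}) \mathbf{y}_{s,q} \tag{21} $$

$$ \boldsymbol{\Sigma}_{s,q} = \left( \sigma_{s,q}^{-2} \boldsymbol{\Phi}(\mathbf{x}^{*}_{s,q}) \boldsymbol{\Phi}(\mathbf{x}^{*}_{s,q})^{\mathsf{T}} + \mathbf{A}_{s,q} \right)^{-1} \tag{22} $$

where $\mathbf{A}_{s,q} = \mathrm{diag}(\alpha_{s,q,0}, \alpha_{s,q,1}, \ldots, \alpha_{s,q,K_s})$ is a diagonal matrix. The estimates of $\mathbf{A}_{s,q}$ and $\sigma_{s,q}^{2}$ can be obtained with the iterative expectation maximization method; see ref 25 for numerical details. Once these distribution parameters have been determined, the inferential model can be used directly for online quality estimation. For a newly collected and normalized sample $\mathbf{x}^{b}_{s,q,i}$ from batch $b$, the predictive distribution of the quality is obtained from the integral

$$ p(y^{b}_{s,q,i} \mid \mathbf{x}^{b}_{s,q,i}, \mathbf{y}_{s,q}, \boldsymbol{\alpha}_{s,q}, \sigma_{s,q}^{2}) = \int p(\boldsymbol{\omega}_{s,q} \mid \mathbf{y}_{s,q}, \boldsymbol{\alpha}_{s,q}, \sigma_{s,q}^{2})\, p(y^{b}_{s,q,i} \mid \mathbf{x}^{b}_{s,q,i}, \boldsymbol{\omega}_{s,q}, \sigma_{s,q}^{2})\, d\boldsymbol{\omega}_{s,q} \tag{23} $$

which is also Gaussian; its mean is used as the expected quality value:

$$ \hat{y}^{b}_{s,q,i} = \boldsymbol{\mu}_{s,q}^{\mathsf{T}} \boldsymbol{\Phi}(\mathbf{x}^{b}_{s,q,i}) \tag{24} $$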
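The posterior and predictive computations of Eqs. (21), (22) and (24) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`rbf`, `basis`, `invert`, `rvm_posterior`, `predict`) are hypothetical, the kernel is assumed to be a Gaussian RBF with a hypothetical `width`, basis function 0 is a bias term, and one kernel is centred on each training sample, so that $\boldsymbol{\Phi}$ has $K_s+1$ rows and one column per training sample. The hyperparameters `alpha` and `sigma2` are held fixed here rather than estimated; the predictive variance $\sigma^2 + \boldsymbol{\phi}^{\mathsf{T}}\boldsymbol{\Sigma}\boldsymbol{\phi}$ follows the standard RVM result of ref 25.

```python
import math


def rbf(u, v, width):
    # Gaussian RBF kernel; kernel type and width are assumptions for illustration.
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / width ** 2)


def basis(x, centers, width):
    # phi(x) = [1, k(x, c_1), ..., k(x, c_K)]: bias term plus one kernel per center.
    return [1.0] + [rbf(x, c, width) for c in centers]


def invert(m):
    # Gauss-Jordan inversion with partial pivoting; adequate for small dense matrices.
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rvm_posterior(Phi, y, alpha, sigma2):
    # Eqs. (21)-(22); Phi is (K+1) x N with phi(x_n) stored as column n.
    beta, M, N = 1.0 / sigma2, len(Phi), len(y)
    prec = [[beta * sum(Phi[i][n] * Phi[j][n] for n in range(N)) + (alpha[i] if i == j else 0.0)
             for j in range(M)] for i in range(M)]
    Sigma = invert(prec)
    Phi_y = [sum(Phi[i][n] * y[n] for n in range(N)) for i in range(M)]
    mu = [beta * sum(Sigma[i][j] * Phi_y[j] for j in range(M)) for i in range(M)]
    return mu, Sigma


def predict(mu, Sigma, phi, sigma2):
    # Eq. (24) mean mu^T phi, plus predictive variance sigma2 + phi^T Sigma phi.
    mean = sum(m * p for m, p in zip(mu, phi))
    var = sigma2 + sum(phi[i] * Sigma[i][j] * phi[j]
                       for i in range(len(phi)) for j in range(len(phi)))
    return mean, var


# Toy usage: five 1-D training samples and fixed, hypothetical hyperparameters.
X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y = [math.sin(x[0]) for x in X]
Phi = [list(row) for row in zip(*(basis(x, X, 1.0) for x in X))]  # (K+1) x N
mu, Sigma = rvm_posterior(Phi, y, alpha=[1e-2] * len(Phi), sigma2=1e-2)
mean, var = predict(mu, Sigma, basis([1.0], X, 1.0), sigma2=1e-2)
print(f"prediction at x=1.0: {mean:.3f} (true {math.sin(1.0):.3f}), variance {var:.4f}")
```

Storing $\boldsymbol{\Phi}$ with one column per sample mirrors Eq. (21), where $\boldsymbol{\Phi}(\mathbf{x}^{*}_{s,q})\mathbf{y}_{s,q}$ appears without a transpose; a production code would use a linear algebra library instead of the hand-written `invert`.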

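To evaluate the estimation performance over the test batches $b = 1, 2, \ldots, B$, the averaged root mean square error (avgRMSE) and the averaged coefficient of determination (avgR2) are used. For quality variable $q$, computed over all sampling points of the test batches, they are defined as

$$ \mathrm{avgRMSE}(q) = \frac{1}{B} \sum_{b=1}^{B} \sqrt{ \frac{1}{K} \sum_{i=1}^{K} \left( \hat{y}^{b}_{s,q,i} - y^{b}_{s,q,i} \right)^{2} } \tag{25} $$

$$ \mathrm{avgR2}(q) = 1 - \frac{1}{B} \sum_{b=1}^{B} \frac{ \sum_{i=1}^{K} \left( y^{b}_{s,q,i} - \hat{y}^{b}_{s,q,i} \right)^{2} }{ \sum_{i=1}^{K} \left( y^{b}_{s,q,i} - \bar{y}^{b}_{s,q} \right)^{2} } \tag{26} $$

where $K$ is the number of sampling points in each test batch and $\bar{y}^{b}_{s,q}$ is the mean of the measured quality values in batch $b$.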
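Eqs. (25) and (26) can be checked with a short Python sketch. It assumes each test batch is supplied as a list of $K$ measured values and a matching list of $K$ estimates; the function names and the toy numbers are made up for illustration.

```python
import math


def avg_rmse(y_true, y_pred):
    # Eq. (25): RMSE within each test batch, then the mean over the B batches.
    per_batch = [math.sqrt(sum((yh - y) ** 2 for y, yh in zip(yt, yp)) / len(yt))
                 for yt, yp in zip(y_true, y_pred)]
    return sum(per_batch) / len(per_batch)


def avg_r2(y_true, y_pred):
    # Eq. (26): 1 minus the batch-averaged ratio of residual to total sum of squares.
    ratios = []
    for yt, yp in zip(y_true, y_pred):
        ybar = sum(yt) / len(yt)
        ss_res = sum((y - yh) ** 2 for y, yh in zip(yt, yp))
        ss_tot = sum((y - ybar) ** 2 for y in yt)
        ratios.append(ss_res / ss_tot)
    return 1.0 - sum(ratios) / len(ratios)


# Two made-up test batches (B = 2) with K = 3 samples each.
y_true = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
y_pred = [[1.0, 2.0, 4.0], [2.0, 4.0, 6.0]]
print(avg_rmse(y_true, y_pred))  # (sqrt(1/3) + 0) / 2, about 0.2887
print(avg_r2(y_true, y_pred))    # 1 - (0.5 + 0) / 2 = 0.75
```

Averaging the per-batch ratios inside Eq. (26) is the same as averaging the per-batch $R^2$ values, so either reading of "averaged" gives the same number.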

For a good regression model, the $R^2$ statistic, which indicates goodness of fit, should be close to 1, while the RMSE, which measures the prediction deviation, should ideally be close to 0.

From the above analysis, an important property of the RVM is that it places Bayesian priors over the hyperparameters, so that the posterior probability mass concentrates at very large values for certain entries of $\boldsymbol{\alpha}_{s,q}$ through this penalty effect. Consequently, the corresponding diagonal entries of $\boldsymbol{\Sigma}_{s,q}$ in Eq. (22) approach zero, which in turn drives the posterior of the associated weights in Eq. (21) to concentrate at zero. This effect ensures that only the relevant time slices, those that actually affect quality estimation, are retained. The remaining time-slice vectors are insignificant: they carry zero weights and have no practical impact on the predictions. This sparsity contributes to the generalization ability of the RVM across a wide range of nonlinear modeling situations.

Another important remark, which also distinguishes our method from previous RVM studies, is that the preceding variable selection step further improves the efficiency of phase-based RVM modeling by providing compact and informative input variables, which in turn lowers the cost of evaluating the kernel function. This makes the RVM better suited to batch processes, which typically accumulate large datasets from a series of operating batches, and makes the proposed model attractive for industrial batch process modeling and online application.

In summary, the entire working flowchart of the phase-based neighborhood component variable selection and relevance vector regression is illustrated in Figure 1. Operations marked with blue shapes represent the offline training process, while orange operations denote the online application procedure. The stacked shapes denote the set of operations repeated for each quality variable. [Figure 1 about here]
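The shrinkage argument above (a very large $\alpha_{s,q,i}$ drives the corresponding diagonal entry of $\boldsymbol{\Sigma}_{s,q}$ in Eq. (22), and hence the weight in Eq. (21), towards zero) can be checked numerically. The sketch below is illustrative only: `posterior_2x2` is a hypothetical helper, the two basis functions and data are made up, and the $\alpha$ values are set by hand, whereas in the RVM of ref 25 they are driven to large values by the hyperparameter estimation itself.

```python
def posterior_2x2(Phi, y, alpha, sigma2):
    # Eqs. (21)-(22) for two basis functions, using the closed-form 2 x 2 inverse.
    beta = 1.0 / sigma2
    g = [[sum(a * b for a, b in zip(Phi[i], Phi[j])) for j in range(2)] for i in range(2)]
    p = [[beta * g[i][j] + (alpha[i] if i == j else 0.0) for j in range(2)] for i in range(2)]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    Sigma = [[p[1][1] / det, -p[0][1] / det], [-p[1][0] / det, p[0][0] / det]]
    Phi_y = [sum(a * b for a, b in zip(Phi[i], y)) for i in range(2)]
    mu = [beta * (Sigma[i][0] * Phi_y[0] + Sigma[i][1] * Phi_y[1]) for i in range(2)]
    return mu, Sigma


# Row 0 carries the signal (y is roughly 2 * row 0); row 1 is a made-up nuisance basis.
Phi = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
y = [2.1, 3.9, 6.2, 7.8]
for alpha_1 in (1e-3, 1e2, 1e8):
    mu, Sigma = posterior_2x2(Phi, y, alpha=[1e-3, alpha_1], sigma2=0.1)
    print(f"alpha_1={alpha_1:g}: mu=({mu[0]:.4f}, {mu[1]:.2e}), Sigma_11={Sigma[1][1]:.2e}")
```

As `alpha_1` grows, both the variance `Sigma[1][1]` and the weight `mu[1]` of the nuisance basis collapse towards zero, while the weight of the informative basis stays close to 2, which is the pruning behavior described in the text.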

3. Methodology

In this section, the proposed method is evaluated through two case studies. The first is a numerical example that simulates a simple nonlinear process with two phases, in which the variables of each phase are designed to influence the quality variable differently so as to test the nonlinear variable selection performance. The second is an industrial application to the fed-batch penicillin fermentation process, a typical multiphase batch process benchmark.

3.1 Numerical example

In this example, the simple nonlinear process consists of ten process variables and one quality variable, and the two phases are defined as:

Phase 1: $\mathbf{x} \sim \mathcal{N}(\mathbf{1}, \mathbf{I}_{10})$, $y = \cos(x_1) + x_2 + 0.45x_3 + \cos(x_4) - x_5 + e$
Phase 2: $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{10})$, $y = x_{10}\cos(x_6) + \sin(x_7 x_8) - x_9 + e$

In both phases the noise term $e$ is zero-mean Gaussian noise with variance 0.1. All process variables are sampled independently from Gaussian distributions with unit variance, so they share a similar scale and no variable gains significance merely from its magnitude. In phase 1, the quality is a linear combination of the transformed terms $\cos(x_1)$, $x_2$, $x_3$, $\cos(x_4)$ and $x_5$, and the remaining variables have no impact on quality. In phase 2, variables $x_6$ to $x_{10}$ have strongly nonlinear relationships with quality, while the first five variables are independent of quality throughout the phase. [Figure 2 about here]

For process modeling, 200 samples were randomly generated for each phase. The quality data of both phases are illustrated in Figure 2, in which the two operating phases can be observed. A Jarque-Bera test at the 5% significance level further indicates that the quality data of the first (linear) phase are consistent with a Gaussian distribution (p-value 0.1094), whereas those of the second (nonlinear) phase are non-Gaussian (p-value 0.0343).

With the phase data collected and analyzed, the variable selection performance can be validated. For comparison with NCVS, three commonly used PLS-based methods were also tested: GA-PLS, MCUVE-PLS and PLS-VIP. The number of latent components for all PLS methods is set to 10, and for MCUVE-PLS the ratio of calibration samples to total samples in each Monte Carlo run is 0.7. The variable selection results of all methods for both phases are given in Figure 3. [Figure 3 about here]

For convenience, the selected variables, sorted by index, are listed in Table S1; the cut-off values were determined experimentally. Figure 3 and Table S1 show that the proposed nonlinear variable selection method NCVS performs best in both the linear and the nonlinear phase, and the selected variables agree with the true model. Moreover, the insignificant variables are assigned weights close to zero. In contrast, the traditional PLS-based methods perform well only in the linear phase, where they capture all of the relevant variables; even there, accuracy cannot be guaranteed because one or more insignificant variables are also selected. Worse, the variable selection performance of the PLS-based methods deteriorates to varying degrees when the process variables have strongly nonlinear correlations with the quality variable. Essentially, the PLS-based methods rely on linear transformations, which become inadequate when the process data are strongly nonlinear. Consequently, as Table S1 shows, none of the PLS-based methods correctly identifies the relevant variables for the second phase.
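The two-phase example can be regenerated with the short Python sketch below, which also computes the Jarque-Bera statistic used above. The helper names are hypothetical, and several choices are assumptions: the noise "level of 0.1" is read as a variance (standard deviation $\sqrt{0.1}$), the random seed is arbitrary, and the p-value uses the asymptotic chi-square distribution with two degrees of freedom, $p = e^{-JB/2}$. Other implementations may apply small-sample corrections, and the random draws differ, so the output will not reproduce the reported 0.1094 and 0.0343.

```python
import math
import random


def simulate_phase(phase, n, rng, noise_var=0.1):
    # Draw n samples of the two-phase example: x ~ N(1, I) in phase 1, N(0, I) in phase 2.
    mean = 1.0 if phase == 1 else 0.0
    noise_sd = math.sqrt(noise_var)  # assumes "level of 0.1" is a variance
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(mean, 1.0) for _ in range(10)]
        x1, x2, x3, x4, x5, x6, x7, x8, x9, x10 = x
        if phase == 1:
            f = math.cos(x1) + x2 + 0.45 * x3 + math.cos(x4) - x5
        else:
            f = x10 * math.cos(x6) + math.sin(x7 * x8) - x9
        X.append(x)
        y.append(f + rng.gauss(0.0, noise_sd))
    return X, y


def jarque_bera(data):
    # JB = n/6 * (S^2 + (K - 3)^2 / 4); asymptotic p-value from chi-square with 2 d.o.f.
    n = len(data)
    m = sum(data) / n
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


rng = random.Random(2017)
X1, y1 = simulate_phase(1, 200, rng)
X2, y2 = simulate_phase(2, 200, rng)
for name, y in (("phase 1", y1), ("phase 2", y2)):
    jb, p = jarque_bera(y)
    print(f"{name}: JB = {jb:.3f}, p = {p:.4f}")
```

Only $x_1$ to $x_5$ enter the phase 1 quality and only $x_6$ to $x_{10}$ enter the phase 2 quality, which is the ground truth against which Figure 3 and Table S1 are judged.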

3.2 Industrial application

In this part, the sparse nonlinear modeling and quality prediction performance of the proposed pNCVS-pRVM model is evaluated on the fed-batch penicillin fermentation process (ref 31). Penicillin fermentation is a typical multiphase bioprocess that is also characterized by strongly nonlinear relationships among variables. The detailed process flowchart is shown in Figure 4. The process is commonly divided into three phases: the pre-culture phase, the fed-batch operation phase and the stationary phase. At the beginning, the fermentor starts with a batch culture for biomass growth; this pre-culture phase lasts around 40 hours, until the cell mass reaches a state suitable for penicillin production. The process then enters the fed-batch operation phase, in which glucose substrate is fed continuously into the reactor to maintain a suitable cell growth rate. Meanwhile, penicillin begins to be generated at an exponential rate until the process finally enters the stationary phase, in which growth becomes negligible. In this work, 12 process variables are considered for modeling, as listed in Table 1. The two response variables to be predicted are the biomass concentration and the penicillin concentration. [Table 1 about here] [Figure 4 about here] [Figure 5 about here]

In the experiment, 80 batches were collected; each batch lasts 500 h with a sampling interval of 1 h. To simulate batch-to-batch variation, the initial culture volume of each batch is set randomly within the interval [100, 102], and Gaussian noise is added to simulate measurement uncertainty. The data characteristics of the 12 process variables are illustrated in Figure 5. Based on process analysis, the three phases are roughly divided as 0 to 44 h (phase 1), 45 to 294 h (phase 2) and 295 to 500 h (phase 3). For implementation, 40 batches are randomly selected for model development and the remaining 40 batches are used for validation. [Figure 6 about here]

Applying the pNCVS algorithm in each phase, the quality-related variable selection results for both response variables, expressed as normalized selection indicators (NSI), are given in Figure 6. The NSI cut-off value is set to 0.06 in this study, and the selected variables are listed in Table S2. Figure 6 shows that different variables are selected for estimating the biomass concentration and the penicillin concentration in the first phase. However, the two inferential models share the same variable set in the remaining two phases: dissolved oxygen concentration, culture volume, carbon dioxide concentration, generated heat and cooling water flow rate. In industrial penicillin fermentation, the important factors affecting penicillin generation include substrate concentration, temperature, pH, dissolved oxygen concentration and biomass. Because glucose concentration is difficult to measure online, the substrate concentration is usually controlled and tuned indirectly through measurements of pH, dissolved oxygen concentration or carbon dioxide concentration. Note also that although biomass growth depends on the aeration rate and agitator power, which balance the oxygen transfer rate against the oxygen consumption rate to provide a suitable environment for penicillin generation, it is the measured dissolved oxygen concentration that directly reflects this balance. The selected variables therefore appear consistent with process knowledge. [Table 2 about here]

Once the influential variables have been extracted, statistical inferential models can be built with pRVM for quality estimation. For comparison, basic phase-based PLS (pPLS) and phase-based RVM (pRVM) models without variable selection are also constructed. The quantitative comparison of pPLS, pRVM and pNCVS-pRVM is given in Table 2. The proposed pNCVS-pRVM method performs best, with the smallest avgRMSE for both responses (0.0622 for y1 and 0.0048 for y2). In contrast, the conventional pPLS method shows markedly larger prediction errors (0.1264 and 0.0088), because its linear modeling mechanism has limited capacity to describe nonlinear process data. The single pRVM without variable selection achieves prediction performance comparable to the proposed method, but the modeling and estimation efficiencies of the two methods differ considerably. To show this, the time-wise weights of the relevance vectors in each phase for both methods are shown in Figures 7 and 8, and the typical overall computation times of both pRVM techniques are given in Table 3. [Figure 7 about here] [Figure 8 about here] [Table 3 about here]

After the variable selection step, the proportion of relevance vectors retained by each subsequent pRVM is dramatically smaller than in the single pRVM without variable selection. In other words, a sparser and more efficient inferential model is obtained than with the nominal method. Table 3 shows that less time is required both for modeling the nonlinear data correlations and for the subsequent online quality estimation; for example, the modeling time for y1 in phase 2 falls from 311.030 s to 12.704 s. Three main advantages of the proposed two-step procedure can therefore be identified. First, model interpretability is enhanced through variable selection, which is particularly useful for understanding nonlinear processes. Second, both modeling and online estimation efficiency are significantly improved, making the method more applicable to large batch datasets. Third, model complexity is reduced without loss of performance, which suggests that generalization ability is preserved. These merits make the proposed method appealing for practical applications.
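The phase division and NSI cut-off rules used in the penicillin case can be sketched as follows. This is a hypothetical illustration: the phase boundaries, the variable names of Table 1 and the 0.06 cut-off come from the text, but the normalization (dividing nonnegative NCVS weights by their sum) is an assumption, and the weights in the usage example are made up rather than read from Figure 6.

```python
# Phase boundaries (hours, inclusive) stated for the penicillin case study.
PHASES = {1: (0, 44), 2: (45, 294), 3: (295, 500)}

VARIABLES = [
    "Aeration rate", "Agitator power", "Substrate feed rate", "Substrate feed temperature",
    "Substrate concentration", "Dissolved oxygen concentration", "Culture volume",
    "Carbon dioxide concentration", "pH", "Fermentor temperature", "Generated heat",
    "Cooling water flow rate",
]


def phase_of(hour):
    # Map a sampling time (h) to its phase index.
    for phase, (start, end) in PHASES.items():
        if start <= hour <= end:
            return phase
    raise ValueError(f"time {hour} h is outside the 0-500 h batch duration")


def select_by_nsi(weights, names, cutoff=0.06):
    # Normalize nonnegative weights to sum to one (assumed NSI definition), keep those >= cutoff.
    total = sum(weights)
    nsi = [w / total for w in weights]
    return [n for n, v in zip(names, nsi) if v >= cutoff], nsi


# Made-up weights for one phase, in the variable order of Table 1.
weights = [0.01, 0.02, 0.0, 0.0, 0.05, 0.9, 0.8, 0.7, 0.03, 0.02, 0.6, 0.5]
selected, nsi = select_by_nsi(weights, VARIABLES)
print(phase_of(120), selected)
```

In the workflow of Figure 1, the phase index would choose which per-phase variable set and RVM are applied to a new sample.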

4. Conclusions

In this work, a novel two-step statistical approach has been proposed for nonlinear multiphase batch process modeling and quality estimation. First, phase-based neighborhood component variable selection is used to identify significant quality-related variables from nonlinear phase data correlations. Then, on the basis of the selected variables, a phase-based relevance vector machine is developed to extract and weight informative time-wise relevance vectors for quality estimation. We have shown that, in conjunction with the variable selection method, the resulting inferential model is more compact and efficient, without loss of generalization, for practical industrial application. The effectiveness of the proposed method has been demonstrated through a numerical example and the fed-batch penicillin fermentation benchmark.

Acknowledgement

This work is supported in part by the Hong Kong Research Grant Council under project number 16233316, and the Guangdong Innovative and Entrepreneurial Research Team Program (No. 2013G076).

Supporting information

This information is available free of charge via the Internet at http://pubs.acs.org/.

Table S1: List of selected variables by different methods
Table S2: Selected variables of different response variables for each phase

References

1. Yao, Y.; Gao, F., A survey on multistage/multiphase statistical modeling methods for batch processes. Annual Reviews in Control 2009, 33, (2), 172-183.
2. Kadlec, P.; Gabrys, B.; Strandt, S., Data-driven soft sensors in the process industry. Computers & Chemical Engineering 2009, 33, (4), 795-814.
3. Wang, Y.; Zhao, D.; Li, Y.; Ding, S. X., Unbiased Minimum Variance Fault and State Estimation for Linear Discrete Time-Varying Two-Dimensional Systems. IEEE Transactions on Automatic Control 2017, 62, (10), 5463-5469.

4. He, S.; Wang, Y., Modified Partial Least Square for Diagnosing Key-Performance-Indicator-Related Faults. The Canadian Journal of Chemical Engineering 2017, DOI: 10.1002/cjce.23002 (in press).
5. Zhao, L.; Zhao, C.; Gao, F., Between-mode quality analysis based multimode batch process quality prediction. Industrial & Engineering Chemistry Research 2014, 53, (40), 15629-15638.
6. Ge, Z.; Song, Z.; Gao, F., Review of Recent Research on Data-Based Process Monitoring. Industrial & Engineering Chemistry Research 2013, 52, (10), 3543-3562.
7. Zhao, C.; Wang, F.; Mao, Z.; Lu, N.; Jia, M., Quality prediction based on phase-specific average trajectory for batch processes. AIChE Journal 2008, 54, (3), 693-705.
8. Lou, Z.; Shen, D.; Wang, Y., Two-step principal component analysis for dynamic processes monitoring. The Canadian Journal of Chemical Engineering 2017, DOI: 10.1002/cjce.22855 (in press).
9. Nomikos, P.; MacGregor, J. F., Multi-way partial least squares in monitoring batch processes. Chemometrics and Intelligent Laboratory Systems 1995, 30, (1), 97-108.
10. Camacho, J.; Picó, J., Multi-phase principal component analysis for batch processes modelling. Chemometrics and Intelligent Laboratory Systems 2006, 81, (2), 127-136.
11. Yu, J., Multiway Gaussian mixture model based adaptive kernel partial least squares regression method for soft sensor estimation and reliable quality prediction of nonlinear multiphase batch processes. Industrial & Engineering Chemistry Research 2012, 51, (40), 13227-13237.
12. Liu, Y.; Gao, Z.; Li, P.; Wang, H., Just-in-time kernel learning with adaptive parameter selection for soft sensor modeling of batch processes. Industrial & Engineering Chemistry Research 2012, 51, (11), 4313-4327.
13. Jin, H.; Chen, X.; Wang, L.; Yang, K.; Wu, L., Adaptive soft sensor development based on online ensemble Gaussian process regression for nonlinear time-varying batch processes. Industrial & Engineering Chemistry Research 2015, 54, (30), 7320-7345.
14. Yuan, X.; Ge, Z.; Song, Z., Soft sensor model development in multiphase/multimode processes based on Gaussian mixture regression. Chemometrics and Intelligent Laboratory Systems 2014, 138, 97-109.
15. Liu, J., Developing a soft sensor based on sparse partial least squares with variable selection. Journal of Process Control 2014, 24, (7), 1046-1056.
16. Mehmood, T.; Liland, K. H.; Snipen, L.; Sæbø, S., A review of variable selection methods in partial least squares regression. Chemometrics and Intelligent Laboratory Systems 2012, 118, 62-69.
17. Leardi, R., Application of genetic algorithm-PLS for feature selection in spectral data sets. Journal of Chemometrics 2000, 14, (5-6), 643-655.
18. Centner, V.; Massart, D.-L.; de Noord, O. E.; de Jong, S.; Vandeginste, B. M.; Sterna, C., Elimination of uninformative variables for multivariate calibration. Analytical Chemistry 1996, 68, (21), 3851-3858.
19. Cai, W.; Li, Y.; Shao, X., A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra. Chemometrics and Intelligent Laboratory Systems 2008, 90, (2), 188-194.
20. Wold, S.; Sjöström, M.; Eriksson, L., PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 2001, 58, (2), 109-130.
21. Galindo-Prieto, B.; Eriksson, L.; Trygg, J., Variable influence on projection (VIP) for orthogonal projections to latent structures (OPLS). Journal of Chemometrics 2014, 28, (8), 623-632.
22. Wang, Z. X.; He, Q. P.; Wang, J., Comparison of variable selection methods for PLS-based soft sensor modeling. Journal of Process Control 2015, 26, 56-72.
23. Bennasar, M.; Hicks, Y.; Setchi, R., Feature selection using joint mutual information maximisation. Expert Systems with Applications 2015, 42, (22), 8520-8532.
24. Brown, G.; Pocock, A.; Zhao, M.-J.; Luján, M., Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. Journal of Machine Learning Research 2012, 13, (Jan), 27-66.
25. Tipping, M. E., Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 2001, 1, (Jun), 211-244.
26. Hernández, N.; Talavera, I.; Dago, A.; Biscay, R. J.; Ferreira, M. M. C.; Porro, D., Relevance vector machines for multivariate calibration purposes. Journal of Chemometrics 2008, 22, (11-12), 686-694.
27. Ge, Z.; Song, Z.; Gao, F., Nonlinear quality prediction for multiphase batch processes. AIChE Journal 2012, 58, (6), 1778-1787.
28. Goldberger, J.; Roweis, S.; Hinton, G.; Salakhutdinov, R., Neighbourhood components analysis. Advances in Neural Information Processing Systems 2004, 17, 513-520.
29. Yang, W.; Wang, K.; Zuo, W., Neighborhood Component Feature Selection for High-Dimensional Data. Journal of Computers 2012, 7, (1), 161-168.
30. Sra, S.; Nowozin, S.; Wright, S. J., Optimization for machine learning. MIT Press: Cambridge, Massachusetts, 2012.
31. Birol, G.; Ündey, C.; Cinar, A., A modular simulation package for fed-batch fermentation: penicillin production. Computers & Chemical Engineering 2002, 26, (11), 1553-1565.


Figure captions

Figure 1: Illustration of the entire proposed working flowchart
Figure 2: Data characteristics of the quality variable
Figure 3: Variable selection results by different methods for (a) phase 1 and (b) phase 2
Figure 4: Detailed working flowchart of the penicillin fermentation process
Figure 5: Data characteristics of process measurement variables
Figure 6: Normalized selection indicators in each phase for (a) biomass concentration and (b) penicillin concentration
Figure 7: Weights of relevance vectors by the two-step pRVM for (a) biomass concentration and (b) penicillin concentration
Figure 8: Weights of relevance vectors by pRVM for (a) biomass concentration and (b) penicillin concentration


Table captions

Table 1: Process and quality variables of the penicillin fermentation process
Table 2: Quantitative results for three methods
Table 3: Typical time consumptions for modeling and estimation applications (in sec.)


Figure 1: Illustration of the entire proposed working flowchart (boxes: batch database; data unfolding and preprocessing; phase division into Phase 1 to Phase S; NCVS, stacked for each quality variable; variable set and RVM set for each phase; new batch; data normalization; quality estimation)


Figure 2: Data characteristics of the quality variable


Figure 3: Variable selection results by different methods for (a) phase 1 and (b) phase 2


Figure 4: Detailed working flowchart of the penicillin fermentation process


Figure 5: Data characteristics of process measurement variables (panels: aeration rate, agitator power, substrate feed rate, substrate feed temperature, substrate concentration, DO concentration, culture volume, CO2 concentration, pH, fermentor temperature, generated heat, CW flow rate)

Figure 6: Normalized selection indicators in each phase for (a) biomass concentration and (b) penicillin concentration (vertical axes: NSI)


Figure 7: Weights of relevance vectors by the two-step pRVM for (a) biomass concentration and (b) penicillin concentration


Figure 8: Weights of relevance vectors by pRVM for (a) biomass concentration and (b) penicillin concentration


Table 1: Process and quality variables of the penicillin fermentation process

Process/response     Number   Variable                         Unit
Process variables    1        Aeration rate                    L/h
                     2        Agitator power                   W
                     3        Substrate feed rate              L/h
                     4        Substrate feed temperature       K
                     5        Substrate concentration          g/L
                     6        Dissolved oxygen concentration   g/L
                     7        Culture volume                   L
                     8        Carbon dioxide concentration     g/L
                     9        pH                               -
                     10       Fermentor temperature            K
                     11       Generated heat                   kcal
                     12       Cooling water flow rate          L/h
Response variables   y1       Biomass concentration            g/L
                     y2       Penicillin concentration         g/L


Table 2: Quantitative results for three methods

Method        avgRMSE (y1)   avgRMSE (y2)   avgR2 (y1)   avgR2 (y2)
pPLS          0.1264         0.0088         0.9983       0.9996
pRVM          0.0635         0.0049         0.9996       0.9999
pNCVS-pRVM    0.0622         0.0048         0.9996       0.9999


Table 3: Typical time consumptions for modeling and estimation applications (in sec.)

                                         Phase 1          Phase 2             Phase 3
Stage        Method                      y1      y2       y1        y2        y1       y2
Modeling     pRVM                        0.526   0.288    311.030   65.179    89.486   248.660
Modeling     pRVM (variable selection)   0.196   0.157    12.704    4.979     2.908    3.568
Application  pRVM                        0.132   0.133    4.273     4.238     2.864    2.804
Application  pRVM (variable selection)   0.095   0.102    3.054     3.072     2.137    2.132


For Table of Contents graphic only: [graphic labels: batch process; key variables; key time slices; quality; predict]