Two-step Multiset Regression Analysis (MsRA) Algorithm
Chunhui Zhao† and Furong Gao*,†,‡
† Department of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang, P. R. China
‡ Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, SAR

ABSTRACT: In the present work, a multiset regression analysis strategy is developed by designing a two-step feature extraction procedure. Multiple predictor spaces, in which the same variables are measured on different sets of objects or the same objects are observed on different variables, are collected, preparing multiple regression data pairs in combination with response spaces. The method focuses on finding the common regression structures across predictor spaces, which can be X-Y regression correlations or predictor variations closely related with each other, for the interpretation of the response. Therefore, two different subspaces are separated from each other in each predictor space. One is the common subspace revealing the general regression features shared by all predictor spaces, and the other is the specific subspace driven by the unique regression information that is more predictor-space-dependent. This is achieved by solving a mathematical optimization problem, for which theoretical support is framed and the related statistical characteristics are analyzed. Its feasibility and performance are illustrated with both a simple numerical case and experimental data from the literature. From the cross-set viewpoint, the proposed approach provides a more meaningful characterization of the inherent predictor information for regression modeling. Further development and application potential are suggested.
1. INTRODUCTION
Data-based multivariate calibration methods have been widely used to establish a quantitative relationship between process measurement (X) and quality property (Y). They are primarily a multivariate data-analytic technique for discovering the relationships between the variables in the predictor space and those in the response space. Accurate qualitative and quantitative calibration analysis may help avoid cumbersome and costly chemical measurements. In practice, calibration modeling and analysis can often be accomplished with familiar, conventional statistical techniques,1-11 such as multiple linear regression (MLR), principal component regression (PCR), canonical correlation analysis (CCA), and partial least-squares (PLS). Among them, latent variable (LV) based methods play a dominant role, where fewer uncorrelated LVs are defined as linear combinations of measurement variables to comprehensively represent the original input space and are used to build a quantitative regression relationship with the concerned quality properties. They often work well because measurement variable collinearity is typically strong and partly redundant. Thus, modeling the variable correlation pattern by LVs allows shrinking of the original data space into a lower-dimensional feature subspace. In PLS, LVs are selected that give maximal reduction in the X-Y covariance. In its practical applications, especially if the number of observation variables is large, one is often only interested in a subset of underlying LVs that account for most of the quality-associated predictor variability. This is generally interpreted by looking at the principal response-related distribution axes in the predictor space, i.e., the regression weights, which can be readily obtained by an eigenvalue decomposition of the X-Y sample covariance. They describe a form of orthogonal rotation of the coordinate system, which transforms J correlated predictor variables into R (R < J) new feature variables, i.e., the so-called regression scores or LVs. Particularly, the transformed variables
are uncorrelated and ordered decreasingly by the indication of the associated distribution variances. Therefore, the regression weights can be deemed able to recover most of the variability in the internal predictor data table while meanwhile being most quality-related. However, this analysis inherently restricts regression analysis to the case of a single population sample, here called single-set regression analysis (SsRA). The subject of regression modeling and quality prediction raises new issues and demands specific solutions when it refers to multiple predictor data tables. These multiple predictor data sets may be correlated with each other and share a common regression mode (object or variable) to a certain extent while they are employed for the explanation of the response variables. It is natural to distinguish the shared and specific regression information across sets. As a single-set analytic method, the SsRA algorithm loses its efficiency, and its extension to the multiset case is desired. Recently, a set of different tools has been presented to extend single-set principal component analysis (SsPCA) by giving multiple data sets an integrative consideration. The well-known methods include the 3-way factor analysis methods,12,13 generalized Procrustes analysis,14-16 generalized canonical analysis (GCA),17-22 common PCA (CPCA),23-29 simultaneous component analysis (SCA),30-37 and others. Generalized canonical analysis (GCA)17-22 worked on three or more sets with object as the common mode, Xi(N × Ji) (i = 1, 2, ..., C). The common object structure, i.e., the similar variations across sets, was represented by canonical variates with maximal cross-set correlations. The common PCA (CPCA) algorithm23-29 referred to multiple-source measurement data sharing the variable mode, in which each set of
data, Xi(Ni × J) (i = 1, 2, ..., C), may have a different number of observations but exactly the same number of variables. It made the basic assumption that the Xi-covariance matrices S1, ..., SC of the C sample populations shared some common transformation coordinates, although the size of each axis and its relative importance varied from set to set. This was equivalent to requiring all Si to be simultaneously reducible to diagonal form by the same orthogonal rotations. SCA-P31,32 relaxed this assumption, where the common loading matrix can be directly figured out by performing the eigenvalue decomposition on the pooled sum of sample variance-covariance matrices. Actually, all these methods can be included in the family of SCA, although they are based on quite different modeling ideas and employ different calculation ways to handle the problem. A structured overview of simultaneous component methods was offered by Van Deun et al.36 for an integrative analysis of coupled data. However, analogous to PCA for a single data set, the above-mentioned methods sought to figure out the common loadings across sets based on the covariance structure. That is, the obtained PC transformation axes tended to recover the major within-set variability but may share little cross-set similarity. Therefore, they may run the risk of losing the important low-variance loadings, especially when a “true connection” lies in them. For multiple data sources sharing the variable mode, the common structure refers to the similarity of their underlying variable correlations. Both SCA-P and CPCA require that the variance within sets of variables be explained well by the common loadings and thus go beyond optimally describing the real interrelationship between sets of variables. To solve this problem, a multiset variable analysis method was proposed in our previous work38 to explore those really closely correlated structures over sets. Following the development of the theoretical algorithm along with its property analysis, the potential of using the said algorithm to solve some meaningful practical problems was also addressed,39-41 generating more interesting statistical explanations of the inherent correlation characteristics underlying multiple data sets. With respect to regression analysis, its multiset extension considers different issues and topics in comparison with multiset PCA (MsPCA), since there are response data to which multiple predictor sets should be related. Therefore, it investigates the “consensus” type of matrix based on a 2-fold criterion. That is, the concerned cross-set relationship with respect to predictor spaces is constrained by the regression correspondence from predictors to the responses. Multiblock PLS (MBPLS) algorithms42-46 have been theoretically developed and widely applied to solve different practical problems. The advantage of such blocking is mainly to allow for easier interpretation of the data by looking at smaller meaningful blocks and at the relationships between blocks so as to obtain the local information and global information simultaneously from the data. The main purpose of this study is to investigate ways of extending the conventional single-set regression analysis (SsRA) method to allow for simultaneous consideration of more than two sets of predictor variables. With multiple predictor spaces prepared, it is a natural idea to look for the cross-set interrelations and reveal their Y-related common structures for regression analysis.
In the present work, a two-step statistical analysis procedure is designed to look at the common structure of multiset predictor spaces and simultaneously relate all of them to the response data space. Therefore, MsRA will simultaneously figure out how and how well the predictors and responses can relate to this “consensus”. This is called multiset regression
analysis (MsRA), the generalization of SsRA methods. The multiset common structures, which are extracted from multiple predictor spaces and closely related with each other, can be the underlying regression weights or regression scores that are defined on the basis of the corresponding terms in SsRA. For both cases, a two-step optimization strategy is formulated with different objective functions. Then each underlying predictor space is divided into two parts, the common cross-set one and the unique inner-set one. Analyses are conducted to further comprehend the underlying meaning of the proposed solution. It is also compared with the MBPLS algorithm, revealing their different objectives and calculations. The remainder of the paper is organized as follows. First, a simple preparatory theoretical support is framed by revisiting the standard single-set regression analysis (SsRA) algorithm. Subsequently, the proposed two-step multiset regression analysis (MsRA) algorithm is mathematically formulated, with its statistical characteristics and properties analyzed. Its suitability and rationality are highlighted. Then, on the basis of numerical and experimental data, the above-mentioned recognition and arguments are effectively verified. Finally, conclusions are drawn in the last section.
2. SINGLE-SET REGRESSION ANALYSIS (SSRA)
The observations for each set of variables are stored in two matrices, the predictor matrix, X(N × Jx), and the response matrix, Y(N × Jy) (where N is the number of observations, which should be the same in the predictor and response matrices; Jx and Jy indicate the numbers of variables in the respective matrices, which are allowed to be different). Calibration modeling algorithms are commonly designed to analyze the relationship between the predictor and response variable sets. A particular community of LV methods1-11 has gained increasing popularity in chemometrics, such as PCR, PLS, CCA, and some variations on these methods. Their intended applications are those where the variables are highly correlated in both the predictor space X and the response space Y. In the present work, the two-step MsRA modeling method is developed on the basis of the single-set PLS (SsPLS) and single-set CCA (SsCCA) algorithms, which are briefly revisited as follows.
The emphasis of PLS as a technique is not only on regression but also on uncovering the latent structure in both the X and Y spaces. The latent structure is made up of pairs of LVs, ti and ui (i = 1, 2, ..., A). As can be seen in the algorithm,47 the X-LVs for PLS are determined through the process of estimating the coefficients (w) for the linear combination of X variables. In PLS, these are commonly referred to as the weights. Also from the algorithm it can be seen that once the weight vector wi has been determined, all other quantities can be readily calculated from it. The objective for the first weight vector, w1, is to maximize the sum of squared covariances of the vector Xw1 with the Y-LV, which is also defined by a linear combination of the Y variables:

max_{w1,v1} (w1^T X^T Y v1)
s.t. w1^T w1 = 1, v1^T v1 = 1    (1)
A typical PLS algorithm is described by the classical NIPALS algorithm:47
(1) Mean-center and scale the X and Y matrices.
(2) Set ui to be the first column of Y.
(3) wi = X^T ui/(ui^T ui). Scale wi to be of length one.
(4) ti = Xwi.
(5) qi = Y^T ti/(ti^T ti). Scale qi to be of length one.
(6) ui = Yqi/(qi^T qi).
(7) If converged, go to step (8); else return to step (3).
(8) X-loading: pi = Xi^T ti/(ti^T ti).
(9) Xi+1 = Xi - ti pi^T and Yi+1 = Yi - ti qi^T.
(10) Return to step (2) for the next LV.
An alternative way of calculation is to perform an eigenvalue-eigenvector decomposition.47 The maximum for eq 1 is obtained at w1, the eigenvector associated with the largest eigenvalue of the matrix X^T Y Y^T X. To obtain further weights, the algorithm is repeated with deflated X and Y matrices. On the basis of the above optimization solution, in the SsPLS algorithm, one set of weight vectors, i.e., LV transformation axes, is extracted and ordered in terms of the relationship with the responses. Actually, from another viewpoint, this is equivalent to describing the X matrix in the best possible way in terms of its relationship with Y. Therefore, most of the Y-related variability in the original X measurement space can be approximated along the first several distribution directions. The weight vectors w are defined for the deflated X matrices so that the predictor LVs can be calculated as ti = Xi wi. The detailed calculation procedure can be found in previous work.3,47
Different from SsPLS, SsCCA finds linear combinations of both the X and Y variables that are most highly correlated with each other. This leads to the same objective function but different constraints compared with eq 1:

max_{w1,v1} (v1^T Y^T X w1)
s.t. w1^T X^T X w1 = 1, v1^T Y^T Y v1 = 1    (2)
The solutions to this problem for w1 and v1 are the leading eigenvectors of the matrices (X^T X)^{-1} X^T Y (Y^T Y)^{-1} Y^T X and (Y^T Y)^{-1} Y^T X (X^T X)^{-1} X^T Y, respectively. The subsequent weights are the eigenvectors of the same matrices in order of decreasing eigenvalues. They also satisfy the additional constraint that the components themselves are orthogonal (wk^T X^T X wj = 0, vk^T Y^T Y vj = 0, j ≠ k). The predictor LVs T can be calculated directly from the original X matrix by T = XW, and they are orthogonal to each other. However, the solution depends heavily on whether or not each covariance matrix X^T X is invertible. In practice, it is not uncommon that rank(X^T X) < Jx. Therefore, the invertibility cannot be achieved, and directly applying eigenvalue decomposition in the raw data space may lead to a rank-deficient problem.
In the above-mentioned regression analysis methods, the Y-related variability is extracted on the basis of regression weights in the X-space, constructing the LV predictor subspace. They both take advantage of the structure of the Y variables, but in different ways. Then, considering the regression of both X and Y onto the extracted predictor LVs (T), the basic regression model can be expressed as

P^T = (T^T T)^{-1} T^T X
Q^T = (T^T T)^{-1} T^T Y
X = T P^T + E
Y = T Q^T + F    (3)
where T(N × A) is the matrix containing the LVs used for quality prediction and interpretation (A is the number of retained LVs). P(Jx × A) and Q(Jy × A) are the loadings for X and Y, respectively, which tell how T contributes to the variation explanation in X and Y. E(N × Jx) is the residual matrix of the predictor variables after the explanation by T, and F(N × Jy) is the quality residual matrix after the prediction by the same LVs T.
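As a concrete reference for the NIPALS steps and for the regression model of eq 3, the following is a minimal Python sketch under our own naming conventions (an illustration, not the authors' code); scaling in step (1) is omitted for brevity and NumPy is assumed.

```python
import numpy as np

def nipals_pls(X, Y, n_lv, tol=1e-10, max_iter=500):
    """Single-set PLS (SsPLS) by the NIPALS iteration sketched above."""
    X = X - X.mean(axis=0)                       # step (1): mean-center (scaling omitted)
    Y = Y - Y.mean(axis=0)
    T, W, P, Q = [], [], [], []
    for _ in range(n_lv):
        u = Y[:, [0]]                            # step (2): initialize u with first Y column
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u); w /= np.linalg.norm(w)   # step (3)
            t = X @ w                                          # step (4)
            q = Y.T @ t / (t.T @ t); q /= np.linalg.norm(q)    # step (5)
            u_new = Y @ q / (q.T @ q)                          # step (6)
            if np.linalg.norm(u_new - u) < tol:                # step (7): convergence
                u = u_new
                break
            u = u_new
        p = X.T @ t / (t.T @ t)                  # step (8): X-loading
        X = X - t @ p.T                          # step (9): deflate X and Y
        Y = Y - t @ q.T
        T.append(t); W.append(w); P.append(p); Q.append(q)
    return [np.hstack(m) for m in (T, W, P, Q)]  # scores, weights, X- and Y-loadings
```

With the retained score matrix T, the loadings and predictions of eq 3 then follow from ordinary least-squares regression of the (centered) X and Y onto T.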
3. PROPOSED MSRA ALGORITHM
Interest in the present work focuses on the extraction of common structures in multiple predictor data spaces from the Y-related viewpoint. Two cases are considered, referring to different definitions of common regression structures. One is the common X-Y regression weights based on multiple predictor data sets and different response sets. That is, the predictor space and the associated response space are different over the regression pairs {Xi, Yi} (i = 1, 2, ..., C). They have the same variable dimension but may present different observation samples. The other is the common X-Y regression scores based on multiple predictors and a single response space, where they share the same observation dimension and possibly different variable numbers. That is, different predictor spaces are associated with the same response space, and the predictor scores are expected to explain the same response information. Therefore, on the basis of the different definitions of common structures, the multiple predictor spaces can be composed of variables being measured over different sets of objects or objects being collected over different variables. To solve these two different problems, two versions of the MsRA algorithm are designed. Their different statistical principles are analyzed and mathematical deductions are performed, each corresponding to a certain optimization problem but with different objective functions and constraints.
3.1. Two-Step Feature Extraction Algorithm. It should be noted that the common structure in multiset regression analysis may be regression weights or regression scores, which are different analysis subjects and also reveal different meanings. Regression weights represent the correspondence between X and Y in variable correlations, whereas regression scores represent how the variations in X and Y are related. They result in different analysis objectives and thus different solutions.
For the first case, across the multiset predictor data, Xi(Ni × J) (i = 1, 2, ..., C), the set-to-set interrelation refers to the common structure in variable correlations for the interpretation of the responses. It is represented by the regression weights in each regressor data space, which are similar or closely related to each other across sets. Therefore, the algorithm is specifically called MsRA-weight here. As mentioned before, in the SsRA calculation, the PLS weight is actually the leading eigenvector of X^T Y Y^T X, i.e., a linear combination of the columns of the covariance X^T Y. So the subweight, w_j^i (j = 1, 2, ..., J), must lie in each X data space spanned by the observations and also relate well with the response. There exist linear combination coefficients a_j^i = [a_{1,j}^i, a_{2,j}^i, ..., a_{n,j}^i]^T such that

w_j^i = X^{iT} Y^i a_j^i    (4)
That is, each subweight vector w_j^i is actually a linear combination of the Xi-Yi covariance. Here it should be pointed out that, as an extreme case, multiple predictor spaces can be related to the same response (Y). In that case, the predictors will share the same dimension along both objects and variables.
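To fix ideas, the toy sketch below sets up the two multiset layouts just described and forms one subweight according to eq 4; all array names and dimensions are illustrative assumptions rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
J, Jy = 52, 10                            # shared predictor variables, response variables

# Case 1 (MsRA-weight): same variables, different objects -> pairs {Xi, Yi}
pairs = [(rng.standard_normal((n, J)), rng.standard_normal((n, Jy)))
         for n in (100, 120, 90)]

# eq 4: a subweight is a linear combination of the columns of the Xi-Yi covariance
X1, Y1 = pairs[0]
a = rng.standard_normal((Jy, 1))
a /= np.linalg.norm(a)                    # unit-length combination coefficients
w_sub = X1.T @ Y1 @ a                     # J x 1 subweight

# Case 2 (MsRA-score): same objects, different variables -> all Xi share one Y
N = 150
X_sets = [rng.standard_normal((N, j)) for j in (52, 40, 33)]
Y_shared = rng.standard_normal((N, Jy))
```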
Table 1. Comparison of MsRA-Weight and MsRA-Score Algorithms

(a) First-Step Extraction
MsRA-weight:
  analysis subject: regression weights for Xi-Yi
  objective: max Σ_{i=1}^{C} (w_g^T X^{iT} Y^i a^i)^2
  constraint: w_g^T w_g = 1; a^{iT} a^i = 1
  solution: Σ_{i=1}^{C} (X^{iT} Y^i Y^{iT} X^i) w_g = λ_g w_g
MsRA-score:
  analysis subject: regression scores for Xi-Y
  objective: max Σ_{i=1}^{C} (v^T Y^T X^i a^i)^2
  constraint: v^T v = 1; a^{iT} a^i = 1
  solution: (Σ_{i=1}^{C} Y^T X^i X^{iT} Y) v = λ_g v

(b) Second-Step Extraction
MsRA-weight:
  analysis subject: regression weights for W̄^{iT}
  objective: max Σ_{i=1}^{C} (w_g^T W̄^{iT} a^i)^2
  constraint: w_g^T w_g = 1; a^{iT} W̄^i W̄^{iT} a^i = 1
  solution: Σ_{i=1}^{C} (W̄^{iT} (W̄^i W̄^{iT})^{-1} W̄^i) w_g = λ_g w_g
MsRA-score:
  analysis subject: regression scores for T̄^i-Y
  objective: max (v^T Y^T T̄^i a^i)
  constraint: v^T Y^T Y v = 1; a^{iT} T̄^{iT} T̄^i a^i = 1
  solution: Y(Y^T Y)^{-1} Y^T (Σ_{i=1}^{C} T̄^i (T̄^{iT} T̄^i)^{-1} T̄^{iT}) Y v = λ_g Y v
The degree of similarity of the sub-basis vectors should be measured in terms of how close they are to each other over sets. Different measures could be used, such as distance, angle, and correlation. However, it would be complicated if all set-to-set interrelationships were simultaneously and directly evaluated. Here, the computational trick by Carroll17 is borrowed, in which a third-party canonical variate was introduced in the GCA algorithm. Then, in our method, the similarity assessment of variable correlations over sets can be achieved through the introduction of a global and common weight vector, wg. It can be regarded as the supplementary and pseudo (C + 1)st subweight vector and should approximate all C subweights as closely as possible. That is, these real common subweight vectors should be able to be comprehensively described, and even substituted, by the global weight because they are correlated with each other as closely as possible or, speaking more exactly, are as common as possible over sets. To figure out the common weights, based on the definition of the subweight vector w_j^i in each data set, a two-step extraction procedure is designed, with the detailed mathematical deduction shown in Appendix A. In the first step, the common weights are preparatorily computed from the original measurement data, and then, in the second step, they are further refined by enhancing their cross-set correlations. Different optimization objectives and constraints are used in the two steps, as shown in eqs A1 and A16, which both come down to simple analytic solutions of constrained optimization problems, as shown in eqs A10 and A26.
For the second case, the set-to-set interrelation refers to the common structure in the underlying predictor variability for the interpretation of the response variability. It is represented by the regression scores in each predictor data space, which are similar or closely related with each other across sets. The algorithm can be
specifically called MsRA-score here. In PLS, the X-scores are linear combinations of the X variables, which are related with the Y-LVs by maximizing the sum of their squared covariances. Actually, regression scores are built to find a proper compromise between two purposes: describing the set of explanatory variables and predicting the response ones. So the subscore, t_j^i (j = 1, 2, ..., J), must lie in each Xi data space spanned by the variables and also relate well with the same response Y. There exist linear combination coefficients a_j^i = [a_{1,j}^i, a_{2,j}^i, ..., a_{n,j}^i]^T such that

t_j^i = X^i a_j^i    (5)
That is, each subscore vector t_j^i is actually a linear combination of the original predictor variables, and the relation with the Y-LV should be considered in the objective. The specific calculation procedure is deduced in Appendix B. Distinguished by the concerned common regression mode (variable or object), the two algorithms (MsRA-weight and MsRA-score) are compared in Table 1 with respect to analysis subject, objective, constraint, and solution.
3.2. Comments and Property Analysis. 1. Comprehension of Common Regression Structures. Comparing SsRA and MsRA, the regression vectors are derived on the basis of different cost functions to achieve different application objectives. For example, in the SsPLS algorithm, the underlying regression weights are extracted in terms of both good predictor data reconstruction and a close relationship with the quality data. The feature subspace spanned by the first scores thus covers most of the Y-related predictor variability. Comparatively, regarding the multiple-set case, two scenarios are clarified, referring to two types of regression behaviors. For the MsRA-weight algorithm shown in Appendix A, the optimization problem makes use of an auxiliary (C + 1)st subweight vector that does not exist in any subset predictor
space, i.e., the global one, and extracts the common subweight vectors distributed across sets by maximizing their cross-set correlations. That is, those most similar regression weights across sets are first extracted for regression analysis. Different common subspaces are thus enclosed in each data set, revealing most of the cross-set similar regression behaviors. For any predictor measurement, the part that can be sufficiently explained by those common weight vectors is deemed to be set-common. The rest is regarded as the specific one. In this way, it enables one to decompose the intrinsic variable correlations hidden in multiple predictor data sets from both cross-set and Y-related viewpoints. For the MsRA-score shown in Appendix B, the set-common underlying regression scores are extracted in each predictor space, all of which are closely related with the quality variables. So the auxiliary (C + 1)st subscore used in the optimization problem actually makes use of the quality variables. Then those most similar Y-related regression scores across sets are first extracted for regression analysis. Different common subspaces are thus enclosed in each predictor data set, revealing most of the cross-set similar regression behaviors. For any predictor measurement, the part that can be sufficiently explained by those common scores is deemed to be set-wise common. The rest is regarded as the specific one. In this way, it enables one to decompose the intrinsic variability hidden in multiple predictor data sets from both cross-set and Y-related viewpoints.
In comparison with MBPLS, the MsRA-score algorithm and the MBPLS algorithm both consider the across-set relationship under the supervision of quality when they derive the regression scores from multiple predictor spaces. However, they use different objectives and get different solutions. The first-step MsRA-score algorithm actually works like the MBPLS algorithm with respect to the calculation. However, with the second-step modeling procedure, the MsRA-score solution can derive those really closely related regression scores across sets by excluding the influence of the predictor variance. This will be illustrated in the case study section.
2. Consideration of the Extreme Cases. The proposed two-step feature extraction solution comes down to the usual SsRA in the case of a single set (C = 1) and also generalizes nicely to the case of any number of predictor data sets. For the MsRA-weight shown in Appendix A, the global feature will converge to the SsPLS weight vector. For the MsRA-score shown in Appendix B, the solution actually converges to the combination of the SsPLS and SsCCA algorithms. It is possible to show the complete mathematical equivalence of this to the standard SsRA solution.
In the MsRA-weight shown in Appendix A, for the single set, from eq A10, the suppositional (C + 1)st regression weights, i.e., the global ones W̄_g, will come down to the eigenvectors W̄^1(R × J) of the single sample covariance. Then, in the second-step operation procedure, Σ_{i=1}^{C} (W̄^{iT}(W̄^i W̄^{iT})^{-1} W̄^i) reduces to W̄^{1T} Λ^{-1} W̄^1 = (Λ^{-1/2} W̄^1)^T (Λ^{-1/2} W̄^1) = W̄^{1/T} W̄^{1/}, where W̄^{1/}(R × J) is the scaled form of W̄^1. By the eigenvector solution, W̄^{1/} can be deemed to be the final global weights. From eq A27, corresponding to the jth global weight, the objective parameter is calculated as

λ_j = w̄_{1,j}^T W̄^{1T}(W̄^1 W̄^{1T})^{-1} W̄^1 w̄_{1,j} = w̄_{1,j}^T W̄^{1/T} W̄^{1/} w̄_{1,j} = 1    (6)

This agrees well with the fact that a full correlation relationship between the global and subweight vectors is achieved because they are actually identical in the single-set case.
In the MsRA-score shown in Appendix B, for the single set, from eq B10, the subset regression scores (T^1) come down to the SsPLS LVs in the first-step modeling procedure. Then, in the second-step operation procedure, from eq B25, the score is actually calculated by CCA with T^1 as the input predictor space. So it is actually the SsPLS-CCA algorithm.
3. MsRA-1 and MsRA-2. Similar to SsPLS, on the basis of the number of response variables, the MsRA algorithm can be categorized into MsRA-1 and MsRA-2. When several response variables are available for calibration, the proposed method, called the MsRA-2 algorithm here, is easily understood. When only a single response variable is concerned, the proposed method, called the MsRA-1 algorithm here, should be given further explanation. For the MsRA-weight algorithm shown in Appendix A, the weight vector is defined as a linear combination of the X-Y covariance. So if the Y response only consists of a single variable, the X-Y covariance is actually a vector, and the linear combination coefficient a_j^i is a scalar. So the subweight in the first-step extraction is actually proportional to the X-Y covariance vector. Although there is only one response variable, multiple subweights can still be calculated for each predictor space. Those subweights are then used in the second-step extraction procedure. For the MsRA-score algorithm shown in Appendix B, it is easy to understand that the combination coefficient (V) is a scalar and the Y-LV is actually proportional to the single response variable in the first step, whereas in the second step, the Y-LV is actually the scaled response variable. In practical applications, when the response variables are multivariate, either a separate model can be constructed for each response (MsRA-1) or all properties are calibrated at once (MsRA-2). The MsRA-1 and MsRA-2 models give different decompositions of the predictor space and thus different regression structures. Generally, MsRA-2 gives better results than MsRA-1 if the response variables are strongly correlated.
4. Orthogonality Analysis. For the MsRA-weight algorithm shown in Appendix A, in the first step, although the global weights are not guaranteed to be orthogonal, the subweights are orthogonal by the deflation calculation. That is, w_{i,j}^T w_{i,k} = 0, j ≠ k, where subscript i denotes the predictor space, and j and k index the subweight vectors. The orthogonality between subweights is proved taking predictor space X_j^i for instance. Suppose that j < k. Then, on the basis of the deflation relationship, one can get

X_k^i = X_j^i - t_{i,j}(t_{i,j}^T t_{i,j})^{-1} t_{i,j}^T X_j^i    (7)

Combined with eq A12, it is easy to get

w_{i,j}^T w_{i,k} = w_{i,j}^T (1/√λ_{i,k}) X_k^{iT} Y^i Y^{iT} X_k^i w_g
 = (1/√λ_{i,k}) w_{i,j}^T (X_j^i - t_{i,j}(t_{i,j}^T t_{i,j})^{-1} t_{i,j}^T X_j^i)^T Y^i Y^{iT} X_k^i w_g
 = (1/√λ_{i,k}) (t_{i,j} - t_{i,j})^T Y^i Y^{iT} X_k^i w_g = 0    (8)
However, in the second step, no deflation is imposed. It only guarantees that the global weight vectors are mutually orthogonal, resulting from the successive eigenvector decomposition as shown in eq A25, whereas the resulting subweights by orthogonal projection in eq A29 are not necessarily orthogonal. However, this
condition in the second-step extraction procedure can be readily imposed by a simple orthogonalization operation. That is, when a subweight is figured out, each predictor data set (W̄^{iT}) can be deflated so that the following new subweights will be orthogonal to the previous ones. However, the deflation of W̄^{iT} may give rise to a rank-deficiency problem resulting from the calculation of W̄^{iT}(W̄^i W̄^{iT})^{-1} W̄^i as shown in eq A26. Therefore, in the current work, the orthogonalization of the subweight vectors is not pursued in the second-step extraction. For the MsRA-score algorithm in Appendix B, the orthogonality constraints are imposed by the CCA algorithm itself. Then this algorithm guarantees that the global Y-LVs are mutually orthogonal as successive eigenvector solutions in eq B25. However, the corresponding subscores in each predictor space are not necessarily orthogonal. Also, the orthogonality condition can be readily imposed in the algorithm by a simple orthogonalization operation. That is, when a subregression score is calculated corresponding to the global one, each predictor data set is deflated so that the following new subscores will be made orthogonal to the previous ones. The deflation would also give rise to the rank-deficiency problem and is thus not used here, either.
5. Subspace Spanned by Common Regression Structures in the Second-Step Calculation Is Part of That Enveloped in the First-Step Calculation. For the MsRA-weight algorithm shown in Appendix A, by two-step extraction, each original predictor space (Xi) can be divided into two orthogonal subspaces: one common subspace obtained by projecting Xi onto those similar subweights, X^i G_{W^i}, and the residual part, X^i H_{W^i}, also called the specific subspace, which reveal the similarity and dissimilarity across sets, respectively:

X^i = X^i G_{W^i} + X^i H_{W^i}    (9)

where G_{W^i} = W^{iT}(W^i W^{iT})^{-1} W^i is defined as the orthogonal projector onto the column space of W^{iT}, and H_{W^i} = I - G_{W^i} = I - W^{iT}(W^i W^{iT})^{-1} W^i is defined as the orthogonal complement of that space, or the antiprojector with respect to the column space of W^{iT}. It is clear that the two subspaces are orthogonal to each other because X^i G_{W^i}(X^i H_{W^i})^T = 0. The same expression is also recognized for the first-step analysis result:

X^i = X^i Ḡ_{W̄^i} + X^i H̄_{W̄^i}    (10)

To prove that the subspace X^i G_{W^i} is part of X^i Ḡ_{W̄^i}, it is equivalent to prove that X^i G_{W^i} is orthogonal to X^i H̄_{W̄^i}. From eq A29, W^{iT} = W̄^{iT}(W̄^i W̄^{iT})^{-1} W̄^i W_g^T Λ_w (where Λ_w is a diagonal matrix with elements 1/(λ_{i,m})^{1/2}, m = 1, 2, ..., R); therefore,

X^i H̄_{W̄^i}(X^i G_{W^i})^T
 = X^i H̄_{W̄^i} G_{W^i} X^{iT}
 = X^i (I - W̄^{iT}(W̄^i W̄^{iT})^{-1} W̄^i) W^{iT}(W^i W^{iT})^{-1} W^i X^{iT}
 = X^i (I - W̄^{iT}(W̄^i W̄^{iT})^{-1} W̄^i) W̄^{iT}(W̄^i W̄^{iT})^{-1} W̄^i W_g^T Λ_w (W^i W^{iT})^{-1} W^i X^{iT}
 = X^i (W̄^{iT}(W̄^i W̄^{iT})^{-1} W̄^i - W̄^{iT}(W̄^i W̄^{iT})^{-1} W̄^i) W_g^T Λ_w (W^i W^{iT})^{-1} W^i X^{iT}
 = 0    (11)

Therefore, X^i G_{W^i} is orthogonal to X^i H̄_{W̄^i} and is part of X^i Ḡ_{W̄^i}.
For the MsRA-score algorithm in Appendix B, similarly, to prove that the subspace G_{T^i} X^i is part of Ḡ_{T̄^i} X^i, it is equivalent to prove that G_{T^i} X^i is orthogonal to H̄_{T̄^i} X^i. From eq B27, T^i = T̄^i(T̄^{iT} T̄^i)^{-1} T̄^{iT} U Λ_t (where Λ_t is a diagonal matrix with elements 1/(λ_{i,m})^{1/2}, m = 1, 2, ..., R); therefore, the subregression scores (T^i) are actually calculated as the orthogonal projection onto T̄^i. Then, it is easy to get

(H̄_{T̄^i} X^i)^T G_{T^i} X^i
 = X^{iT} H̄_{T̄^i}^T G_{T^i} X^i
 = X^{iT}(I - T̄^i(T̄^{iT} T̄^i)^{-1} T̄^{iT}) T^i(T^{iT} T^i)^{-1} T^{iT} X^i
 = X^{iT}(I - T̄^i(T̄^{iT} T̄^i)^{-1} T̄^{iT}) T̄^i(T̄^{iT} T̄^i)^{-1} T̄^{iT} U Λ_t (T^{iT} T^i)^{-1} T^{iT} X^i
 = X^{iT}(T̄^i(T̄^{iT} T̄^i)^{-1} T̄^{iT} - T̄^i(T̄^{iT} T̄^i)^{-1} T̄^{iT}) U Λ_t (T^{iT} T^i)^{-1} T^{iT} X^i
 = 0    (12)

Therefore, G_{T^i} X^i is orthogonal to H̄_{T̄^i} X^i and is part of Ḡ_{T̄^i} X^i. From another viewpoint, for both algorithms shown in Appendices A and B, the second-step procedure can be regarded as a kind of postprocessing, and therefore, there is no concern about an overfitting problem. In the first step, a set of common vectors has been preliminarily obtained. In the second step, the number of retained common global vectors (R) is thus much more deterministic and should be no more than the number kept in the first step.

4. CASE STUDIES
4.1. Case Study 1. In this section, the performance of the proposed method is illustrated through a simple numerical case, where two-set data are considered for simplicity. The performance of the proposed algorithm will be compared with that of MBPLS when it comes to interpreting the local variation (i.e., the regression scores) from different sets. Consider three source variables with the following distributions, which are shown in Figure 1:

s1(k) = 2 cos(0.08k) sin(0.06k)
s2(k) = sign[sin(0.3k) + 3 cos(0.1k)]
s3(k) = uniformly distributed noise in the range [-1, 1]    (13)

From the three source signals, process data are generated by linear mixing as x^T = s^T A with the following two different
Figure 1. Profiles of the three source signals.
mixing matrices:

A1 = [0.86  0.55  0.17  0.33  0.89
      0.32  0.79  0.65  0.12  0.97
      0.67  0.46  0.28  0.27  0.74]    (14)

A2 = [2.79  1.13  3.90  1.12  0
      2.86  1.07  2.01  0.29  1.01
      1.17  2.06  0.80  0.97  2.07]    (15)
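As a rough illustration of this data-generating step, the sketch below simulates the three sources and the linear mixing x^T = s^T A in Python; the bracket placement in s2 and the element signs and layout of A1 and A2 are assumptions, since they are only partially recoverable from the printed equations.

```python
import numpy as np

K = 1000                                   # number of samples (500 train + 500 test)
k = np.arange(1, K + 1)
rng = np.random.default_rng(0)

# three source signals (eq 13); the bracket placement in s2 is an assumption
s1 = 2 * np.cos(0.08 * k) * np.sin(0.06 * k)
s2 = np.sign(np.sin(0.3 * k) + 3 * np.cos(0.1 * k))
s3 = rng.uniform(-1, 1, K)
S = np.column_stack([s1, s2, s3])          # K x 3 source matrix

# mixing matrices as recovered from eqs 14 and 15 (element signs uncertain)
A1 = np.array([[0.86, 0.55, 0.17, 0.33, 0.89],
               [0.32, 0.79, 0.65, 0.12, 0.97],
               [0.67, 0.46, 0.28, 0.27, 0.74]])
A2 = np.array([[2.79, 1.13, 3.90, 1.12, 0.00],
               [2.86, 1.07, 2.01, 0.29, 1.01],
               [1.17, 2.06, 0.80, 0.97, 2.07]])

X1, X2 = S @ A1, S @ A2                    # two predictor sets, x^T = s^T A
```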
A single quality variable is defined by y = 7s2, so the quality attribute is only associated with the second source signal. A thousand samples are generated, in which the first half is used for model identification and the other half for model testing. Three cases are considered for model development and testing:
• Case I: no noise is imposed on the process data
• Case II: normally distributed noise with zero mean and 0.1 standard deviation is imposed on the process data
• Case III: normally distributed noise with zero mean and 0.5 standard deviation is imposed on the process data
In this way, by comparing the extraction results for noiseless data and those for noisy data with different noise levels, the influence of noise can be checked. On the basis of the generated data, the MBPLS algorithm and the two-step MsRA-score algorithm are compared with respect to the extraction of regression scores, which are shown in Figure 2. For case I, shown in Figure 2a, compared with the original sources shown in Figure 1, MsRA can exactly extract the second source because the quality variable is only associated with it. Comparatively, the score extracted by MBPLS is more like a combination of the first source and the second source. This tells us that MBPLS actually cannot extract the closely quality-related one because it exploits the underlying predictor covariance. From case II to case III, noise is added and the noise level is increased. As shown in Figure 2b,c, the extraction results by MsRA are deteriorated but can still uncover the general profile of the second source, giving better results than MBPLS. In contrast,
the profile of the score extracted by MBPLS still tends to be a linear combination of the first two source signals, similar to the MBPLS solution for case I. In summary, the dominant source signal that significantly contributes to the quality can be automatically identified as the score vector by the proposed two-step MsRA-score algorithm. This is because the two-step calculation focuses on the close relationship between process variables and quality, characterized by the correlation index. So the source that is really responsible for the quality is figured out, and the quality-irrelevant sources are excluded. In contrast, the MBPLS algorithm pays attention to the covariance between process variables and quality. It is possible that a higher covariance merely results from a larger variance of the scores, which does not necessarily imply strong correlations. From another viewpoint, MBPLS and MsRA have different objectives and can be put into different applications to solve different practical problems.
4.2. Case Study 2. In this section, the proposed multiset regression analysis approach is tested with data collected from the well-known Tennessee Eastman (TE) benchmark chemical process. The process is constructed by five major operation units: a reactor, a product condenser, a vapor-liquid separator, a recycle compressor, and a product stripper. The details of this industrial process model have been described by Downs and Vogel,48 and wide applications have been reported.49-51 In this study, the simulation data are downloaded from the Web site http://brahms.scs.uiuc.edu. The data are collected on 52 input variables, which are composed of 41 measured variables and 11 manipulated variables. The data involve 450 normal observations as predictor data, in which 150 samples are separated into each predictor space. Then, three sets of predictor data can be constructed, Xi(150 × 52) (i = 1, 2, 3). Corresponding to each predictor space, a response data set is generated, each with 10 variables formed as linear combinations of the predictor variables, where the linear combination coefficients are random values with zero mean and unit variance. They construct three regression data pairs {Xi, Yi}. The first 100 samples are used for model identification and the rest for model testing. In this work, the illustrations are presented by combining training and testing results. First, each predictor data set and response data set are normalized to have zero mean and unit variance. Then the two MsRA algorithms for the exploration of common regression structures are considered.
For the MsRA-weight shown in Appendix A, the common regression weights among the three regression pairs {Xi, Yi} are to be analyzed, which should reveal the same regression relationship although the data pairs are different. On the basis of the proposed two-step extraction algorithm, those regression weights, revealing the Xi-Yi correlations, are explored. In the first-step extraction, the regression weights with a certain cross-set similarity are prepared. Thirty subweight vectors are kept and substituted for the original data space in the second-step extraction procedure. As shown in Table 2a,b, to evaluate the extraction results in the two different steps, the between-set correlations of the subweights are calculated, taking the first ten for instance.
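The between-set correlation index reported in Table 2 can be computed along the lines of the short sketch below (a minimal illustration with assumed array names; W1, W2, and W3 would hold the subweight vectors of the three sets column-wise).

```python
import numpy as np
from itertools import combinations

def cross_set_correlations(W_list, n_vectors=10):
    """Pairwise correlations between corresponding subweight vectors of different sets."""
    rows = []
    for j in range(n_vectors):
        corrs = [abs(np.corrcoef(Wa[:, j], Wb[:, j])[0, 1])   # absolute value: weight signs are arbitrary
                 for Wa, Wb in combinations(W_list, 2)]
        rows.append(corrs + [float(np.mean(corrs))])          # last column: mean over set pairs
    return np.array(rows)                                     # one row per subweight, as in Table 2

# usage (assumed arrays): table = cross_set_correlations([W1, W2, W3])
```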
Clearly, from the first-step to the second-step results (from Table 2a to Table 2b), the correlations among the regression subweights are strongly enhanced because the correlation index is used in the optimization objective of the second-step extraction. For example, for the common subweights in the first step, the maximal mean correlation value is only 0.74, which is achieved by the third one. Moreover, it should be noted that the subweights are not ordered exactly on the basis of decreasing correlations. Comparatively, for the second-step
Figure 2. Comparison of the regression scores extracted by MBPLS and MsRA-score algorithms for (a) case I, (b) case II, and (c) case III.
common subweights, all of the first ten show strong correlations, which are almost up to 1. To see the differences, four decimal places have to be kept for the correlation index. Moreover, in each predictor space, the subweights are extracted in order, consistent with the decreasing correlations. Further, the common global weights recovered from the two different steps are also compared. From the correlation coefficients displayed in Table 2c, each common global basis obtained by the two-step analysis correlates with multiple global weights obtained by only performing the first-step analysis. For example, the second global weight wg2 is notably related with w̄g,1 through w̄g,4. This fact indicates that the second-step extraction may further process and refine the first-step analysis result so that those really closely related common weights are figured out. To illustratively demonstrate this, the common weights derived in the two different steps by the proposed method are plotted in Figure 3
by taking the first four for instance. As shown in Figure 3a, generally, the first-step regression weights from different sets show similar trends, which, however, have nonignorable oscillations on most predictor variables. Comparatively, the second-step regression weights are quite similar with each other across sets, as shown in Figure 3b, integrating the cross-set common weight information from the first-step extraction results. As described before, by the MsRA-weight algorithm, the cross-set similarity and common behaviors are focused on more, rather than the data variability within each predictor space. The extracted weights should be consistent across sets and may differ more or less from the SsPLS regression weights. To reveal their difference, the four common subweights are compared with the regression weights obtained by performing SsPLS on each regression pair, {Xi, Yi}. Here, the dissimilarity is also evaluated
Table 2. Analysis Results for MsRA-Weight Extraction

(a) First-Step Extraction Result (cross-set correlation)
no. of weights   W1-W2   W1-W3   W2-W3   mean
1                0.66    0.64    0.75    0.68
2                0.58    0.78    0.72    0.69
3                0.66    0.77    0.80    0.74
4                0.41    0.60    0.67    0.56
5                0.54    0.38    0.52    0.48
6                0.29    0.49    0.54    0.44
7                0.45    0.32    0.44    0.40
8                0.37    0.50    0.35    0.41
9                0.28    0.47    0.37    0.37
10               0.30    0.33    0.13    0.25

(b) Second-Step Extraction Result (cross-set correlation)
no. of weights   W1-W2    W1-W3    W2-W3    mean
1                0.9995   0.9993   0.9996   0.9995
2                0.9989   0.9991   0.9995   0.9992
3                0.9987   0.9991   0.9986   0.9988
4                0.9975   0.9975   0.9980   0.9977
5                0.9959   0.9959   0.9961   0.9960
6                0.9944   0.9912   0.9938   0.9931
7                0.9920   0.9925   0.9902   0.9916
8                0.9903   0.9896   0.9889   0.9896
9                0.9830   0.9809   0.9831   0.9823
10               0.9886   0.9737   0.9778   0.9800
(c) Comparison of Global Weights between the First-Step Extraction Result (w̄g, rows) and the Two-Step Result (wg, columns); correlation coefficients
        wg1    wg2    wg3    wg4
w̄g1    0.66   0.51   0.12   0.25
w̄g2    0.38   0.31   0.57   0.03
w̄g3    0.26   0.41   0.29   0.10
w̄g4    0.11   0.21   0.06   0.05
by calculating their inner product, i.e., the cosine of their angle, because they have unit modulus. From the results shown in Table 3a, the four common subweight vectors generally show different relationships with the first eight SsPLS weight vectors in each space. Moreover, as shown by the results in Table 3b, the cumulative distribution variances in each predictor space are calculated on the basis of the first four subweights in order and compared between the SsPLS and MsRA algorithms. In MsRA, they are calculated as Σ(X^i G_{W^i})^2 / Σ(X^i)^2, where G_{W^i} is the orthogonal projector onto W^{iT} defined by eq 9. The variation in SsPLS can also be calculated in the same way. Generally, the first-step extraction result is similar to the SsPLS result (about 25% in two sets and 30% in the third set) because they both use a covariance index in the objective. The second-step result is quite different from them because it focuses more on the cross-set correlations. So many fewer variations are captured in the second-step procedure in comparison with those by SsPLS or the first-step extraction. For example, based on the first three common regression
Figure 3. First four common regression weights by MsRA-weight algorithm: (a) first-step extraction result; (b) second-step extraction result.
subweights, the explained distribution variations are less than 10% by the second-step extraction, whereas the variations go up to 20% by SsPLS or the first-step extraction. With four weight vectors, the second step can only explain about half of the variations explained by SsPLS and the first step. Also, along each of the four common subweights for each method, the variations differ more or less, which means the common subweights are of different significance for data reconstruction.
As stated before, the common regression weights and common regression scores are different analysis subjects. For the MsRA-score in Appendix B, the three response data sets are put together, comprising a single response space (Y*) with the same observation dimension but with the variable dimension tripled. Then all three predictor data sets are associated with the same response space, sharing the same observation dimension. The common regression scores are figured out and analyzed on the basis of the algorithm shown in Appendix B. There are 49 common regression subscores kept in the first step and then used for the second-step analysis. As shown in Table 4a,b, the extraction results from the first step and the second step are comparatively evaluated on the basis of the correlation index. Clearly, the second-step common regression scores are more closely related than those from the first-step procedure. For example, the correlations between the first regression scores from any two predictor spaces are larger than 0.95 on the basis of the second-step extraction, and high cross-set correlations are generally revealed by all ten scores. Comparatively, the averaged correlation between the first regression scores from any two predictor sets is only about 0.70 on the basis of the first-step extraction and quickly decreases after that. For example, when it comes to the second score, the maximal between-set correlation is only 0.43. Moreover, the global Y-LVs are also compared. As shown in Table 4c, they are quite different. To illustratively demonstrate this, the common scores derived in the two different steps by the proposed method are plotted in Figure 4, taking the first four for instance. As shown in Figure 4a, generally, the first-step regression scores from different sets have similar trends to a certain extent, which, however, also show nonignorable oscillations. Comparatively, the second-step normalized regression scores are more similar with each other
Table 3. Comparison of Analysis Results between SsPLS and the Proposed MsRA-Weight Algorithms

(a) Regression Weights (correlation coefficients with the SsPLS weights w1-w8)
set          w1     w2     w3     w4     w5     w6     w7     w8
c=1   wc,1   0.29   0.60   0.42   0.30   0.34   0.07   0.27   0.06
      wc,2   0.37   0.42   0.32   0.08   0.32   0.25   0.01   0.32
      wc,3   0.53   0.03   0.43   0.15   0.03   0.05   0.10   0.18
      wc,4   0.17   0.03   0.05   0.03   0.02   0.05   0.34   0.12
c=2   wc,1   0.65   0.32   0.30   0.34   0.34   0.01   0.06   0.05
      wc,2   0.31   0.27   0.56   0.10   0.05   0.36   0.30   0.14
      wc,3   0.12   0.24   0.43   0.22   0.13   0.32   0.29   0.16
      wc,4   0.23   0.11   0.27   0.14   0.09   0.42   0.18   0.25
c=3   wc,1   0.51   0.28   0.29   0.40   0.19   0.40   0.10   0.10
      wc,2   0.50   0.32   0.36   0.12   0.39   0.05   0.11   0.27
      wc,3   0.09   0.58   0.20   0.19   0.11   0.24   0.32   0.33
      wc,4   0.26   0.01   0.03   0.04   0.06   0.56   0.13   0.02
(b) Predictor Variations (%) Explained in Different Data Sets (cumulative over the first four weights)
set   no. of weights   SsPLS   MsRA-weight first-step   MsRA-weight second-step
c=1   1                8.23    5.27                     2.34
      2                15.93   12.97                    5.39
      3                19.77   19.56                    9.38
      4                25.60   25.42                    12.76
c=2   1                5.87    5.28                     2.35
      2                17.41   13.52                    5.64
      3                20.70   21.86                    8.62
      4                24.74   25.22                    12.68
c=3   1                5.91    5.79                     2.26
      2                14.97   14.33                    5.46
      3                25.92   24.41                    9.17
      4                30.62   28.84                    12.53
across sets, as shown in Figure 4b, by integrating the cross-set common information from the first-step extraction results. The extracted common subscores from the two different steps should be consistent across predictor spaces and may differ more or less from the single-set PLS regression scores. To reveal their difference, the four common subscores are compared with the regression scores obtained by performing SsPLS on each regression pair, {Xi, Y*}. Here, the dissimilarity is also evaluated by correlation analysis. From the results shown in Table 5a, the four common regression subscore vectors generally show different relationships with the first eight SsPLS scores in each predictor space. Moreover, as shown by the results in Table 5b, the cumulative distribution variances explained by the first four subscores in order are compared between the SsPLS and MsRA algorithms, revealing more or less differences. In MsRA, they are calculated as Σ(G_{T^i} X^i)^2 / Σ(X^i)^2, where G_{T^i} is the orthogonal projector onto T^i as used in eq 12. The variation in SsPLS can also be calculated in the same way. Generally, with the first four regression scores, the variations captured by first-step MsRA are similar to the SsPLS results, that is, about 25% in two sets and 30% in the third set. Comparatively, the second-step results show a
little lower amount, around 20%, for all three sets. Also, along each of the four common subscores for each method, the variations differ more or less, which means the common subscores are of different significance for data reconstruction.
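The explained-variation index used in Tables 3b and 5b can be reproduced with a short routine such as the sketch below (array names are assumptions); it projects each predictor set onto its common subscores and reports the percentage of recovered variation.

```python
import numpy as np

def explained_variation(X, T):
    """Percent of the sum of squares of X recovered by projecting onto the score matrix T,
    i.e., sum((G_T X)^2) / sum(X^2) with G_T the orthogonal projector onto span(T)."""
    G_T = T @ np.linalg.pinv(T.T @ T) @ T.T     # orthogonal projector onto the columns of T
    return 100.0 * np.sum((G_T @ X) ** 2) / np.sum(X ** 2)

# cumulative variation over the first r common subscores of set i (illustrative usage)
# cum = [explained_variation(Xi, Ti[:, :r + 1]) for r in range(4)]
```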
5. CONCLUSIONS
In this paper, a multiset regression analysis (MsRA) algorithm is designed by a two-step vector extraction, which, in a sense, is about how to give an integrative characterization of multiple predictor data sets while relating them to responses. That is, both X-Y and cross-X relationships are considered. Different from the conventional single-set regression analysis (SsRA), it emphasizes the cross-set relationship and thus the common regression structures. Two versions of the MsRA algorithm are formulated: on the one hand, the cross-set common regression weights reveal the common structure in the regression variable mode; on the other hand, the cross-set common regression scores reveal the common structure in the regression variation mode. Corresponding to each version, the predictor data space can be separated into two different subspaces, capturing common and
Table 4. Analysis Results for MsRA-Score Extraction

(a) First-Step Extraction Result (cross-set correlation)
no. of scores   T1-T2   T1-T3   T2-T3   mean
1               0.61    0.70    0.79    0.70
2               0.43    0.37    0.38    0.39
3               0.30    0.34    0.42    0.35
4               0.36    0.34    0.29    0.33
5               0.14    0.13    0.18    0.15
6               0.21    0.12    0.31    0.21
7               0.34    0.39    0.40    0.38
8               0.23    0.09    0.17    0.16
9               0.16    0.04    0.03    0.03
10              0.07    0.07    0.05    0.06

(b) Second-Step Extraction Result (cross-set correlation)
no. of scores   T1-T2   T1-T3   T2-T3   mean
1               0.95    0.96    0.97    0.96
2               0.88    0.87    0.89    0.88
3               0.85    0.83    0.83    0.84
4               0.80    0.79    0.79    0.79
5               0.80    0.74    0.83    0.79
6               0.79    0.72    0.77    0.76
7               0.72    0.68    0.80    0.73
8               0.73    0.69    0.76    0.73
9               0.71    0.68    0.72    0.70
10              0.67    0.67    0.64    0.66
(c) Comparison of Global Scores between the First-Step Extraction Result (ūg, rows) and the Two-Step Result (ug, columns); correlation coefficients
        ug1    ug2    ug3    ug4
ūg1    0.44   0.45   0.26   0.06
ūg2    0.08   0.22   0.09   0.12
ūg3    0.03   0.16   0.27   0.52
ūg4    0.03   0.05   0.02   0.04
unique variable correlations across sets for the MsRA-weight algorithm, or common and unique variations across sets for the MsRA-score algorithm. Mathematical deduction and theoretical analysis are conducted by solving different constrained optimization problems. The method presents a clear definition and conceptual simplicity in the formulation and solution of the problem. The effectiveness and feasibility of the proposed algorithm have been verified on the basis of numerical and experimental data. The proposed theoretical algorithm is expected to be generally applicable to a broad range of practical cases, providing more informative and interesting statistical explanations of the inherent characteristics underlying multiple regression data sets. On the basis of the algorithm development in this paper, it is hoped that the present methodology can become a useful statistical analysis tool to solve practical problems in the chemical engineering field. For example, modern industrial processes often possess various process variables, which may be collected from different sources and reveal different characteristics. Different sets of chemical variables may indicate different components, kinetics,
and operating conditions or compositions of chemical reactants. They also have different influences on the quality properties. Despite the dissimilarity of process variations among different data sets, they also share a certain similarity. That is, some of the underlying quality-relevant variations may remain the same and be shared across different spaces. To more efficiently utilize the available multiple variable sets, it is interesting to bring the cross-set relationship into view for feature extraction and decomposition. When the cross-set common and specific quality-relevant variations are modeled, different monitoring systems can then be developed to supervise them online, respectively. Based on the fault detection results for the different variations, how to use them for fault diagnosis will also be an interesting future topic.
APPENDIX A
MsRA-Weight Algorithm. On the basis of the definition of the subweight vector w_j^i in each predictor data set as shown in eq 4, a two-step MsRA-weight calculation procedure is designed as follows to extract the common regression weights across sets.
First-Step MsRA-Weight. During the first-step MsRA, we define it in terms of finding a J-dimensional global weight vector ($w_g$) together with subweights, i.e., linear combinations of the different $X^iY^i$ covariances, through the following cost function and constraints:

$$\max R^2 = \max \sum_{i=1}^{C}\left(w_g^T X^{iT} Y^i a^i\right)^2 \quad \text{s.t.} \quad w_g^T w_g = 1,\ \ a^{iT} a^i = 1 \qquad (A1)$$

The combination coefficient vector $a^i$ is set to unit length. Therefore, the subweight $X^{iT}Y^i a^i$ may have length greater than unity, so that $(w_g^T X^{iT}Y^i a^i)^2$ actually models the covariance between the subweight ($X^{iT}Y^i a^i$) and the global weight ($w_g$). It is thus acknowledged that the objective function undesirably involves the module of the subweight vector rather than a pure correlation analysis.

With a Lagrange multiplier, the initial objective function can be expressed as an unconstrained extremum problem:

$$F(w_g, a^i, \lambda_g, \lambda_i) = \sum_{i=1}^{C}\left(w_g^T X^{iT} Y^i a^i\right)^2 - \lambda_g\left(w_g^T w_g - 1\right) - \sum_{i=1}^{C}\lambda_i\left(a^{iT} a^i - 1\right) \qquad (A2)$$

where $\lambda_g$ and $\lambda_i$ are constant scalars. Calculating the derivatives of $F(w_g, a^i, \lambda_g, \lambda_i)$ with respect to $w_g$, $a^i$, $\lambda_g$, and $\lambda_i$ and setting them all equal to zero yields the following expressions:

$$\frac{\partial F}{\partial w_g} = 2\sum_{i=1}^{C}\left(\left|w_g^T X^{iT} Y^i a^i\right| X^{iT} Y^i a^i\right) - 2\lambda_g w_g = 0 \qquad (A3)$$

$$\frac{\partial F}{\partial a^i} = 2\left|w_g^T X^{iT} Y^i a^i\right| Y^{iT} X^i w_g - 2\lambda_i a^i = 0 \qquad (A4)$$

$$w_g^T w_g - 1 = 0 \qquad (A5)$$

$$a^{iT} a^i - 1 = 0 \qquad (A6)$$

Left multiplying eq A3 by $w_g^T$ and eq A4 by $a^{iT}$, the following relations can be derived in combination with eqs A5 and A6:

$$\sum_{i=1}^{C}\left(w_g^T X^{iT} Y^i a^i\right)^2 = \lambda_g, \qquad \left(w_g^T X^{iT} Y^i a^i\right)^2 = \lambda_i \qquad (A7)$$

Therefore, $\lambda_g$ is just the desired optimization objective, and $\lambda_i$ denotes the squared covariance between the subweight and the global weight vectors; clearly, $\lambda_g = \sum_{i=1}^{C}\lambda_i$, so the $\lambda_i$ are called subobjectives here. Moreover, they are not necessarily the same from set to set; that is, the covariance information is not equivalently distributed over the sets. Correspondingly, eqs A3 and A4 can be rewritten as

$$\sum_{i=1}^{C}\sqrt{\lambda_i}\, X^{iT} Y^i a^i = \lambda_g w_g \qquad (A8)$$

$$\frac{1}{\sqrt{\lambda_i}}\, Y^{iT} X^i w_g = a^i \qquad (A9)$$

From eq A8, the global weight vector is the weighted average of all subweight vectors with $\sqrt{\lambda_i}/\lambda_g$ as the attached weights. Substituting eq A9 into eq A8 gives

$$\left(\sum_{i=1}^{C} X^{iT} Y^i Y^{iT} X^i\right) w_g = \lambda_g w_g, \quad \text{i.e.,} \quad Q w_g = \lambda_g w_g \qquad (A10)$$

This is a standard algebra problem. To attain the maximal objective function value, i.e., the largest $\lambda_g$, the analytical solution is given by eigenvalue decomposition of the sum of subset covariances, $Q = \sum_{i=1}^{C} X^{iT} Y^i Y^{iT} X^i$. Clearly, for the extraction of the first weight vector, the first-step MsRA-weight solution is similar to single-set PLS, in which the first weight vector is obtained by eigenvalue decomposition of the covariance $X^T Y Y^T X$.47

[Figure 4. Profiles of the first four common regression scores by the MsRA-score algorithm: (a) first-step extraction result; (b) second-step extraction result.]
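To make the first-step MsRA-weight solution concrete, the following minimal NumPy sketch (an illustration under assumptions, not the authors' code; the names X_list, Y_list, and msra_weight_first_step are hypothetical, and the data are assumed mean-centered) extracts one global weight by eigenvalue decomposition of Q (eq A10) and then recovers the subobjectives (eq A11) and subweights (eq A12).

```python
import numpy as np

def msra_weight_first_step(X_list, Y_list):
    """One-component first-step MsRA-weight (eqs A10-A12).

    X_list : list of (N_i x J) mean-centered predictor matrices.
    Y_list : list of (N_i x Jy) mean-centered response matrices.
    """
    J = X_list[0].shape[1]
    Q = np.zeros((J, J))
    for Xi, Yi in zip(X_list, Y_list):
        XtY = Xi.T @ Yi                      # J x Jy cross-covariance
        Q += XtY @ XtY.T                     # sum of X_i^T Y_i Y_i^T X_i
    eigvals, eigvecs = np.linalg.eigh(Q)     # Q is symmetric PSD
    wg, lam_g = eigvecs[:, -1], eigvals[-1]  # eq A10: largest eigenvalue

    w_list, lam_list = [], []
    for Xi, Yi in zip(X_list, Y_list):
        XtY = Xi.T @ Yi
        lam_i = float(wg @ XtY @ XtY.T @ wg)                 # eq A11
        w_list.append((XtY @ XtY.T @ wg) / np.sqrt(lam_i))   # eq A12
        lam_list.append(lam_i)
    return wg, w_list, lam_list, lam_g
```

By construction, lam_g should equal the sum of the lam_i values, mirroring the relation λg = Σλi in eq A7.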
[Table 5. Comparison of Analysis Results between SsPLS and the Proposed MsRA-Score Algorithms. (a) Regression scores: correlation coefficients between the SsPLS scores (t1–t8) and the MsRA-score common regression scores (tc,1–tc,4) in each data set (c = 1, 2, 3). (b) Predictor variations (%) explained in the different data sets by the first-step MsRA-score, second-step MsRA-score, and SsPLS models.]
Substituting eq A9 into eq A7, the subobjective parameter $\lambda_i$ can be calculated:

$$w_g^T X^{iT} Y^i Y^{iT} X^i w_g = \lambda_i \qquad (A11)$$

On the basis of eq A9, the subweight vector can be calculated, revealing an obvious conversion relationship associated with the global weight:

$$w^i = X^{iT} Y^i a^i = \frac{1}{\sqrt{\lambda_i}}\, X^{iT} Y^i Y^{iT} X^i w_g \qquad (A12)$$

Before the second common subweight vectors are extracted, the predictor space ($X^i$) is deflated:

$$t^i = X^i w^i, \qquad X^i := X^i - t^i\left(t^{iT} t^i\right)^{-1} t^{iT} X^i \qquad (A13)$$

Then the above procedure is repeated, and the subsequent new subweights are orthogonal to the previous ones. In order, R global weight vectors can be derived, resulting in the same number of subweight vectors, correspondingly calculated by eq A12 for each data set, that are mutually orthogonal. The number of subweights is unified over sets. This decomposition summarizes and compresses the covarying regression information $X^{iT}Y^i$ of each set into a new subspace spanned by the R subweights, $\bar{W}^i$ ($R \times J$). However, the maximization of covariance information, as calculated by the inner product between the global and subweight vectors, does not necessarily imply strong correlation. It is also possible that a higher covariance merely results from the larger modules of the subbasis vectors, so that some low-module bases revealing the common information in the underlying variable correlations over different data sources are overlooked. This may lead to difficulty and misapprehension in interpreting the true cross-set predictor-response regression relationship. It may even result in some pseudocommon cross-set subweights that are quite different over sets and thus cannot be comprehensively represented by the so-called global one ($w_g$).
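The repeated extraction with deflation can be sketched as follows (again a hypothetical NumPy illustration, reusing the msra_weight_first_step routine from the previous sketch; the names and the convention of stacking $\bar{W}^i$ as an $R \times J$ matrix are assumptions consistent with eq A13 and the text above).

```python
import numpy as np

def msra_weight_first_step_R(X_list, Y_list, R):
    """Repeat the first-step extraction R times with deflation (eq A13).

    Returns one (R x J) subweight matrix per set and the stacked global weights.
    Assumes msra_weight_first_step() from the previous sketch is in scope.
    """
    X_work = [Xi.copy() for Xi in X_list]
    W_rows = [[] for _ in X_list]
    Wg_rows = []
    for _ in range(R):
        wg, w_list, _, _ = msra_weight_first_step(X_work, Y_list)
        Wg_rows.append(wg)
        for i, (Xi, wi) in enumerate(zip(X_work, w_list)):
            W_rows[i].append(wi)
            ti = Xi @ wi                                   # subscore t_i = X_i w_i
            # deflate X_i by its subscore (eq A13)
            X_work[i] = Xi - np.outer(ti, ti @ Xi) / float(ti @ ti)
    W_list = [np.vstack(rows) for rows in W_rows]          # each R x J
    return W_list, np.vstack(Wg_rows)
```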
Second-Step MsRA-Weight. The above analysis indicates that the eigenvector solution based on the stacked subset covariances may not deliver the common weight vectors across sets. To obtain cross-set common subweights that are really closely correlated, a correlation index should be used instead of the covariance index. In comparison with the optimization objective and constraints shown in eq A1, the second-step weight extraction is designed by constructing and solving a different optimization problem. It is implemented on the basis of the first-step analysis result ($\bar{W}^i$, $R \times J$), and the aim is to maximize the mean square correlations by employing the same computational trick; that is, a global weight vector is introduced as the third-party weight:

$$\max R^2 = \max \sum_{i=1}^{C} r^2\!\left(w_g, \bar{W}^{iT} a^i\right) = \max \sum_{i=1}^{C} \frac{\left(w_g^T \bar{W}^{iT} a^i\right)^2}{\left\|w_g\right\|^2 \left\|\bar{W}^{iT} a^i\right\|^2} \qquad (A14)$$

Meanwhile, the objective function is subject to the following constraints:

$$w_g^T w_g = 1, \qquad a^{iT}\,\bar{W}^i \bar{W}^{iT} a^i = 1 \qquad (A15)$$

Because all weight vectors, including the global and subglobal ones, have been set to unit length beforehand, the correlation coefficient index $r(w_g, \bar{W}^{iT} a^i)$ reduces to the simple inner product $w_g^T \bar{W}^{iT} a^i$. A constrained optimization problem can thus be formulated:

$$\max R^2 = \max \sum_{i=1}^{C}\left(w_g^T \bar{W}^{iT} a^i\right)^2 \quad \text{s.t.} \quad w_g^T w_g = 1,\ \ a^{iT}\,\bar{W}^i \bar{W}^{iT} a^i = 1 \qquad (A16)$$

Therefore, the optimization search rule comes down to the following scenario: in a certain unknown subspace, a global weight $w_g$ should be found that approximates every subweight $w^i$ as closely as possible. The maximal correlation between any two unit vectors is the cosine of the angle between them. Therefore, the problem shown in eq A16 is equivalent to searching for two vectors with the minimal angle, which is also the minimal angle between the two subspaces in which $w_g$ and $w^i$ lie, respectively. This explains the mathematical meaning of the optimization problem from the geometrical viewpoint.

With a Lagrange multiplier, the original optimization problem can be replaced by an unconstrained extremum problem:

$$F(w_g, a^i, \lambda_g, \lambda_i) = \sum_{i=1}^{C}\left(w_g^T \bar{W}^{iT} a^i\right)^2 - \lambda_g\left(w_g^T w_g - 1\right) - \sum_{i=1}^{C}\lambda_i\left(a^{iT}\,\bar{W}^i \bar{W}^{iT} a^i - 1\right) \qquad (A17)$$

where $\lambda_g$ and $\lambda_i$ are constant scalars. Calculating the derivatives of $F(w_g, a^i, \lambda_g, \lambda_i)$ with respect to $w_g$, $a^i$, $\lambda_g$, and $\lambda_i$ and setting them all equal to zero yields the following equations:

$$\frac{\partial F}{\partial w_g} = 2\sum_{i=1}^{C}\left(\left|w_g^T \bar{W}^{iT} a^i\right| \bar{W}^{iT} a^i\right) - 2\lambda_g w_g = 0 \qquad (A18)$$

$$\frac{\partial F}{\partial a^i} = 2\left|w_g^T \bar{W}^{iT} a^i\right| \bar{W}^i w_g - 2\lambda_i\,\bar{W}^i \bar{W}^{iT} a^i = 0 \qquad (A19)$$

$$w_g^T w_g - 1 = 0 \qquad (A20)$$

$$a^{iT}\,\bar{W}^i \bar{W}^{iT} a^i - 1 = 0 \qquad (A21)$$

Left multiplying eq A18 by $w_g^T$ and eq A19 by $a^{iT}$, the following relations can be derived in combination with eqs A20 and A21, respectively:

$$\sum_{i=1}^{C}\left(w_g^T \bar{W}^{iT} a^i\right)^2 = \lambda_g, \qquad \left(w_g^T \bar{W}^{iT} a^i\right)^2 = \lambda_i \qquad (A22)$$

where $\lambda_g$ is the desired objective and $\lambda_i$ denotes the subobjective parameter, satisfying $\lambda_g = \sum_{i=1}^{C}\lambda_i$. Correspondingly, eqs A18 and A19 can be further modified:

$$\sum_{i=1}^{C}\sqrt{\lambda_i}\,\bar{W}^{iT} a^i = \lambda_g w_g \qquad (A23)$$

$$\bar{W}^i w_g = \sqrt{\lambda_i}\,\bar{W}^i \bar{W}^{iT} a^i \qquad (A24)$$

From eq A24,

$$a^i = \frac{1}{\sqrt{\lambda_i}}\left(\bar{W}^i \bar{W}^{iT}\right)^{-1} \bar{W}^i w_g \qquad (A25)$$

Substituting eq A25 into eq A23:

$$\left(\sum_{i=1}^{C} \bar{W}^{iT}\left(\bar{W}^i \bar{W}^{iT}\right)^{-1} \bar{W}^i\right) w_g = \lambda_g w_g, \quad \text{i.e.,} \quad S w_g = \lambda_g w_g \qquad (A26)$$

Therefore, the optimization problem finally leads to a simple analytical solution; i.e., $w_g$ should be the eigenvector of $S = \sum_{i=1}^{C}\bar{W}^{iT}(\bar{W}^i \bar{W}^{iT})^{-1}\bar{W}^i$ corresponding to the largest eigenvalue $\lambda_g$. It should be pointed out that, because the subweights ($\bar{W}^i$) extracted from the first step are mutually orthogonal, $\bar{W}^i \bar{W}^{iT}$ is invertible. Substituting eq A25 into eq A22, the subobjective value $\lambda_i$ can be calculated:

$$w_g^T \bar{W}^{iT}\left(\bar{W}^i \bar{W}^{iT}\right)^{-1} \bar{W}^i w_g = \lambda_i \qquad (A27)$$

Considering that $\bar{W}^{iT}(\bar{W}^i \bar{W}^{iT})^{-1}\bar{W}^i = \bar{W}^{iT}(\bar{W}^{iT})^{+}$ is the orthogonal projector onto the column space of $\bar{W}^{iT}$, the parameter value $\lambda_i$ is the inner product between $w_g$ and its projection onto the ith predictor space. Further, eq A27 can be rewritten as

$$w_g^T \bar{W}^{iT}\left(\bar{W}^i \bar{W}^{iT}\right)^{-1} \bar{W}^i\,\bar{W}^{iT}\left(\bar{W}^i \bar{W}^{iT}\right)^{-1} \bar{W}^i w_g = \lambda_i, \quad \text{i.e.,} \quad \left(\bar{W}^{iT}\left(\bar{W}^i \bar{W}^{iT}\right)^{-1} \bar{W}^i w_g\right)^{T}\left(\bar{W}^{iT}\left(\bar{W}^i \bar{W}^{iT}\right)^{-1} \bar{W}^i w_g\right) = \lambda_i \qquad (A28)$$

Therefore, the parameter $\lambda_i$ is also the squared module of the orthogonal projection of $w_g$ onto the column space of $\bar{W}^{iT}$. Then, from eq A25, each subweight vector can be calculated:

$$w^i = \bar{W}^{iT} a^i = \frac{1}{\sqrt{\lambda_i}}\,\bar{W}^{iT}\left(\bar{W}^i \bar{W}^{iT}\right)^{-1} \bar{W}^i w_g \qquad (A29)$$

When eqs A28 and A29 are combined, for any given $w_g$, the corresponding $w^i$ is actually the unit-length orthogonal projection of $w_g$ onto the space spanned by $\bar{W}^i$. Moreover, from eq A23, the global weight is the weighted sum of all subweights:

$$w_g = \sum_{i=1}^{C}\frac{\sqrt{\lambda_i}}{\lambda_g}\, w^i \qquad (A30)$$

In turn, the second global weight vector, orthogonal to the first, is given by the second eigenvector of S, and so on. Thus, one can determine in this way as many $w_g$ as the rank of S, which is no larger than the sum of the ranks of the projectors $\bar{W}^{iT}(\bar{W}^i \bar{W}^{iT})^{-1}\bar{W}^i$. Finally, the retained R weight vectors construct a global weight subspace $W_g$ ($R \times J$). Correspondingly, C subweight subspaces, $W^i$ ($R \times J$), are also derived, which are actually the projections of $W_g$ onto $\bar{W}^{iT}$ based on eq A29. Considering that, for any given $w_g$, the maximum value of $r^2(w_g, \bar{W}^{iT} a^i)$ is determined by the angle between $w_g$ and $w^i$, this optimization solution comes down to deriving a global weight subspace that simultaneously closely correlates with multiple subweight subspaces, i.e., with the minimal space angle between them.

Unlike the first-step weight extraction, the second-step weight extraction inherently excludes the effect of the module length of each subweight vector and directly maximizes the sum of their correlations. Moreover, the first-step analysis result prepares a good initial platform for it. If the second-step basis extraction were directly implemented on the original measurement variables, it would depend heavily on whether each covariance matrix $Y^{iT}X^i X^{iT}Y^i$ is invertible, as indicated by eq A26. In practice, it is not uncommon that rank($Y^{iT}X^i X^{iT}Y^i$) < $J_y$, especially when multiple response variables are collinear with each other. The invertibility then cannot be guaranteed, and directly applying eigenvalue decomposition to the raw data spaces may lead to a rank-deficiency problem. In the current work, this problem is readily solved by the two-step optimization. First, through the first-step calculation, a preparatory subweight subspace ($\bar{W}^i$, $R \times J$) is built as a substitute for the original predictor data. Then, on the basis of the first-step analysis result, the second-step weight extraction is implemented to further condense and refine it. In this procedure, the subweight subspace obtained from the first step, $\bar{W}^i$ ($R \times J$), guarantees the invertibility of the Gram matrix $\bar{W}^i \bar{W}^{iT}$ because the subweights are mutually orthogonal. Correspondingly, the global weights can be readily derived by the eigenvalue solution of $\sum_{i=1}^{C}\bar{W}^{iT}(\bar{W}^i \bar{W}^{iT})^{-1}\bar{W}^i$ in the second step. A parsimonious common subweight subspace can then be obtained within each predictor space, revealing the regression relationship shared across sets. In this way, the method can get rid of pseudocorrelation information, purify the subweight vectors, and thus approximate the real common global regression weights.
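As an illustration of eqs A26, A27, and A29, the following minimal NumPy sketch (a hypothetical illustration, not the authors' code) builds S from the first-step subweight matrices, takes its leading eigenvector as the common global weight, and recovers each set's subweight as the unit-length projection of $w_g$ onto the row space of $\bar{W}^i$.

```python
import numpy as np

def msra_weight_second_step(W_list):
    """Second-step MsRA-weight (eqs A26, A27, A29).

    W_list : list of (R x J) first-step subweight matrices, one per set.
    """
    J = W_list[0].shape[1]
    S = np.zeros((J, J))
    projectors = []
    for Wi in W_list:
        # orthogonal projector onto the column space of Wi^T (row space of Wi)
        Pi = Wi.T @ np.linalg.inv(Wi @ Wi.T) @ Wi
        projectors.append(Pi)
        S += Pi
    eigvals, eigvecs = np.linalg.eigh(S)       # S is symmetric
    wg = eigvecs[:, -1]                        # eq A26: largest eigenvalue

    w_list, lam_list = [], []
    for Pi in projectors:
        lam_i = float(wg @ Pi @ wg)                  # eq A27
        w_list.append((Pi @ wg) / np.sqrt(lam_i))    # eq A29: unit-length projection
        lam_list.append(lam_i)
    return wg, w_list, lam_list
```

Further global weights would be taken from the subsequent eigenvectors of S, as described above.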
APPENDIX B
MsRA-Score Algorithm. On the basis of the definition of the subscore vector $t_j^i$ in each data set, as shown in eq 5, a two-step MsRA-score calculation procedure is designed as follows to extract the common regression scores over data sets.

First-Step MsRA-Score. During the first-step MsRA, we define it in terms of finding different linear combinations of the variables comprising each of the C collective predictor data sets and making them all closely related with, or explanatory of, the same score in Y space. The cost function and constraints are

$$\max R^2 = \max \sum_{i=1}^{C}\left(v^T Y^T X^i a^i\right)^2 \quad \text{s.t.} \quad v^T v = 1,\ \ a^{iT} a^i = 1 \qquad (B1)$$

The Y-score $Yv$ serves as the global score to which all $X^i$-scores in the different $X^i$ spaces contribute. The combination coefficient vector $a^i$ is set to unit length. Therefore, the subscore $X^i a^i$ may have length greater than unity, so that $(v^T Y^T X^i a^i)^2$ actually models the covariance between the subscore ($X^i a^i$) and the global score ($Yv$). It is thus acknowledged that the objective function undesirably involves the modules of the subscore vector and the global score rather than a pure correlation analysis.

With a Lagrange multiplier, the initial objective function can be expressed as an unconstrained extremum problem:

$$F(v, a^i, \lambda_g, \lambda_i) = \sum_{i=1}^{C}\left(v^T Y^T X^i a^i\right)^2 - \lambda_g\left(v^T v - 1\right) - \sum_{i=1}^{C}\lambda_i\left(a^{iT} a^i - 1\right) \qquad (B2)$$

where $\lambda_g$ and $\lambda_i$ are constant scalars. Calculating the derivatives of $F(v, a^i, \lambda_g, \lambda_i)$ with respect to $v$, $a^i$, $\lambda_g$, and $\lambda_i$ and setting them all equal to zero yields the following expressions:

$$\frac{\partial F}{\partial v} = 2\sum_{i=1}^{C}\left(\left|v^T Y^T X^i a^i\right| Y^T X^i a^i\right) - 2\lambda_g v = 0 \qquad (B3)$$

$$\frac{\partial F}{\partial a^i} = 2\left|v^T Y^T X^i a^i\right| X^{iT} Y v - 2\lambda_i a^i = 0 \qquad (B4)$$

$$v^T v - 1 = 0 \qquad (B5)$$

$$a^{iT} a^i - 1 = 0 \qquad (B6)$$

Left multiplying eq B3 by $v^T$ and eq B4 by $a^{iT}$, the following relations can be derived in combination with eqs B5 and B6:

$$\sum_{i=1}^{C}\left(v^T Y^T X^i a^i\right)^2 = \lambda_g, \qquad \left(v^T Y^T X^i a^i\right)^2 = \lambda_i \qquad (B7)$$

Therefore, $\lambda_g$ is just the desired optimization objective, and $\lambda_i$ denotes the squared covariance between the subbasis and the global basis vector; clearly, $\lambda_g = \sum_{i=1}^{C}\lambda_i$, so the $\lambda_i$ are called subobjectives here. Moreover, they are not necessarily the same from set to set; that is, the covariance information is not equivalently distributed over the sets. Correspondingly, eqs B3 and B4 can be rewritten as

$$\sum_{i=1}^{C}\sqrt{\lambda_i}\, Y^T X^i a^i = \lambda_g v \qquad (B8)$$

$$\frac{1}{\sqrt{\lambda_i}}\, X^{iT} Y v = a^i \qquad (B9)$$

Substituting eq B9 into eq B8 gives

$$\left(\sum_{i=1}^{C} Y^T X^i X^{iT} Y\right) v = \lambda_g v, \quad \text{i.e.,} \quad Q v = \lambda_g v \qquad (B10)$$

This is a standard algebra problem. To attain the maximal objective function value, i.e., the largest $\lambda_g$, the analytical solution (the first Y-weight vector) is given by eigenvalue decomposition of the sum of subset covariances, $Q = \sum_{i=1}^{C} Y^T X^i X^{iT} Y$. On the basis of the above calculation, the first Y-weight corresponding to the multiple predictor spaces ($X^i$) is extracted. Clearly, for the extraction of the first Y-weight vector, the first-step MsRA-score solution is similar to single-set PLS, in which the first Y-weight vector is obtained by eigenvalue decomposition of the covariance $Y^T X X^T Y$.47 Substituting eq B9 into eq B7, the subobjective parameter $\lambda_i$ can be calculated:

$$v^T Y^T X^i X^{iT} Y v = \lambda_i \qquad (B11)$$

On the basis of eq B9, the subweight vector for each predictor data set ($X^i$) can be calculated and normalized. Therefore, the subscores are calculated, responding to the same Y-weight vector ($v$) or Y-score ($u$):

$$t^i = X^i a^i = \frac{1}{\sqrt{\lambda_i}}\, X^i X^{iT} Y v = \frac{1}{\sqrt{\lambda_i}}\, X^i X^{iT} u \qquad (B12)$$

Before the second common subscore is extracted in each data space, the predictor space ($X^i$) is deflated by the first subscore:

$$X^i := X^i - t^i\left(t^{iT} t^i\right)^{-1} t^{iT} X^i \qquad (B13)$$

Then the above procedure is repeated for the extraction of the subsequent common scores, so that the new subscores are orthogonal to the previous ones. It should be noted that if Y is a single variable, the first Y-score vector can be directly calculated as the eigenvector of $\sum_{i=1}^{C} Y Y^T X^i X^{iT}$. In order, R Y-score vectors can be derived on the basis of eq B10, resulting in the same number of subscore vectors based on eq B12, which are mutually orthogonal in each predictor space. The number of subscores is unified over sets. This decomposition summarizes and compresses the predictor variation information of each $X^i$ into a new subspace spanned by the R subscores, $\bar{T}^i$ ($N \times R$). However, the maximization of covariance information, as calculated by the inner product between the global and subscore vectors, does not necessarily imply strong correlation. It is possible that a higher covariance merely results from the larger modules of the subscore vectors, so that some low-module scores revealing the common Y-related predictor variation information across sets are overlooked. This may lead to difficulty and misapprehension in interpreting the true cross-set predictor interrelations with respect to the explanation of the response variables. It may even result in some pseudocommon cross-set subscores that are quite different over sets and thus do not resemble the so-called global one ($Yv$).
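For illustration, a minimal NumPy sketch of one component of the first-step MsRA-score extraction (eqs B10–B12) is given below; it assumes that all $X^i$ share the same N mean-centered objects, and the names (X_list, Y, msra_score_first_step) are hypothetical. Repeating the extraction after deflating each $X^i$ by eq B13 would give the stacked subscore matrices $\bar{T}^i$ ($N \times R$) used in the second step.

```python
import numpy as np

def msra_score_first_step(X_list, Y):
    """One-component first-step MsRA-score (eqs B10-B12).

    X_list : list of (N x J_i) mean-centered predictor matrices (shared objects).
    Y      : (N x Jy) mean-centered response matrix.
    """
    Jy = Y.shape[1]
    Q = np.zeros((Jy, Jy))
    for Xi in X_list:
        YtX = Y.T @ Xi                      # Jy x J_i
        Q += YtX @ YtX.T                    # sum of Y^T X_i X_i^T Y
    eigvals, eigvecs = np.linalg.eigh(Q)
    v = eigvecs[:, -1]                      # eq B10: first Y-weight
    u = Y @ v                               # global Y-score

    t_list, lam_list = [], []
    for Xi in X_list:
        xtu = Xi.T @ u
        lam_i = float(xtu @ xtu)                      # eq B11
        t_list.append((Xi @ xtu) / np.sqrt(lam_i))    # eq B12
        lam_list.append(lam_i)
    return v, u, t_list, lam_list
```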
Second-Step MsRA-Score. As analyzed above, the solution based on the stacked subset covariances may not deliver the common score vectors across sets. To obtain cross-set common subscores that are really closely correlated, a correlation index should be used instead of the covariance index. In comparison with the optimization objective and constraints shown in eq B1, the second-step extraction is designed by constructing and solving a different optimization problem. It is implemented on the basis of the first-step analysis result ($\bar{T}^i$, $N \times R$), and the aim is to maximize the mean square correlations by employing the same computational trick; that is, a global score vector is introduced as the third-party score:

$$\max R^2 = \max \sum_{i=1}^{C} r^2\!\left(Yv, \bar{T}^i a^i\right) = \max \sum_{i=1}^{C} \frac{\left((Yv)^T \bar{T}^i a^i\right)^2}{\left\|Yv\right\|^2 \left\|\bar{T}^i a^i\right\|^2} \qquad (B14)$$

Meanwhile, the objective function is subject to the following constraints:

$$v^T Y^T Y v = 1, \qquad a^{iT}\,\bar{T}^{iT} \bar{T}^i a^i = 1 \qquad (B15)$$

Because all score vectors, including the global and subglobal ones, have been set to unit length beforehand, the correlation coefficient index $r(Yv, \bar{T}^i a^i)$ reduces to the simple inner product $v^T Y^T \bar{T}^i a^i$. A constrained optimization problem can thus be formulated:

$$\max R^2 = \max \sum_{i=1}^{C}\left(v^T Y^T \bar{T}^i a^i\right)^2 \quad \text{s.t.} \quad v^T Y^T Y v = 1,\ \ a^{iT}\,\bar{T}^{iT} \bar{T}^i a^i = 1 \qquad (B16)$$

Therefore, the optimization search rule comes down to the following scenario: in Y space, a global score $Yv$ is used as the reference, which should approximate every subscore in the $X^i$ spaces as closely as possible. The maximal correlation between any two unit vectors is the cosine of the angle between them. Therefore, the problem shown in eq B16 is equivalent to searching for two vectors with the minimal angle, which is also the minimal angle between the two subspaces in which $Yv$ and $t^i$ lie, respectively. This explains the mathematical meaning of the optimization problem from the geometrical viewpoint.

With a Lagrange multiplier, the original optimization problem can be replaced by an unconstrained extremum problem:

$$F(v, a^i, \lambda_g, \lambda_i) = \sum_{i=1}^{C}\left(v^T Y^T \bar{T}^i a^i\right)^2 - \lambda_g\left(v^T Y^T Y v - 1\right) - \sum_{i=1}^{C}\lambda_i\left(a^{iT}\,\bar{T}^{iT} \bar{T}^i a^i - 1\right) \qquad (B17)$$

where $\lambda_g$ and $\lambda_i$ are constant scalars. Calculating the derivatives of $F(v, a^i, \lambda_g, \lambda_i)$ with respect to $v$, $a^i$, $\lambda_g$, and $\lambda_i$ and setting them all equal to zero yields the following equations:

$$\frac{\partial F}{\partial v} = 2\sum_{i=1}^{C}\left(\left|v^T Y^T \bar{T}^i a^i\right| Y^T \bar{T}^i a^i\right) - 2\lambda_g Y^T Y v = 0 \qquad (B18)$$

$$\frac{\partial F}{\partial a^i} = 2\left|v^T Y^T \bar{T}^i a^i\right| \bar{T}^{iT} Y v - 2\lambda_i\,\bar{T}^{iT} \bar{T}^i a^i = 0 \qquad (B19)$$

$$v^T Y^T Y v - 1 = 0 \qquad (B20)$$

$$a^{iT}\,\bar{T}^{iT} \bar{T}^i a^i - 1 = 0 \qquad (B21)$$

Left multiplying eq B18 by $v^T$ and eq B19 by $a^{iT}$, the following relations can be derived in combination with eqs B20 and B21, respectively:

$$\sum_{i=1}^{C}\left(v^T Y^T \bar{T}^i a^i\right)^2 = \lambda_g, \qquad \left(v^T Y^T \bar{T}^i a^i\right)^2 = \lambda_i \qquad (B22)$$

where $\lambda_g$ is the desired objective and $\lambda_i$ denotes the subobjective parameter, satisfying $\lambda_g = \sum_{i=1}^{C}\lambda_i$. Correspondingly, eqs B18 and B19 can be further modified:

$$\sum_{i=1}^{C}\sqrt{\lambda_i}\, Y^T \bar{T}^i a^i = \lambda_g Y^T Y v \qquad (B23)$$

$$\bar{T}^{iT} Y v = \sqrt{\lambda_i}\,\bar{T}^{iT} \bar{T}^i a^i \qquad (B24)$$

Substituting eq B24 into eq B23:

$$Y\left(Y^T Y\right)^{-1} Y^T\left(\sum_{i=1}^{C}\bar{T}^i\left(\bar{T}^{iT} \bar{T}^i\right)^{-1} \bar{T}^{iT}\right) Yv = \lambda_g Yv \qquad (B25)$$

$$S u = \lambda_g u \qquad (B26)$$

This is a standard algebra problem. To attain the maximal objective function value, i.e., the largest $\lambda_g$, the analytical solution (the first Y-score) is given by eigenvalue decomposition of $S = Y(Y^TY)^{-1}Y^T\left(\sum_{i=1}^{C}\bar{T}^i(\bar{T}^{iT}\bar{T}^i)^{-1}\bar{T}^{iT}\right)$. On the basis of the above calculation, the first Y-score, which corresponds to the multiple predictor spaces ($X^i$), is extracted. Substituting eq B24 into eq B22, the subobjective parameter $\lambda_i$ can be calculated:

$$v^T Y^T \bar{T}^i\left(\bar{T}^{iT} \bar{T}^i\right)^{-1} \bar{T}^{iT} Yv = \lambda_i \qquad (B27)$$

On the basis of eq B24, the subweight vector for each predictor data set ($X^i$) can be calculated and normalized. Then the subscores are calculated, responding to the same Y-score ($u$):

$$t^i = \bar{T}^i a^i = \frac{1}{\sqrt{\lambda_i}}\,\bar{T}^i\left(\bar{T}^{iT} \bar{T}^i\right)^{-1} \bar{T}^{iT} u$$

Moreover, from eq B23, the global score is actually the orthogonal projection of the weighted sum of all subscores onto the column space of Y. In turn, the following global scores are calculated from the second eigenvector of S in eq B25, and so on. They construct a global score subspace $U$ ($N \times R$) and, accordingly, the different predictor subscore subspaces $T^i$ ($N \times R$). Unlike the first-step extraction, the second-step extraction inherently excludes the effect of the module length of each subscore vector and directly maximizes the sum of their correlations. Moreover, the first-step analysis result prepares a good initial platform for it. If the second-step extraction were directly implemented on the original measurement variables, it would depend heavily on whether each covariance matrix $X^{iT}X^i$ is invertible, as indicated by eq B25. In practice, it is not uncommon that rank($X^{iT}X^i$) < $J_x$. The invertibility then cannot be guaranteed, and directly applying eigenvalue decomposition in the raw data space may lead to a rank-deficiency problem. In the current work, this problem is readily solved by the two-step optimization. First, through the first-step calculation, a preparatory subscore subspace ($\bar{T}^i$, $N \times R$) is built in each predictor space as a substitute for the original measurement data. Then, on the basis of the first-step analysis result, the second-step extraction is implemented to further condense and refine it. In this procedure, the subscore subspace obtained from the first step, $\bar{T}^i$ ($N \times R$), guarantees the invertibility of the covariance matrix $\bar{T}^{iT}\bar{T}^i$ because the subscores are mutually orthogonal. A parsimonious common regression score subspace can then be obtained within each predictor space.
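The second-step MsRA-score solution can be sketched in the same hedged spirit (a hypothetical NumPy illustration of eqs B25–B27 and the subscore formula above, not the authors' code): S is formed from the projector onto the column space of Y and the sum of projectors onto the column spaces of the $\bar{T}^i$, its leading eigenvector gives the global score u, and each common subscore is the unit-length projection of u onto the corresponding $\bar{T}^i$ space.

```python
import numpy as np

def msra_score_second_step(T_list, Y):
    """Second-step MsRA-score (eqs B25-B27).

    T_list : list of (N x R) first-step subscore matrices, one per set.
    Y      : (N x Jy) mean-centered response matrix with full column rank.
    """
    N = Y.shape[0]
    P_Y = Y @ np.linalg.inv(Y.T @ Y) @ Y.T         # projector onto col(Y)
    P_T = np.zeros((N, N))
    proj_list = []
    for Ti in T_list:
        Pi = Ti @ np.linalg.inv(Ti.T @ Ti) @ Ti.T  # projector onto col(T_i)
        proj_list.append(Pi)
        P_T += Pi
    S = P_Y @ P_T                                  # eqs B25/B26: S u = lam_g u
    eigvals, eigvecs = np.linalg.eig(S)            # S is not symmetric in general
    idx = int(np.argmax(eigvals.real))
    u = eigvecs[:, idx].real
    u /= np.linalg.norm(u)                         # enforce v^T Y^T Y v = 1

    t_list, lam_list = [], []
    for Pi in proj_list:
        lam_i = float(u @ Pi @ u)                  # eq B27
        t_list.append((Pi @ u) / np.sqrt(lam_i))   # unit-length projection of u
        lam_list.append(lam_i)
    return u, t_list, lam_list
```

For large N, one would avoid forming the N x N projectors explicitly (e.g., via QR factorizations or least-squares solves), but the dense form keeps the sketch close to the equations.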
AUTHOR INFORMATION
Corresponding Author
*Tel: +852-23587136. Fax: +852-23580054. E-mail: kefgao@ust.hk.
ACKNOWLEDGMENT
This work is supported in part by the China National 973 program (2009CB320603) and the RGC-NSFC joint project under project number N_HKUST639/09.
REFERENCES
(1) Martens, H.; Naes, T. Multivariate Calibration, 2nd ed.; Wiley: Chichester, U.K., 1994.
(2) Burnham, A. J.; Viveros, R.; MacGregor, J. F. Frameworks for latent variable multivariate regression. J. Chemom. 1996, 10, 31.
(3) Dayal, B. S.; MacGregor, J. F. Improved PLS algorithms. J. Chemom. 1997, 11, 73.
(4) Cserhati, T.; Kosa, A.; Balogh, S. Comparison of partial least-square method and canonical correlation analysis in a quantitative structure-retention relationship study. J. Biochem. Biophys. Methods 1998, 36, 131.
(5) Brereton, R. G. Introduction to multivariate calibration in analytical chemistry. Analyst 2000, 125, 2125.
(6) Anderson, T. W. Canonical correlation analysis and reduced rank regression in autoregressive models. Ann. Stat. 2002, 30, 1134.
(7) Kleinbaum, D. G.; Kupper, L. L.; Muller, K. E.; Nizam, A. Applied Regression Analysis and Other Multivariable Methods, 3rd ed.; Wadsworth Publishing Co. Inc.: Florence, KY, 2003.
(8) Hardoon, D. R.; Szedmak, S.; Taylor, J. S. Canonical correlation analysis: An overview with application to learning methods. Neural Comput. 2004, 16, 2639.
(9) Yu, H. L.; MacGregor, J. F. Post processing methods (PLS-CCA): Simple alternatives to preprocessing methods (OSC-PLS). Chemom. Intell. Lab. Syst. 2004, 73, 199.
(10) Ergon, R. Reduced PCR/PLSR models by subspace projections. Chemom. Intell. Lab. Syst. 2006, 81, 68.
(11) Yamamoto, H.; Yamaji, H.; Fukusaki, E.; Ohno, H.; Fukuda, H. Canonical correlation analysis for multivariate regression and its application to metabolic fingerprinting. Biochem. Eng. J. 2008, 40, 199.
(12) Kroonenberg, P.; De Leeuw, J. Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 1980, 45, 69.
(13) Naes, T.; Kowalski, B. Predicting sensory profiles from external instrumental measurements. Food Quality and Preference 1989, 1, 135.
(14) Kristof, W.; Wingersky, B. Generalizations of the orthogonal Procrustes rotation procedure to more than two matrices. Proceedings of the 79th Annual Convention of the American Psychological Association; APA: Washington, DC, 1971; Vol. 6, p 81.
(15) Gower, J. C. Generalized Procrustes analysis. Psychometrika 1975, 40, 33.
(16) Tenberge, J. M. F. Orthogonal Procrustes rotation for two or more matrices. Psychometrika 1977, 42, 267.
(17) Carroll, J. D. Generalization of canonical correlation analysis to three or more sets of variables. Proceedings of the 76th Convention of the American Psychological Association; APA: San Francisco, CA, 1968; Vol. 3, p 227.
(18) Kettenring, J. R. Canonical analysis of several sets of variables. Biometrika 1971, 58, 433.
(19) Tishler, A.; Lipovetsky, S. Canonical correlation analyses for three data sets: a unified framework with application to management. Comput. Oper. Res. 1996, 23, 667.
(20) Goria, M. N.; Flury, B. D. Common canonical variates in k independent groups. J. Am. Stat. Assoc. 1996, 91, 1735.
(21) Nielsen, A. A. Multiset canonical correlations analysis and multispectral, truly multitemporal remote sensing data. IEEE Trans. Image Process. 2002, 11, 293.
(22) Dahl, T.; Naes, T. A bridge between Tucker-1 and Carroll's generalized canonical analysis. Comput. Stat. Data Anal. 2006, 50, 3086.
(23) Krzanowski, W. J. Principal component analysis in the presence of group-structure. Appl. Stat.—J. R. Stat. Soc., Ser. C 1984, 33, 164.
(24) Flury, B. N. Common principal components in K-groups. J. Am. Stat. Assoc. 1984, 79, 892.
(25) Flury, B. N.; Gautschi, W. An algorithm for simultaneous orthogonal transformation of several positive definite symmetrical matrices to nearly diagonal form. SIAM J. Sci. Stat. Comput. 1986, 7, 169.
(26) Flury, B. K. Two generalizations of the common principal component model. Biometrika 1987, 74, 59.
(27) Krzanowski, W. J. Between-group analysis with heterogeneous covariance matrices: the common principal component model. J. Classif. 1990, 7, 81.
(28) Krzanowski, W. J. Ordination in the presence of group-structure, for general multivariate data. J. Classif. 1994, 11, 195.
(29) Neuenschwander, B. E.; Flury, B. D. Common principal components for dependent random vectors. J. Multivar. Anal. 2000, 75, 163.
(30) van de Geer, J. P. Linear relations between K sets of variables. Psychometrika 1984, 49, 79.
(31) Kiers, H. A. L.; Tenberge, J. M. F. Hierarchical relations between methods for simultaneous component analysis and a technique for rotation to a simple simultaneous structure. Br. J. Math. Stat. Psychol. 1994, 47, 109.
(32) Timmerman, M. E.; Kiers, H. A. L. Four simultaneous component models for the analysis of multivariate time series from more than one subject to model intraindividual and interindividual differences. Psychometrika 2003, 68, 105.
(33) Jansen, J. J.; Hoefsloot, H. C. J.; Greef, J. ASCA analysis of multivariate data obtained from an experimental design. J. Chemom. 2005, 19, 469.
(34) Hanafi, M.; Kiers, H. A. L. Analysis of K sets of data, with differential emphasis on agreement between and within sets. Comput. Stat. Data Anal. 2006, 51, 1491.
(35) Kiers, H. A. L. Analyzing Component Loading Matrices or Doing Simultaneous Component Analyses. IASC Symposia, Compstat Satellite Meetings, Capri, Italy, September 4–6, 2006.
(36) Van Deun, K.; Smilde, A. K.; van der Werf, M. J.; Kiers, H. A.; Van Mechelen, I. A structured overview of simultaneous component based data integration. BMC Bioinf. 2009, 10, 246.
(37) de Tayrac, M.; Le, S.; Aubry, M.; Mosser, J.; Husson, F. Simultaneous analysis of distinct Omics data sets with integration of biological knowledge: Multiple Factor Analysis approach. BMC Genom. 2009, 10, 32.
(38) Zhao, C. H.; Gao, F. R.; Niu, D. P.; Wang, F. L. A two-step basis vector extraction strategy for multiset variable correlation analysis. Chemom. Intell. Lab. Syst. 2011, 107, 147.
(39) Zhao, C. H.; Yao, Y.; Gao, F. R.; Wang, F. L. Statistical Analysis and Online Monitoring for Multimode Processes with Between-mode Transitions. Chem. Eng. Sci. 2010, 65, 5961.
(40) Zhao, C. H.; Mo, S. Y.; Gao, F. R.; Lu, N. Y.; Yao, Y. Enhanced Statistical Analysis and Online Monitoring for Handling Multiphase Batch Processes with Varying Durations. J. Process Control 2011, 21, 817.
(41) Zhao, C. H.; Gao, F. R. Between-phase based Statistical Analysis and Modeling for Transition Monitoring in Multiphase Batch Process. AIChE J., DOI: 10.1002/aic.12783.
(42) MacGregor, J. F.; Jaeckle, C.; Kiparissides, C.; Koutoudi, M. Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 1994, 40, 826.
(43) Kourti, T.; Nomikos, P.; MacGregor, J. F. Analysis, monitoring and fault-diagnosis of batch processes using multiblock and multiway PLS. J. Process Control 1995, 5, 277.
(44) Westerhuis, J. A.; Kourti, T.; MacGregor, J. F. Analysis of multiblock and hierarchical PCA and PLS models. J. Chemom. 1998, 12, 301.
(45) Qin, S. J.; Valle, S.; Piovoso, M. J. On unifying multiblock analysis with application to decentralized process monitoring. J. Chemom. 2001, 15, 715.
(46) Zhao, C. H.; Gao, F. R. Multiblock-Based Qualitative and Quantitative Spectral Calibration Analysis. Ind. Eng. Chem. Res. 2010, 49, 8694.
(47) Lindgren, F.; Geladi, P.; Wold, S. The kernel algorithm for PLS. J. Chemom. 1993, 7, 45.
(48) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245.
(49) Lee, G.; Han, C. H.; Yoon, E. S. Multiple-Fault Diagnosis of the Tennessee Eastman Process Based on System Decomposition and Dynamic PLS. Ind. Eng. Chem. Res. 2004, 43, 8037.
(50) Tian, Z. H.; Hoo, K. A. Multiple Model-Based Control of the Tennessee-Eastman Process. Ind. Eng. Chem. Res. 2005, 44, 3187.
(51) Li, G.; Qin, S. J.; Zhou, D. H. Output Relevant Fault Reconstruction and Fault Subspace Extraction in Total Projection to Latent Structures Models. Ind. Eng. Chem. Res. 2010, 49, 9175.