Ind. Eng. Chem. Res. 2009, 48, 6137–6144
6137
A Modified Network Component Analysis (NCA) Methodology for the Decomposition of X-ray Scattering Signatures Ian Tolle,† Xinqun Huang,† Yvonne A. Akpalu,‡ and Lealon L. Martin*,†,§ Department of Chemical and Biological Engineering, Department of Chemistry and Chemical Biology, and Department of Decision Sciences and Engineering Systems, Rensselaer Polytechnic Institute, Troy, New York 12180
In this work, we present an optimization-based methodology to identify network structure-process relationships in polymeric materials, to provide a robust understanding of how processing conditions affect their structural domains. The approach is largely inspired by network component analysis (NCA), which was originally developed in 2003 [Liao et al. Proc. Natl. Acad. Sci., U.S.A. 2003, 100, 15522-15527] to infer unobservable phenomena using large multivariate datasets. The fundamental assumption in NCA is that the connective relationship between desired unobservable phenomena and measured data is based on a largely known bipartite network topology. However, in many such analyses, this topology is either partially or completely unknown. To address this issue, the original NCA problem is reformulated as a mixed-integer nonlinear program (MINLP). Optimal solutions of the MINLP formulation provide network topologies and physically meaningful component signatures that correspond to the best possible data reconstruction. We demonstrate this approach for the analysis of wide-angle X-ray scattering (WAXS) data from a branched copolymer system. The copolymer samples vary in regard to side-chain length and isothermal crystallization temperature. Using our analysis, we isolate crystalline and amorphous components and observe their temperature-dependent variation. Good agreement is achieved between the degree of crystallinity calculated by this NCA-based decomposition and the experimentally reported values. Introduction The design of engineered materials with tailored properties requires fundamental knowledge of the interaction between processing conditions, the resulting structure/morphology, and the corresponding macroscopic properties of the material.1 These complex interactions can be described quantitatively by considering a material as a system (Figure 1a), consisting of multiple structural units that exist within the material. The network connections shown in Figure 1 describe which processing conditions control specific structural units, and what properties are dependent on those particular structures. An example2 of such a topological system is shown in Figure 1b for highperformance alloy steel, illustrating which processes govern the evolution of each microstructural unit (matrix, dispersions, grain boundaries) and which bulk properties are controlled by these structures. Construction of such a materials system chart for new engineered materials is dependent on detailed knowledge of the different structural units that exist within the material, and it is typically developed using combined experimental, semiempirical,3-6 and simulation methods.7-11 Scattering techniques (X-ray, light, neutron) are wellestablished tools that are used to study structural features present in materials at length scales that vary from angstroms (10-10 m) to microns (10-6 m). A large fraction of materials scatter isotropically, so the scattering intensity can be reduced to a single one-dimensional spectra I(q) as a function of the scattering vector, q)
where θ is the scattering angle and λ is the wavelength of the incident radiation. The intensity in reciprocal space is inversely related to the density distribution (structure) of the material in real space. As experimental methods used to obtain these scattering spectra have matured in recent years, it is now possible to acquire a large data set of scattering signatures from a single experiment rapidly, with the limiting step being the complete quantitative analysis of the data. Standard approaches for the analysis of scattering signatures fall into two classes, either by Fourier transforming the data and interpreting the resulting autocorrelation function, or by direct least-squares fit of the observed intensity to an intensity profile calculated from
4π sin θ λ
* To whom correspondence should be addressed. E-mail address:
[email protected]. † Department of Chemical and Biological Engineering. ‡ Department of Chemistry and Chemical Biology. § Department of Decision Sciences and Engineering Systems.
Figure 1. (a) Model and (b) example (from ref 2) for treating a material as a system of structural units and the network interactions between processing and properties.
10.1021/ie8012715 CCC: $40.75 2009 American Chemical Society Published on Web 03/03/2009
6138
Ind. Eng. Chem. Res., Vol. 48, No. 13, 2009
a particular model. In both classes, each I(q) is analyzed individually and the results are highly model-dependent. We instead propose an alternative optimization-based approach to analyzing an entire scattering dataset composed of many I(q) spectra over an identical q-range by building a linear multivariate representation in terms of a reduced set of component spectra and weights. The goals of using such a model-free analysis are to simultaneously capture the type of network process-structure-property topology illustrated in Figure 1 for a novel engineered material and characterize the fractions, sizes, and detailed structural parameters of different morphological units described by the reduced set of component scattering spectra. Extraction of information from multivariate high-dimensional chemical data sets using statistical techniques is generally the subject of chemometrics.12,13 Methods such as singular value decomposition (SVD), principal component analysis (PCA),14 partial least squares (PLS) regression, or self-modeling curve resolution15 (generalized as multivariate curve resolution, MCR),16,17 are used in approaches to building linear multivariate representations of such datasets and have been applied extensively in the analysis of chromatographic,18-20 ultraviolet/visible (UV-Vis),21 and infrared (IR) spectroscopy22 data, as well as online process monitoring.23-26 Each of these techniques have their advantages and disadvantages. The decompositions generated by PCA, SVD, and PLS are unique solutions, but the components that are generated are generally not physically meaningful. Alternatively, MCR can generate physically meaningful components through the addition of constraints, but the solutions that are generated are not unique. With this type of linear multivariate framework in mind, we have developed a novel generalized computational approach for the analysis of process-structure-property relationships in materials based on network component analysis (NCA),27,28 which is a technique that has been previously applied to identify unknown absorbance spectra from mixtures and transcription factor activity in gene expression data. It has the advantage of generating a unique, physically meaningful solution, but is limited in that at least partial knowledge of the connectivity between the component and the experimental signatures must be known a priori. To overcome this limitation, we reformulate the original NCA problem as a mixed-integer nonlinear program (MINLP) with a minimal number of constraints. We then apply our formulation to the decomposition of a scattering dataset from branched ethylene/olefin copolymers (EOCs) to systematically study the isothermal crystallization behavior due to variation in the length of side-chain branches (SCBs). Multivariate Data Analysis Techniques The fundamental assumption in choosing a multivariate technique is that the experimental signatures under analysis can be represented as the sum of a reduced set of weighted component signatures, formulated as a bilinear matrix decomposition (eq 1): E(M×N) ) A(M×L)P(L×N) + R(M×N)
(1)
Experimental data are organized into matrix form E(M×N), where each row represents an individual spectra from the ith of M samples, whose elements ei, j denote the response variable at the jth of N independent variables. For example, in chromatographic data, E would be composed of absorbance spectra over the range of N wavelengths (λ) from M samples. In the case of scattering data, E is composed of intensity spectra over a range of N scattering vectors (q) from M samples. The reduced set of
L component spectra is represented by the rows of matrix P over the same range of N independent variables, and the response profiles contained in the columns of A represent the relative weights of each component present in a row of E. The residual matrix R contains any responses that are not related to the pure components (e.g., noise). This type of matrix decomposition is inherently nonunique, unless constraints are applied. An infinite number of A and P values exist; yet, they may not have any physical meaning, with respect to the experimental system. Two types of ambiguities are possible for the experimental components: rotational (shape) ambiguity and magnitude (intensity) ambiguity, as illustrated in eq 2 through the introduction of a nonsingular transformation matrix X: jP j E ) AP ) A
(2)
where j ) XP j ) AX-1 and P (3) A If X contains nonzero off-diagonal elements, then the profiles in j and P j , yet A and P will be of a completely different shape than A still fit the experimental data equally well. If X is diagonal, no shape ambiguity exists, but the profiles in A and P are still subject to intensity ambiguity. No curve resolution method is capable of performing a model-free decomposition without intensity ambiguity, and so, typically, when a decomposition is said to be unique, it refers to an absence of shape ambiguity only. Principal component analysis (PCA)14 is a data analysis tool used to build such a matrix decomposition, generating unique component weights and signatures. The component weights and spectra generated by PCA are the orthogonal row and column basis vectors, calculated using SVD after mean centering the data, organized such that the first principal component (PC) explains the maximum statistically significant variation in the dataset and subsequent PCs explain successively less of the variation. However, these unique component spectra cannot typically be interpreted in the same manner as the experimental spectra. Extraction of the true, physically meaningful component spectra that contribute to the experimental data without any a priori knowledge of the system is called multivariate curve resolution (MCR).29 MCR techniques can be divided into iterative and noniterative techniques. Noniterative techniques, such as evolving factor analysis (EFA) and window factor analysis (WFA),30 recover the pure component profiles and weights one at a time, by performing local rank analysis of evolving windows in the original data matrix using PCA. Using the correct choice of location for the windows, they are capable of generating unique component spectra and weights but are limited, in that analysis is restricted to E matrices with sequentially evolving components. Iterative techniques are more versatile than noniterative resolution methods. Pure component weights and spectra are calculated through refinement of an iterative optimization process, subject to constraints on the physical or mathematical features of the components. The most widely employed technique is multivariate curve resolution-alternating least squares (MCR-ALS).16 It is not restricted to systems with sequentially evolving components, and multiple constraints can be applied to the component spectra generated, such as nonnegativity, and unimodality, or the profiles may be constrained according to hard models. Physically meaningful components can be generated, but their uniqueness is not guaranteed.31,32 To resolve a unique, physically meaningful solution to the multivariate linear decomposition posed in eq 1, using a minimal
Ind. Eng. Chem. Res., Vol. 48, No. 13, 2009
number of constraints, we have developed a novel computational approach based on network component analysis (NCA).27 NCA is an optimization-based mathematical decomposition that is generated by minimizing the Frobenius norm of the residual matrix R ) E - AP, which is defined as the nonlinear program P1. min | E - AP|
2
subject to A ∈ Z0
(4)
The elements of A are constrained according to a specific bipartite network topology, which is defined by the set Z0 ⊂ {(i, j)|i, j ∈ N}, where ai, j ≡ 0 for (i,j) ∈ Z0. An illustration of this bipartite topology is given in Figure 2. Z0 defines the connections between M samples and L topological components (TCs). If an element of A is nonzero, this indicates that the corresponding component signature in P exists in the experimental signature in E and a connection exists in Figure 2. A zero element of A indicates that this component is absent, with no connection in Figure 2. The following requirements on the nature of the connectivity pattern imposed by Z0, which are termed NCA network identifiability criteria, constrain problem P1 to be a convex optimization problem with a global minimum. Thus, a unique A and P exist as the solution to this optimization problem, subject only to intensity ambiguity, using a nonsingular diagonal scaling matrix X. NCA Network Identifiability Criteria (1) The number of components (L) is less than or equal to the number of samples (M). (2) Each column of A must contain at least L - 1 zeros. (3) A must have full column rank, and P must have full row rank. (4) For any reduced network Z0(i), created by removing the ith column of A and all the rows that contain a nonzero element in that column, (a) no component must become completely disconnected, (b) no “stars” (i.e., structures where multiple components are connected to a single sample) may be created, and (c) for any stars created where a single component is connected to D > 1 samples, the following inequality must hold: K-
∑φ
i,k e M - (L - 1)
∑ (D - 1) < L - 1
(5)
stari
where K is the number of samples remaining in the reduced network Z0(i). However, the key limitation of NCA is that the system’s topology must be known a priori. Knowledge of the positions of zeros in A must be available before the optimization problem can be solved. To resolve this issue, we propose the following extension to the NCA problem formulation.
∀k
(6)
∀ i, k
(7)
i)1
φi,ka(l ) e ai,k e φi,ka(u) p(l ) e pk,j e p(u) φi,k ∈ [0, 1]
(P1)
a,p
6139
M
∀ k, j
(8)
∀ i, k
(9)
There is also an associated experimental error for each scattering data point in E, which is denoted by a matrix σ(E), and error in the optimal A and P is taken into account using a weighted least-squares method, weighting each squared difference by the variance σ2(ei, j). NCA network identifiability criterion 1 is satisfied by fixing the desired number of components to be less than or equal to the number of samples (L e M), before performing the optimization. Criterion 2 is satisfied through the implementation of constraints specified by inequalities 6 and 7, which count the number of nonzeros in each column of Φ and constrain the elements of A to fall between some specified upper and lower bounds (a(l), a(u)) that are multiplied by the corresponding binary element of Φ. To satisfy criterion 3, the rank condition of the optimal A and P is checked after the optimization is finished. The requirements on reduced networks, as described by criterion 4, are also checked after the fact. The MINLP is solved combinatorially, by specifying all possible binary matrices Φ that satisfy constraint 6 and then solving each NLP subproblem using the IPOPT33 algorithm. The minimum value for the objective function out of all subproblems corresponds to the global minimum of the MINLP. The lower bounds a(l) and p(l) are set to zero if non-negativity constraints on the elements of A and P are chosen, and the upper bounds are set to a sufficiently large value such that a(u), p(u) . max (ei,j). Therefore, the domain of the elements ai,k and pk,j is effectively [0,∞). As stated earlier, intensity ambiguity of the components generated by a model-free decomposition is unavoidable, and the components must be scaled, depending on the desired analysis of the decomposition. Quantitative error analysis of the decomposition generated by NCA is performed using a bootstrap technique.34 A bootstrap sample E*1, E*2,..., E*B is drawn by randomly sampling the elements of the original E according to a normal distribution, ei,* j ≈ N(ei, j, σ2(ei, j)) for a specified number of bootstrap replications B. Selection of the parameter B is a tradeoff between statistical significance and computation time, but values of B g 100 are generally accepted. The MINLP NCA decomposition is calculated for each E*, from which the standard deviations and associated confidence intervals for the elements of A and
MINLP Extension of the NCA Problem Formulation When the topology of the system is completely unknown, problem P1 can still be solved, if reformulated as a MINLP defined by problem P2, where we have introduced a set of binary variables Φ(M×L) of the same dimension as A(M×L), whose zero and nonzero elements φi, k correspond to the presence or absence of each possible connection between a component and a sample. M
min a,p,φ
subject to
N
∑∑ i)1 j)1
(ei,j -
∑
L k)1
ai,k pk,j)2
σ2(ei,j)
(P2)
Figure 2. Assumed generalized bipartite network topology defined by Z0, illustrating the connectivity between L topological components (TCs) and the experimental signatures from M samples. Table 1. Properties of Branched Copolymers Chosen for Test System copolymer
Mw
EB EH
70 000 70 000
comonomer density, Tm (nominal) Mw/Mn content [mol %] F [g/cm3] [°C] 2.0 2.0
5.9 6.4
0.900 0.900
95 95
6140
Ind. Eng. Chem. Res., Vol. 48, No. 13, 2009
Figure 3. WAXS intensity from EB and EH copolymer samples isothermally crystallized at Tc ) 81-95 °C, organized into matrix form.
Figure 4. PCA decomposition of WAXS data: (a) score plot, showing PC1 weights versus PC2 weights, and (b) loading plot, showing PC signatures P(q).
P are calculated according to eq 10. It is possible that the optimal positions of zeros may change between bootstrap replications, and this fact is taken into account in the standard deviations calculated for the elements of Aopt: *2 *B *2 *B σˆ *i,j(A) ) σ(A*1 ˆ *i,j(P) ) σ(P*1 i,j , Ai,j , ... , Ai,j ), σ i,j , Pi,j , ... , Pi,j ) (10)
The selection of the correct number of components L is critically important when performing this NCA decomposition with a completely unknown topology. The decomposition must accurately reconstruct the original experimental signatures without overfitting the data using too many components, which is a common error in any multivariate approach.35 If L is too small, the optimal solution will be found quickly but the A and P terms that are generated will provide a poor fit of the experimental data matrix E. If L is too large, a more precise reconstruction will be found at the expense of computation time, but the data will be overfit if L exceeds the number of statistically significant PCs, as determined by cross-validation.36 Solution Strategy All steps required in the MINLP NCA methodology are detailed as follows: (1) Organize experimental data matrix E and error matrix σ(E). (2) Perform PCA on E to find the maximum number of statistically significant components (L). (3) Assign chosen values for L, B, and bounds a(l), a(u), p(l), and (u) p . (4) Perform optimization P2 for B bootstrap iterations to obtain A, P, and σˆ *(A), σˆ *(P). (5) Normalize A and P. Case Study By applying this modified NCA methodology to the analysis of a multivariate scattering dataset, the goal is to provide a robust
method for the characterization of the type of network process-structure-property connections illustrated in Figure 1, which are mathematically defined using the bipartite network topology in Figure 2. As a case study, the set of scattering data chosen for analysis consists of wide-angle X-ray scattering (WAXS) measurements taken during isothermal crystallization of branched polyethylene. Ethylene-1-butene (EB) and ethylene1-hexene (EH) samples with the properties listed in Table 1 were prepared as detailed in previous work,37 fixing experimental conditions as similar as possible, to maximize the effect that short-chain branch length has on the variation in the observed scattering intensity. The samples were placed in a dualtemperature chamber, where the samples initially undergo melting at 160 °C in one chamber and then are quickly (within ∼2 s) transferred by a pneumatic device to a second chamber that is held at the desired crystallization temperature at which the WAXS data were collected (Tc ) 81, 83, 86, 89, 92, and 95 °C).38 In WAXS measurements of semicrystalline polymers, it is well-established that two components are present in the observed signature, corresponding to the amorphous and crystalline polymer phases.39 The component due to scattering from the crystalline polymer (Icr) consists of a series of sharp Bragg diffraction peaks, with the positions of their peak maxima corresponding to the distance between crystal planes in the unit cell. In the case of polyethylene and ethylene copolymers, the two largest peaks are due to diffraction by the (110) and (200) crystal planes.40 Diffuse scattering from an amorphous polymer results in a component (Ia) with a characteristic broad Gaussian peak arising from the distribution of interatomic distances between randomly oriented polymer chains. Scattering from a semicrystalline polymer consists of a superposition of these two components, and each component’s integrated intensity directly corresponds to the relative amount of the crystalline and amorphous phases present in the sample. By isolating these two components, the degree of crystallinity w can be calculated from the ratio of their integrated intensities, according to Ruland’s
Ind. Eng. Chem. Res., Vol. 48, No. 13, 2009
6141
Figure 5. NCA decomposition of original WAXS data matrix: (a) EB weights with associated 95% confidence intervals, (b) EH weights with associated 95% confidence intervals, and (c) topological components (TCs), illustrating imperfect separation of crystalline and amorphous components.
Figure 6. NCA decomposition of modified E′ matrix: (a) EB weights with associated 95% confidence intervals, (b) EH weights with associated 95% confidence intervals, and (c) topological components, where Icr and Ia are now separated.
method41 (eq 11). The q2 term in the integrals is included due to isotropic scattering from the randomly oriented crystalline domains in the semicrystalline polymer.
∫ q I dq w) ∫ q I(q) dq ∞
0 ∞
2
cr
2
(11)
0
The experimental data is shown in Figure 3, with the intensity in arbitrary units (a.u.) measured at the 45th minute from the first appearance of initial crystals during cooling. This ensures that the scattering signatures analyzed arise from the most stable crystal structure formed at a given temperature/condition. The profiles are scaled such that the height of the amorphous halo in each sample is equal. Doing so emphasizes the variation in the two sharp crystalline peaks, relative to the broad amorphous halo, as the isothermal crystallization temperature varies. Data were collected from two types of copolymers of varying side-chain branch length, crystallized at six different isothermal crystallization temperatures for a total of 12 different samples. The applicable range of q measurements for WAXS are 0.7 Å < q < 2.9 Å (401 data points). Therefore, in this particular system, there are three different independent variables upon which the measured scattering intensity depends: q, Tc, and side-chain branch length. The data can thus be organized into a three-dimensional cube (tensor). However, performing a multivariate decomposition of three-dimensional data can be unnecessarily complex. The simplest means to avoid this problem is to construct a two-dimensional matrix that is composed of concatenated (stacked) matrices (this is a process called unfolding).13 In our system, the choice of concatenation direction is to stack the experimental profiles row-wise, while keeping the scattering vector dimension constant, yielding our experimental data matrix E(12×401), shown in Figure 3. Each row in E, when referenced, will be denoted by the copolymer sample and the
temperature (e.g. EB81, EH92, etc.). By choosing this concatenation direction, the resulting decomposition of E generates a set of component curves in P that fit the experimental data for both the EB and EH samples, and a row-wise concatenated A matrix of component weights. This concatenation direction was chosen specifically so that the resulting matrix decomposition represents the type of bipartite connectivity shown in Figures 1 and 2, between the input processing conditions (Tc, side-chain branch length) and resulting structures described by the component scattering signatures p(q). Results The number of topological components L used to fit the data is first chosen based on principal component analysis. As a result of experimental error, there will always be M PCs generated, of which typically only the first few are statistically significant. For the purposes of this study, L is selected based on the number of components required to explain over 99% of the variance in the data. For the chosen E Matrix, PCA yields two statistically significant principal components that explain 99.9% of the variance in the data; thus, we specify L ) 2. The score plot is approximately a straight line, which is indicative of a binary mixture, where the two extrema correspond to the purest samples analyzed. In our case, these extrema represent samples that exhibit the highest (EB81) and lowest (EH95) degree of crystallinity; and this is consistent with tabulated crystallinity data. However, the presence of negative intensity values precludes the PC loadings from being interpreted as physically meaningful scattering signatures. Next, we perform the modified MINLP NCA decomposition using L ) 2 topological components, and the results are shown in Figure 5. The standard deviations of the elements in A and
6142
Ind. Eng. Chem. Res., Vol. 48, No. 13, 2009
Table 2. Comparison between Crystallinity Calculated by NCA and Literature Values37 sample
NCA
experiment
EB81 EB83 EB86 EB89 EB92 EB95
N/A 0.209 ( 0.018 0.185 ( 0.019 0.157 ( 0.038 0.086 ( 0.031 N/A
0.195 0.185 0.169 0.148 0.102 0.053
EH81 EH83 EH86 EH89 EH92 EH95
0.203 ( 0.021 0.194 ( 0.022 0.166 ( 0.020 0.125 ( 0.013 0.058 ( 0.012 N/A
0.199 0.189 0.135 0.109 0.061 0.032
P are calculated using 100 bootstrap iterations. The key result is that the non-negativity constraints and network identifiability constraints on the elements of A and P generate unique components that can be analyzed as scattering signatures. Because magnitude ambiguity does exist, relative to the components and weights, they must be normalized using a diagonal nonsingular scaling matrix X for initial inspection. Since both component curves contain a peak at the position of the amorphous halo, the curves are scaled such that their relative heights are equal at this position and the intensity is set equal to 1. The optimal topology Φopt and corresponding weights in Aopt are given by eq 12, for this normalization. The NCA network identifiability constraints require that each column in Φopt must contain at least 1 (i.e., L - 1) zero, the optimal locations of which are found at the EB81 sample in column 1 (TC1 weights), and at the EB95 and EH95 samples in column 2 (TC2 weights). These are also the same samples at the extreme points of the PC score plot shown previously.
Φopt )
EB81 EB83 EB86 EB89 EB92 EB95 EH81 EH83 EH86 EH89 EH92 EH95
0 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 0 1 1 1 1 1 0
Aopt )
EB81 EB83 EB86 EB89 EB92 EB95 EH81 EH83 EH86 EH89 EH92 EH95
0 12.76 1.14 11.42 2.79 9.85 4.46 8.22 8.28 4.25 12.48 0 1.47 10.55 1.97 9.92 3.58 8.29 5.97 6.16 9.16 2.71 12.13 0
only the amorphous halo, whereas TC2 contains both amorphous and crystalline peaks. Because the amorphous halo is present in all samples, there should be a nonzero value for every element in the first column of A; however, for a valid NCA decomposition, there must be at least one zero element, because L ) 2, and the optimal position for this zero is located at the EB81 sample. Thus, for an NCA-identifiable decomposition, one row in E must correspond to a 100% degree of crystallinity sample, and one row must correspond to a 0% degree of crystallinity sample. The 95 °C samples are near their nominal melting temperature, and thus satisfy the requirement for presence of a near 0% crystallinity sample, but no 100% crystalline samples are present in this dataset. Because the NCA decomposition produced a zero at the EB81 sample position, we can generate a 100% crystalline EB81′ pseudo-sample in a new E′ matrix by fitting a Gaussian curve to the amorphous halo and then subtracting it. Performing PCA on the new E′ results in PCA loading vectors that are negligibly different from those observed in Figure 4, whereas there is an EB81′ outlier present in the PC score plot. By contrast, the NCA decomposition (Figure 6) results in separation of Icr and Ia, with the new topological components normalized by setting the maximum intensity equal to 1. The positions of zeros are the same as in eq 12, but now the weights for TC1 (Ia), remain constant with varying Tc, whereas those for TC2 (Icr) decrease as the crystallization temperature increases, reflecting the corresponding decrease in crystallinity. Analysis We confirm separation of the crystalline and amorphous components by calculating percent crystallinity (wi) of each sample i in E, using a modified version of eq 11, where Ii,cr(q) ) ai,2p2(q) and Ii(q) ) ai,1p1(q) + ai,2p2(q). By normalizing the weights and components, such that the integrated intensity under each component curve is set equal to 1, the calculation of crystallinity for each sample reduces to simply the ratio of the crystalline component weight to the sum of both the crystalline and amorphous component weights, as shown in eq 13: ai,2
wi ) ai,1
∫
qmax
0
∫
qmax
0
q2p2(q) dq
q p1(q) dq + ai,2 2
∫
qmax
0
) 2
q p2(q) dq
ai,2 ai,1 + ai,2 (13)
(12) However, there is not a clear separation of the crystalline and amorphous signatures. In Figure 5, the TC1 spectra contains
The values, and associated 95% confidence intervals, are compared to the tabulated37 experimental crystallinity data (see Table 2). Figure 7 shows excellent agreement with the tabulated results. Crystallinity values could not be calculated for EB81,
Figure 7. Comparison between Tc-dependent crystallinity calculated by NCA versus experimentally reported results for (a) EB and (b) EH, and (c) their correlation.
Ind. Eng. Chem. Res., Vol. 48, No. 13, 2009
6143
Professor James Liao for fruitful discussions regarding the implementation of NCA.
Figure 8. Process-structure topology obtained using WAXS data from a branched ethylene/olefin copolymer system.
EB95, and EH95, because these samples correspond to the positions of zeros in Aopt constrained by Φopt, and calculating their crystallinity would yield a value of 1 for EB81 and 0 for the 95 °C samples. The network process-structure relationships developed by this method are shown in Figure 8. A connection exists between the amorphous component (TC1) and all of the samples except for EB81′, because the amorphous halo was subtracted from that sample to generate an NCA identifiable network. Connections exist between the crystalline component (TC2) and all of the samples except for the two Tc ) 95 °C samples, because of the very low crystallinity of these samples. This methodology was capable of extracting this connectivity pattern using no a priori knowledge, although further refinement was required to calculate crystallinity values through the introduction of a pseudo-pure-crystalline sample to satisfy the NCA identifiability criteria. In this case, the system analyzed is admittedly fairly simple, but this methodology can be extended easily to more-complex systems. Future applications will include applying this technique using different types of polymers, including linear polyethylene,38,42 branched polyethylenes,37,43 and renewable polymers.44 These studies will lead to a more robust understanding of how processing conditions affect structural domains across multiple length scales, and develop micromechanical models for property prediction. Conclusions We have proposed a novel mixed-integer nonlinear program (MINLP)-based approach (inspired by network component analysis, NCA) for the analysis of polymer materials through the decomposition of multivariate scattering data. The key feature of our approach is that unique, physically meaningful solutions can be identified with little or no a priori information regarding systems characteristics. In our analysis of wide-angle X-ray scattering (WAXS) data from semicrystalline branched ethylene/olefin copolymer samples, we were able to successfully (i) isolate crystalline and amorphous structural components, (ii) identify the underlying network topology (optimal weights) related to the presence or absence of each component in the experimental signatures, and (iii) establish robust processstructure relationships that predict crystallinity values based on the optimal topological weights and are consistent with those reported in the literature. These promising results suggest that this technique is potentially applicable to larger datasets, using other combined scattering characterization methods (e.g., smallangle light and X-ray scattering) as well. Acknowledgment We would like to acknowledge the support of this work by the National Science Foundation (through Contract Grant Nos. CMMI-0600317 and DMR-0108976) and the Department of Energy Basic Energy Sciences (through Contract Grant No. FG02-06ER46349). The authors would also like to thank
Note Added after ASAP Publication: The version of this paper that was published on the Web March 3, 2009 contained several text errors in the Results section. These errors have been corrected; in addition, a new version of Figure 1b has been inserted. The corrected version of this paper was reposted to the Web April 9, 2009. Literature Cited (1) Olson, G. B. Designing a New Material World. Science 2000, 288, 993–998. (2) Olson, G. B. Computational Design of Hierarchically Structured Materials. Science 1997, 277, 1237–1242. (3) Van Krevelen, D. Properties of Polymers; Elsevier: Amsterdam, 1990. (4) Bicerano, J. Prediction of Polymer Properties; Marcel Dekker: New York, 2002. (5) Bubeck, R. A. Structure-property relationships in metallocene polyethylenes. Mater. Sci. Eng., R: Rep. 2002, 39, 1–28. (6) Brown, W. M.; Martin, S.; Rintoul, M. D.; Faulon, J. L. Designing novel polymers with targeted properties using the signature molecular descriptor. J. Chem. Inf. Model. 2006, 46, 826–835. (7) Reynolds, C. Designing Diverse and Focused Combinatorial Libraries of Synthetic Polymers. J. Combin. Chem. 1999, 1, 297–306. (8) Ismail, A. E.; Rutledge, G. C.; Stephanopoulos, G. Multiresolution analysis in statistical mechanics. I. Using wavelets to calculate thermodynamic properties. J. Chem. Phys. 2003, 118, 4414–4423. (9) Ismail, A. E.; Stephanopoulos, G.; Rutledge, G. C. Multiresolution analysis in statistical mechanics. II. The wavelet transform as a basis for Monte Carlo simulations on lattices. J. Chem. Phys. 2003, 118, 4424–4431. (10) Ismail, A. E.; Rutledge, G. C.; Stephanopoulos, G. Using wavelet transforms for multiresolution materials modeling. Comput. Chem. Eng. 2005, 29, 689–700. (11) In’t Veld, P. J.; Hutter, M.; Rutledge, G. C. Temperature-dependent thermal and elastic properties of the interlamellar phase of semicrystalline polyethylene by molecular simulation. Macromolecules 2006, 39, 439–447. (12) Otto, M. Chemometrics: Statistics and Computer Application in Analytical Chemistry; Wiley: New York, 2007. (13) Practical Guide to Chemometrics; Gemperline, P. J., Ed.; CRC Press: Boca Raton, FL, 2006. (14) Wold, S.; Esbensen, K.; Geladi, P. Principal Component Analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52. (15) Lawton, W. H.; Sylvestre, E. Self Modeling Curve Resolution. Technometrics 1971, 13, 617–633. (16) Tauler, R.; Smilde, A.; Kowalski, B. Selectivity, Local Rank, 3-Way Data-Analysis and Ambiguity in Multivariate Curve Resolution. J. Chemom. 1995, 9, 31–58. (17) Tauler, R. Multivariate curve resolution applied to second order data. Chemom. Intell. Lab. Syst. 1995, 30, 133–146. (18) Henry, R. C.; Kim, B. M. Extension of self-modeling curve resolution to mixtures of more than three components: Part 1. Finding the basic feasible region. Chemom. Intell. Lab. Syst. 1990, 8, 205–216. (19) Kim, B. M.; Henry, R. C. Extension of self-modeling curve resolution to mixtures of more than three components: Part 2. Finding the complete solution. Chemom. Intell. Lab. Syst. 1999, 49, 67–77. (20) Kim, B. M.; Henry, R. C. Extension of self-modeling curve resolution to mixtures of more than three components: Part 3. Atmospheric aerosol data simulation studies. Chemom. Intell. Lab. Syst. 2000, 52, 145– 154. (21) Jaumot, J.; Escaja, N.; Gargallo, R.; Gonzalez, C.; Pedroso, E.; Tauler, R. Multivariate curve resolution: a powerful tool for the analysis of conformational transitions in nucleic acids. Nucleic Acids Res. 2002, 30, e92. (22) Janne, K.; Pettersen, J.; Lindberg, N. O.; Lundstedt, T. Hierarchical principal component analysis (PCA) and projection to latent structure (PLS) technique on spectroscopic data as a data pretreatment for calibration. J. Chemom. 2001, 15, 203–213. (23) Nomikos, P.; MacGregor, J. F. Multi-way partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30, 97–108. (24) Zamprogna, E.; Barolo, M.; Seborg, D. E. Estimating product composition profiles in batch distillation via partial least squares regression. Control Eng. Pract. 2004, 12, 917–929.
6144
Ind. Eng. Chem. Res., Vol. 48, No. 13, 2009
(25) Bakshi, B.; Locher, G.; Stephanopoulos, G.; Stephanopoulos, G. Analysis of operating data for evaluation, diagnosis and control of batch operations. J. Process Control 1994, 4, 179–194. (26) Bakshi, B. R. Multiscale PCA with application to multivariate statistical process monitoring. AIChE J. 1998, 44, 1596–1610. (27) Liao, J. C.; Boscolo, R.; Yang, Y.-L.; Tran, L. M.; Sabatti, C.; Roychowdhury, V. P. Network Component Analysis: Reconstruction of Regulatory Signals in Biological Systems. P. Natl. Acad. Sci., U.S.A. 2003, 100, 15522–15527. (28) Boscolo, R.; Sabatti, C.; Liao, J. C.; Roychowdhury, V. P. A Generalized Framework For Network Component Analysis. IEEE/ACM Trans. Comput. Biol. Bioinf. 2005, 2, 289–301. (29) de Juan, A.; Tauler, R. Multivariate curve resolution (MCR) from 2000: Progress in concepts and applications. Crit. ReV. Anal. Chem. 2006, 36, 163–176. (30) Malinowski, E. R. Window Factor-AnalysissTheoretical Derivation and Application to Flow-Injection Analysis Data. J. Chemom. 1992, 6, 29– 40. (31) Manne, R. On The Resolution Problem In Hyphenated Chromatography. Chemom. Intell. Lab. Syst. 1995, 27, 89–94. (32) Gemperline, P. J. Computation of the range of feasible solutions in self-modeling curve resolution algorithms. Anal. Chem. 1999, 71, 5398– 5404. (33) Wachter, A.; Biegler, L. T. On the implementation of an interiorpoint filter line-search algorithm for large-scale nonlinear programming. Math. Prog. 2006, 106, 25–57. (34) Efron, B.; Tibshirani, R. J. An Introduction to the Bootstrap; Chapman and Hall/CRC Press: New York, 1993. (35) Defernez, M.; Kemsley, E. K. The use and misuse of chemometrics for treating classification problems. TrAC, Trends Anal. Chem. 1997, 16, 216–221.
(36) Wold, S. Cross-Validatory Estimation of Number of Components in Factor and Principal Components Models. Technometrics 1978, 20, 397– 405. (37) Li, Y.; Akpalu, Y. A. Probing the melting behavior of a homogeneous ethylene/1-hexene copolymer by small-angle light scattering. Macromolecules 2004, 37, 7265–7277. (38) Akpalu, Y. A.; Amis, E. J. Evolution of density fluctuations to lamellar crystals in linear polyethylene. J. Chem. Phys. 1999, 111, 8686– 8695. (39) Roe, R.-J. Methods of X-Ray and Neutron Scattering in Polymer Science; Oxford University Press: Cambridge, U.K., 2000. (40) Klug, H. P.; Alexander, L. E. X-ray Diffraction Procedures for Polycrystalline and Amorphous Materials; Wiley: New York, 1954. (41) Ruland, W. X-Ray Determination of Crystallinity and Diffuse Disorder Scattering. Acta Crystallogr. 1961, 14, 1180. (42) Akpalu, Y. A.; Amis, E. J. Effect of polydispersity on the evolution of density fluctuations to lamellar crystals in linear polyethylene. J. Chem. Phys. 2000, 113, 392–403. (43) Xiao, Z.; Akpalu, Y. A. New insights into the characteristics of early stage crystallization of a polyethylene. Polymer 2007, 48, 5388–5397. (44) Xie, Y. P.; Noda, I.; Akpalu, Y. A. Influence of cooling rate on the thermal behavior and solid-state morphologies of polyhydroxyalkanoates. J. Appl. Polym. Sci. 2008, 109, 2259–2268.
ReceiVed for reView August 19, 2008 ReVised manuscript receiVed December 8, 2008 Accepted December 12, 2008 IE8012715