Article pubs.acs.org/ac

Method for Automatically Identifying Spectra of Different Wood Cell Wall Layers in Raman Imaging Data Set

Xun Zhang,† Zhe Ji,† Xia Zhou,† Jian-Feng Ma,‡ Ya-Hong Hu,§ and Feng Xu*,†

† Beijing Key Laboratory of Lignocellulosic Chemistry, Beijing Forestry University, Beijing, 100083, China
‡ International Centre for Bamboo and Rattan (ICBR), Beijing, 100102, China
§ Department of Mathematics, Lishui University, Lishui, 323000, China

S Supporting Information *

ABSTRACT: The technique of Raman spectroscopic imaging is finding ever-increasing applications in the field of wood science for its ability to provide spatial and spectral information about the sample. On the basis of the acquired Raman imaging data set, it is possible to determine the distribution of chemical components in various wood cell wall layers. However, a Raman imaging data set often contains thousands of spectra measured at hundreds or even thousands of individual frequencies, which makes it difficult to extract all of the spectra within a specific morphological region of wood cell walls accurately and quickly. To address this issue, the authors propose a new method to automatically identify Raman spectra of different cell wall layers on the basis of principal component analysis (PCA) and cluster analysis. A Raman imaging data set collected from a 55.5 μm × 47.5 μm cross-section of poplar tension wood was analyzed. Several thousand spectra were successfully classified into five groups in accordance with different morphological regions, namely, cell corner (CC), compound middle lamella (CML), secondary wall (SW), gelatinous layer (G-layer), and cell lumen. Their corresponding average spectra were also calculated. In addition, the relationship between different characteristic peaks in the obtained Raman spectra was estimated, and it was found that the peak at 1331 cm−1 is more closely related to lignin than to cellulose. Not only does this novel method provide a convenient and accurate procedure for identifying the spectra of different cell wall layers in a Raman imaging data set, but it can also bring new insights into studying the morphology and topochemistry of wood cell walls.

Received: November 6, 2014
Accepted: December 22, 2014
Published: December 22, 2014

DOI: 10.1021/ac504144s Anal. Chem. 2015, 87, 1344−1350
© 2014 American Chemical Society

The technique of Raman spectroscopic imaging provides spatial and spectral information about a sample simultaneously and offers new capabilities for application to environmental analysis, polymer research, materials science, process control, and biomedical diagnostics.1 This technique is also finding ever-increasing applications in the field of wood science for its ability to characterize various chemical compositions in wood cell walls with minimal sample preparation and to address compositional distribution-related issues.2

It is known that wood cell walls are typically organized in several layers with different compositions and structures, namely, the secondary wall (SW, with the S1, S2, and S3 layers), the compound middle lamella (CML, middle lamella plus adjacent primary wall), and the cell corner (CC).3,4 These layers can be clearly distinguished in two-dimensional Raman images calculated by integrating the intensity of characteristic spectral bands.5 Results have revealed that the cellulose and hemicelluloses are mostly concentrated in the SW, while the lignin is concentrated along the CML and in the CC.6,7 In order to obtain more information about the components present in a specific cell wall layer, it is necessary to effectively extract the corresponding spectra from the Raman images.

One accurate method for identifying chemical information from different cell wall layers is to analyze the recorded spectra one by one and pick out the wanted spectra on the basis of prior experience. Unfortunately, this is a complicated and time-consuming process since several thousand spectra are involved. Therefore, in practical operation, only a few typical spectra are selected for further analysis. For instance, if we need to obtain the average spectrum of the SW, we may first pick out 10−20 spectra arising from the SW and then calculate their average values. However, a great deal of data is wasted in this process, and the obtained average spectrum may not be sufficient to represent the chemical information on the whole measured SW region.

A crucial question arises immediately: are there any other methods that can be used to accurately and quickly recognize the spectra of a specific cell wall layer in a large Raman imaging data set? Here, we tried to utilize chemometrics, which is an application of statistical and mathematical methods to chemical data,8,9 to answer this question. Chemometrics has been used extensively in Raman spectroscopy as a powerful tool to provide a mechanistic interpretation of complicated physicochemical phenomena presented by a spectral data set.10,11 The key information included in Raman spectra concerning the relationships among analytes is usually distributed widely throughout the data set.
By using multivariate methods of factor analysis, such as principal component analysis (PCA), it is possible to condense the essence of the information present in the spectral data into a very compact matrix representation.12 Then cluster analysis, which is a routine method for organizing individual observations into meaningful groups,13 can be applied to classify the original spectra according to the compact matrix. Because the spectra recorded at the same cell wall layer are rather similar, this offers a good opportunity to identify the spectra of different cell wall layers.

In this study, a cross-section of poplar tension wood was prepared for Raman measurement. Anatomically, tension wood is characterized by fibers with a conspicuously thickened inner layer of the cell wall called the gelatinous layer (G-layer), which is unlignified or only marginally lignified.14 A Raman imaging data set collected from a 55.5 μm × 47.5 μm area of the cross-sectional surface was analyzed by using multivariate methods (including PCA and cluster analysis). On the basis of the results, we developed a novel method to automatically identify the spectra of various cell wall layers in a Raman imaging data set. With the help of this method, the average spectra of various cell wall layers can be calculated more accurately. In addition, we explored the relationship between different characteristic peaks in the Raman spectra of poplar tension wood.

THEORY

PCA. PCA is one of the multivariate methods of analysis and has been used widely with large multidimensional data sets.15 PCA allows the number of variables in a multivariate data set to be reduced while retaining as much as possible of the variation present in the data set. This reduction is achieved by taking p variables X1, X2, ..., Xp and finding combinations of these to produce principal components (PCs) PC1, PC2, ..., PCp, which are orthogonal (uncorrelated). These PCs are also termed eigenvectors. The lack of correlation is a useful property as it means that the PCs are measuring different "dimensions" in the data set. Moreover, the PCs are ordered so that PC1 exhibits the greatest amount of the variation, PC2 exhibits the second greatest amount of the variation, PC3 exhibits the third greatest amount of the variation, and so on. That is, var(PC1) ≥ var(PC2) ≥ ··· ≥ var(PCp), where var(PCi) denotes the variance of PCi in the data set being considered. The var(PCi) is also called the eigenvalue of PCi. When using PCA, it is hoped that the eigenvalues of most of the PCs will be so low as to be virtually negligible. Where this is the case, the variation in the data set can be adequately described by means of a few PCs whose eigenvalues are not negligible. Accordingly, some degree of economy is accomplished, as the variation in the original number of variables (x variables) can be described using a small number of the new variables (PCs), which account for the majority of the variability in the data.

In this work, a series of Raman spectra can be regarded as a matrix X of m by n dimensions, in which the digitized Raman spectrum of each recorded position corresponds to a row vector in the data table:

$$
X = \begin{bmatrix}
x_1(1) & x_1(2) & \cdots & x_1(n) \\
x_2(1) & x_2(2) & \cdots & x_2(n) \\
\vdots & \vdots &        & \vdots \\
x_m(1) & x_m(2) & \cdots & x_m(n)
\end{bmatrix}
\tag{1}
$$

where m is the number of spectra traces in this data set and n is the number of data points per spectrum along the wavenumber (or any other spectral variable) axis. Subsequently, this data matrix is subjected to PCA using standard statistical procedures as described by Dubey16 and Çaydaş.17

Cluster Analysis. Cluster analysis is a statistical technique that sorts observations into similar sets or groups. The use of cluster analysis presents a complex challenge since it requires several methodological choices that determine the quality of a cluster solution.18 Typically, feature extraction should be applied prior to cluster analysis. The goal of feature extraction is to reduce the number of features (variables) to a sufficient minimum while retaining a maximal amount of information. Feature extraction offers several advantages over using the original features directly, such as improved computational efficiency and reduced complexity of the clustering model. Various feature extraction methods can be used for imaging data analysis, such as PCA, wavelet transform,19 and the angle measure technique (AMT).20 In this study, PCA was used for feature extraction, and the PCA scores (i.e., PCs) were used as inputs for subsequent K-means cluster analysis, which is a nonsupervised technique to create clusters based on input data.21 The K-means algorithm attempts to group a set of N patterns into K disjoint clusters by minimizing the criterion J_se, which is the sum of squared distances of the objects to their corresponding cluster means:

$$
J_{se} = \sum_{k=1}^{K} \sum_{i=1}^{N_k} \left\| x_i^{(k)} - m_k \right\|^2
\tag{2}
$$

where x_i^{(k)} is the ith of the N_k patterns belonging to cluster k and m_k is the mean of cluster k. To determine the number of clusters without any a priori knowledge, many empirical indices exist in cluster analysis.22
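The PCA feature-extraction step described in this section can be sketched as follows. This is a minimal illustration and not the authors' code (the study used self-built Matlab functions): it assumes NumPy is available, and the function name `pca`, the parameter `var_threshold`, and the choice to center but not scale the variables are assumptions made for the example. Rows of `X` are spectra and columns are spectral variables, as in eq 1.

```python
import numpy as np


def pca(X, var_threshold=0.95):
    """Return (scores, loadings, explained) for the fewest PCs whose
    cumulative explained variance reaches var_threshold.

    Assumption: variables are mean-centered but not scaled.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable (column)
    cov = np.cov(Xc, rowvar=False)           # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder so PC1 has the largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()      # fraction of variance per PC
    reached = np.nonzero(np.cumsum(explained) >= var_threshold)[0]
    k = int(reached[0]) + 1 if reached.size else len(eigvals)
    loadings = eigvecs[:, :k]                # columns are the retained eigenvectors
    scores = Xc @ loadings                   # m x k matrix of PCA scores
    return scores, loadings, explained


if __name__ == "__main__":
    # Hypothetical data: 200 "spectra" with 11 variables driven by 2 hidden factors.
    rng = np.random.default_rng(0)
    factors = rng.normal(size=(200, 2))
    mixing = rng.normal(size=(2, 11))
    X = factors @ mixing + 0.01 * rng.normal(size=(200, 11))
    scores, loadings, explained = pca(X)
    print(scores.shape, np.round(explained[:3], 3))
```

The scores returned here play the role of the compact matrix that is passed on to cluster analysis; whether the original work standardized the variables before PCA is not stated in the text.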


Figure 1. Average Raman spectra acquired from (a) cell corner (CC), (b) compound middle lamella (CML), (c) secondary wall (SW), and (d) gelatinous layer (G-layer) of tension wood fiber.

University, China. The tension wood samples were extracted from the upper side of the trunk, and the presence of tension wood with a G-layer was confirmed by anatomical observations, which showed a large number of fibers with a G-layer and a thin S2 layer. Small sample blocks were cut out from the seventh annual ring of the stem. Without any embedding routine, 10 μm thick cross sections were cut on a sliding microtome (Leica 2010R). The sample was then placed on a glass slide with a drop of D2O and covered by a coverslip for subsequent Raman investigation (manual operation).

Confocal Raman Microscopy. Raman spectra were acquired with a LabRam Xplora confocal Raman microscope (Horiba Jobin Yvon) equipped with a confocal microscope (Olympus BX51) and a motorized stage. In order to achieve high spatial resolution, measurements were conducted with a high numerical aperture (NA) microscope objective from Olympus (100×, oil, NA = 1.40), and a linearly polarized laser (λ = 532 nm) excitation was focused with a diffraction-limited spot size (theoretical 1.22λ/NA). The laser power on the sample was approximately 8 mW. The Raman light was detected by an air-cooled front-illuminated spectroscopic charge-coupled device (CCD) behind a grating spectrometer (1200 grooves mm−1). For mapping, 0.5 μm steps were chosen, and every pixel corresponds to one scan. The spectrum from each location was obtained by averaging 4 s cycles. It should be noted that each scan records one Raman spectrum and its corresponding spatial position. The confocal aperture was set at 100 μm for all experiments. The reported depth resolution for the 400 μm confocal hole, based on the silicon (standard) phonon band at 520 cm−1, was 2 μm. The Labspect5 software (Horiba Jobin Yvon) was utilized to set up and control the microscope. For further processing, original data were converted into txt files.

Data Processing. The mathematical software Matlab 2008a (MathWorks) was used for data processing (PCA, cluster analysis, etc.). To eliminate the baseline drift in the signals and accurately measure the peak height, the spectral data from

Typically, an index is evaluated by performing clustering using different numbers of clusters. Depending on the characteristic of the index, ideally the maximum or minimum is sought on the plot of the index against the number of clusters. In practice, the number of clusters is found by searching for a significant change in the value of the index, which often appears as a significant "knee". In this study, the normalized modified Hubert Γ̂ statistic was calculated for K-means cluster analysis. The normalized modified Hubert Γ̂ statistic for hard clustering analysis has been discussed in detail.18 This statistic measures the degree of agreement between a proximity matrix P and a matrix Q. In this case, the proximity matrix P is an M × M symmetric matrix whose (i, j) element is the Euclidean distance between the observation vectors (in this case PCA scores) of pixels i and j, and the matrix Q is an M × M matrix whose (i, j) element is the Euclidean distance between the centers of the clusters to which pixels i and j belong. The normalized modified Hubert Γ̂ statistic is defined as

$$
\hat{\Gamma} = \frac{\sum_{i=1}^{M-1} \sum_{j=i+1}^{M} (p_{ij} - \bar{p})(q_{ij} - \bar{q})}
{\sqrt{\sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \left( p_{ij}^{2} - \bar{p}^{2} \right)} \, \sqrt{\sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \left( q_{ij}^{2} - \bar{q}^{2} \right)}}
\tag{3}
$$

where p_ij is the (i, j) element of the proximity matrix P, q_ij is the (i, j) element of the matrix Q, and p̄ and q̄ are the means of the p_ij and q_ij involved in eq 3, respectively. The calculation of Γ̂ is very similar to that of a correlation coefficient. Essentially, Γ̂ measures how well the distances between individual pixels and those between their respective cluster representatives (centers) agree with each other. The values of Γ̂ lie between −1 and +1. A high value of Γ̂ indicates the existence of compact clusters.
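A small sketch of how the statistic in eq 3 might be evaluated for a hard clustering is given below. It is written in plain Python for illustration only; the function name `hubert_gamma` and the toy points are hypothetical, and the denominator is computed as the sum of squared deviations, which is algebraically the same as summing p_ij² − p̄² over all pairs. Because every pair of pixels is visited, this direct form is practical only for small data sets.

```python
import math
from itertools import combinations


def hubert_gamma(points, labels, centers):
    """Normalized modified Hubert statistic for a hard clustering.

    points  : list of observation vectors (e.g., PCA scores)
    labels  : cluster index of each point
    centers : list of cluster centers, indexed by label
    """
    p, q = [], []
    for i, j in combinations(range(len(points)), 2):   # all pairs with i < j
        p.append(math.dist(points[i], points[j]))       # p_ij: distance between pixels
        q.append(math.dist(centers[labels[i]], centers[labels[j]]))  # q_ij: distance between their centers
    p_bar = sum(p) / len(p)
    q_bar = sum(q) / len(q)
    num = sum((a - p_bar) * (b - q_bar) for a, b in zip(p, q))
    den = math.sqrt(sum((a - p_bar) ** 2 for a in p)) * math.sqrt(sum((b - q_bar) ** 2 for b in q))
    return num / den   # undefined (division by zero) if all q_ij are equal


if __name__ == "__main__":
    # Two compact, well-separated groups (hypothetical data).
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(hubert_gamma(pts, [0, 0, 1, 1], [(0, 0.5), (10, 0.5)]))
```

In a model-selection loop, this function would be called once per candidate K, and the resulting values plotted against K to look for the "knee" described above.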



EXPERIMENTAL SECTION

Materials. An inclined 10-year-old poplar tree (Populus nigra L.) was provided by the arboretum of Beijing Forestry

Table 1. Raman Peak Positions and Band Assignments for Major Structures of Poplar

| wavenumber (cm−1) | components | assignments |
|---|---|---|
| 1095 | C, H | heavy atom (CC and CO) str. |
| 1123 | C, H | heavy atom (CC and CO) str. |
| 1163 | C | heavy atom (CC and CO) str. plus HCC and HCO bending |
| 1275 | L | aryl-O of aryl OH and aryl O−CH3; guaiacyl ring (with C=O group) mode |
| 1331 | L, C | HCC and HCO bending |
| 1378 | C | HCC, HCO, and HOC bending |
| 1460 | L, C | HCH and HOC bending |
| 1603 | L | aryl ring str., sym. |
| 1656 | L | ring conjugated C=C str. of coniferyl alcohol; C=O str. of coniferaldehyde |
| 2889 | C, H | CH and CH2 str. |
| 2940 | L, C, H | CH str. in OCH3 asym. |

C = cellulose. H = hemicelluloses. L = lignin.
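The 11 bands in Table 1 are the variables that are retained for PCA later in the paper. A minimal sketch of how their intensities might be read from a baseline-corrected spectrum is shown below. The text does not state how peak heights were measured, so taking the intensity at the sampled wavenumber nearest to each band position is an assumption, and the names `PEAKS_CM1` and `peak_intensities` are hypothetical.

```python
# Band positions from Table 1 (cm^-1).
PEAKS_CM1 = [1095, 1123, 1163, 1275, 1331, 1378, 1460, 1603, 1656, 2889, 2940]


def peak_intensities(wavenumbers, spectrum, peaks=PEAKS_CM1):
    """Return the intensity at the sampled wavenumber closest to each peak.

    Assumption: peak height is approximated by the nearest sampled point
    of an already baseline-corrected spectrum.
    """
    features = []
    for peak in peaks:
        idx = min(range(len(wavenumbers)), key=lambda i: abs(wavenumbers[i] - peak))
        features.append(spectrum[idx])
    return features


if __name__ == "__main__":
    # Hypothetical spectrum sampled every 5 cm^-1.
    axis = list(range(1000, 3001, 5))
    spectrum = [0.001 * w for w in axis]
    print(peak_intensities(axis, spectrum))
```

Applying this function to every pixel would turn an m × n spectral matrix into an m × 11 feature matrix.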

Table 2. Main Results of the PCA Application for the 11 Characteristic Peaks at All the 9595 Different Raman Spectra (Loadings of Principal Components)

| wavenumber (cm−1) | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | PC11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1095 (C, H) | 0.2798 | 0.2517 | −0.7892 | 0.1630 | 0.0095 | −0.0861 | 0.0885 | −0.1065 | 0.3390 | 0.2067 | 0.1570 |
| 1123 (C, H) | 0.2902 | 0.3825 | −0.1599 | 0.0442 | −0.1499 | 0.1597 | −0.0895 | −0.1860 | −0.7902 | −0.0863 | −0.1395 |
| 1163 (C) | 0.3169 | 0.1825 | 0.3665 | 0.3644 | −0.1017 | 0.1598 | 0.6711 | −0.2108 | 0.2015 | −0.1613 | −0.0434 |
| 1275 (L) | 0.3107 | −0.2369 | 0.1158 | 0.6073 | −0.0987 | 0.1291 | −0.6094 | −0.0424 | 0.1317 | −0.1814 | 0.1236 |
| 1331 (L, C) | 0.3080 | −0.3106 | −0.0854 | −0.0250 | 0.2217 | −0.3326 | 0.2523 | 0.3297 | −0.2895 | −0.3475 | 0.5138 |
| 1378 (C) | 0.3142 | 0.2731 | 0.0131 | −0.2023 | −0.2026 | 0.1743 | −0.0828 | 0.7449 | 0.2096 | −0.1831 | −0.2738 |
| 1460 (L, C) | 0.3275 | −0.1264 | 0.0375 | −0.2546 | 0.6509 | 0.5905 | −0.0406 | −0.0860 | 0.0349 | 0.1649 | 0.0287 |
| 1603 (L) | 0.2916 | −0.3816 | 0.0642 | 0.1836 | −0.0849 | −0.2018 | 0.1361 | 0.2115 | −0.1896 | 0.6945 | −0.3195 |
| 1656 (L) | 0.2576 | −0.4563 | −0.1135 | −0.4673 | −0.5863 | 0.1932 | 0.0406 | −0.2981 | 0.0887 | −0.1083 | 0.0543 |
| 2889 (C, H) | 0.2738 | 0.3999 | 0.4171 | −0.2637 | −0.1187 | −0.2378 | −0.2161 | −0.0908 | 0.0864 | 0.3749 | 0.4980 |
| 2940 (L, C, H) | 0.3366 | 0.0146 | 0.0633 | −0.2047 | 0.2801 | −0.5483 | −0.1529 | −0.3082 | 0.1497 | −0.2735 | −0.4959 |
| variance (%) | 77.98 | 15.89 | 3.07 | 1.45 | 0.65 | 0.28 | 0.21 | 0.17 | 0.14 | 0.09 | 0.07 |
| cumulative variance (%) | 77.98 | 93.88 | 96.95 | 98.40 | 99.04 | 99.32 | 99.53 | 99.70 | 99.84 | 99.93 | 100.00 |

C = cellulose. H = hemicelluloses. L = lignin.
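The variance row of Table 2 can be used to decide how many PCs to keep for a given cumulative-variance criterion. The short sketch below works through that arithmetic with the tabulated (rounded) percentages; the function name `n_components_for` and the default threshold are hypothetical choices for illustration. Summing the rounded entries gives about 96.94% for three PCs, which differs from the tabulated 96.95% only in the last digit, presumably because the table was computed from unrounded values.

```python
from itertools import accumulate

# Variance explained by PC1..PC11, in percent, as listed in Table 2.
VARIANCE_PCT = [77.98, 15.89, 3.07, 1.45, 0.65, 0.28, 0.21, 0.17, 0.14, 0.09, 0.07]


def n_components_for(variance_pct, threshold_pct=95.0):
    """Return (k, cumulative) for the smallest k whose cumulative variance
    reaches threshold_pct; fall back to all components if it never does."""
    for k, cumulative in enumerate(accumulate(variance_pct), start=1):
        if cumulative >= threshold_pct:
            return k, cumulative
    return len(variance_pct), sum(variance_pct)


if __name__ == "__main__":
    k, cumulative = n_components_for(VARIANCE_PCT)
    print(k, f"{cumulative:.2f}")   # prints: 3 96.94
```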

and hemicelluloses. Band assignments for poplar are shown in Table 1 on the basis of previous literature.5,24,25 It is noted that the typical bands of lignin are in the region between 1500 and 1700 cm−1 as a result of aromatic ring symmetric stretching vibration. There are two evident peaks at 1603 and 1656 cm−1. The spectral region from 2771 to 3000 cm−1 is assigned to C−H stretching in lignin and carbohydrates (including cellulose and hemicelluloses). Specifically, the peak at 2889 cm−1 is attributed to C−H stretching of cellulose. In addition, the bands at 1095 and 1123 cm−1 are attributed to the asymmetric and symmetric stretching vibrations of the C−O−C linkages of carbohydrates. The average Raman spectra recorded from various layers of tension wood fibers by the manual method have almost the same peak positions but different intensities. In particular, the intensity of the lignin peak at 1603 cm−1 is more pronounced in the spectra of the CC (Figure 1a) and CML (Figure 1b) than in those of the SW (Figure 1c) and the G-layer (Figure 1d), while the intensity of the carbohydrate peak at 2889 cm−1 shows the opposite trend. Such discrepancies are extracted by multivariate methods and further used as a characteristic for identifying various cell wall layers.

PCA and Cluster Analysis. Raman spectra were measured over a 55.5 μm × 47.5 μm region with a spatial resolution of 0.5 μm/pixel. Consequently, we obtained 9595 spectra, which can be regarded as a matrix of 9595 by 977 dimensions. However, this matrix contains not only the information about the wood components but also the signals from the system

Raman spectroscopy were baseline corrected and smoothed using the adaptive iteratively reweighted penalized least-squares (airPLS) algorithm.23 This algorithm works by iteratively changing the weights of the sum of squared errors (SSE) between the fitted baseline and the original signals; the weights of the SSE are obtained adaptively from the difference between the previously fitted baseline and the original signals (the algorithm is available as open-source software at http://code.google.com/p/airpls). PCA was performed using self-built Matlab functions. The K-means algorithm was initiated by choosing the number of clusters in the Matlab environment. The initial cluster centers were taken randomly from the data set. All data were then compared to these cluster centers and assigned to the closest one, and new cluster centers were calculated by taking the mean of all data assigned to each cluster. This procedure was repeated until a stable solution was reached.
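The K-means procedure described above (random initial centers drawn from the data, assignment to the nearest center, recomputation of the means, repetition until stable) can be sketched in plain Python as follows. This is an illustrative sketch rather than the Matlab routine used in the study; the function name `kmeans`, the `seed` and `max_iter` parameters, and the rule of keeping the old center when a cluster becomes empty are assumptions. The function also returns the criterion J_se of eq 2.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means on a list of equal-length tuples.

    Returns (labels, centers, j_se), where j_se is the sum of squared
    distances of the points to their cluster means.
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]   # initial centers taken from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:       # assignments unchanged: stable solution
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                # assumption: an empty cluster keeps its old center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    j_se = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, j_se


if __name__ == "__main__":
    # Hypothetical two-dimensional scores forming two groups.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers, j_se = kmeans(data, k=2)
    print(labels, centers, round(j_se, 4))
```

Because the starting centers are random, different seeds can lead to different local minima of J_se on less clearly separated data; running several seeds and keeping the lowest J_se is a common safeguard.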



RESULTS AND DISCUSSION

Raman Spectroscopic Interpretation. Raman spectroscopy has been used for probing the structure, dynamics, and function of biomolecules. Several organic compounds and functional groups can be identified by their unique spectral pattern, and the intensity of the bands may be used for the calculation of the relative content in the sampled entity.7 Figure 1 shows the average Raman spectra extracted from various cell layers in the cross-section of poplar tension wood. Traditionally, the Raman bands are attributable primarily to the major wood polymers found in poplar, i.e., lignin, cellulose

noise. Accordingly, to eliminate the interferences arising from the system noise and to reduce the calculation time, only the intensities of the characteristic peaks (i.e., the 11 peaks in Table 1) were retained for PCA. Table 2 shows the main results of the PCA application for the 11 characteristic peaks in all 9595 Raman spectra. Requiring a cumulative variance greater than 95%, PC1, PC2, and PC3 were selected for subsequent analysis, in which case 96.95% of the data variance was explained (Table 2), indicating that the selected PCs reflect the original data very well. Cluster analysis was then performed on these three PCs. The K-means algorithm was applied with the number of clusters K increasing from 2 to 10, and the normalized modified Hubert Γ̂ statistic was calculated for each K. As illustrated in the plot of Γ̂ versus K in Figure 2, a significant jump appears at K = 3, which suggests that a three-cluster result is a feasible solution for the data set. Thus, the three-cluster result was first used as the result of the K-means algorithm. In order to better display the clustering information, a new image called the "PCA scores image" was created, as shown in Figure 3a. Moreover, Figure 3b presents the K-means cluster analysis result image, in which pixels belonging to different clusters are distinguished. After referring to the bright field image in Figure 1, different areas can be roughly identified: cluster 1 corresponds to the cell lumen; clusters 2 and 3 correspond to the cell wall area. This means that the Raman spectra from the poplar sample (the cell wall) are picked out from the original Raman imaging data set, which is useful for further extracting the information hidden in the spectral data.

Figure 2. Plot of normalized modified Hubert Γ̂ statistic vs number of clusters K.

However, in practical cases, the three-cluster result is not sufficient to study the complicated layered structure of wood fibers, especially the cell wall of tension wood in this work, since it consists of at least four layers as mentioned above (CC, CML, SW, and G-layer). More clusters are necessary for a more precise description of the hierarchical cell wall structure. Therefore, the K-means algorithm was then performed for K = 5 (four cell wall layers plus cell lumen). The PCA scores image and cluster analysis result image for five clusters are shown in Figure 4. The five-cluster result corresponds fully to the various layers in poplar tension wood fibers: clusters 1−5 correspond to the cell lumen, G-layer, CC, CML, and SW, respectively. This result indicates that it is possible to apply multivariate methods to automatically identify the spectra of different cell wall layers in a Raman imaging data set.

Figure 4. PCA scores plots (PC1 vs PC2 vs PC3) for five clusters (a) and corresponding cluster analysis result image for raw data space (b).

Automatic Identification of Spectra of Different Cell Wall Layers. On the basis of the above discussion, a method for automatically identifying the spectra of different cell wall layers in a Raman imaging data set is proposed, which includes five steps: (1) acquiring the spectral data set and its corresponding spatial positions by using confocal Raman microscopy; (2) baseline correction of the spectral data set; (3) selecting specific Raman peaks and recording their intensities for PCA; (4) performing cluster analysis on the PCA scores; and (5) verifying whether the clusters correspond to the hierarchical cell wall structure. The baseline correction in step 2 is not restricted to one algorithm: besides the airPLS algorithm used in this study, other methods for baseline correction can be applied as well, such as manual point correction, derivatives, and subtraction methods.26,27 Similarly, the K-means algorithm is not the only choice for cluster analysis. Other hard cluster and fuzzy cluster analysis methods may also be suitable for step 4.28 With the help of this method, the average spectra of various cell wall layers can be calculated more accurately. In this case,

all of the acquired spectra are used for analysis, so that none of the original data are wasted. Figure 5 shows the average spectra of different cell wall layers in poplar tension wood on the basis of this method. Compared to the average spectra obtained by the manual method (Figure 1), these spectra seem to have a lower signal-to-noise ratio. We believe that the average spectrum acquired by our method is more representative since it includes all the chemical information on the corresponding cell wall layer.

Figure 3. PCA scores plots (PC1 vs PC2 vs PC3) for three clusters (a) and corresponding cluster analysis result image for raw data space (b).

Figure 5. Average spectra of different cell wall layers in poplar tension wood calculated by automatic identification method: (a) CC, (b) CML, (c) SW, and (d) G-layer.

In addition to calculating an accurate average spectrum for a specific cell wall layer, our method can also be used to explore the relationship between different characteristic peaks within the spectra of wood samples. Here, we first employed this method to extract all of the spectra belonging to the wood sample (excluding the cell lumen) and then calculated the correlation coefficients for the 11 characteristic peaks in these spectra (Table 3). Theoretically, correlation coefficients range between −1 and 1 and measure the degree of correlation between two characteristic peaks. Because the observed spectral peaks may actually arise from a composite of several vibrational motions all having approximately the same frequency, the computation shows that these characteristic peaks are interrelated. The correlation coefficients between peaks having similar assignments are found to be above 0.75 (e.g., the peaks at 1275, 1603, and 1656 cm−1 attributed to lignin), which suggests that the conventional band assignments are reliable to a certain extent. In particular, there is an interesting issue concerning the peak at 1331 cm−1. Some confusion exists about the assignment of this band since both lignin and cellulose can make contributions to it.2,5,24,25 However, the results in Table 3 show that the correlation coefficient between the peaks at 1331 (lignin and cellulose) and 1603 (lignin) cm−1 is 0.96, which is much higher than that between 1331 and 1095 (cellulose and hemicelluloses) cm−1 (−0.14). This finding suggests that the peak at 1331 cm−1 is more closely related to lignin than to cellulose. Accordingly, the band at 1331 cm−1 may be able to address lignin-related issues. However, further study is warranted to confirm this observation.

CONCLUSIONS

This study illustrates a novel method for automatically identifying the spectra of different wood cell wall layers in a Raman imaging data set, through a case study of a cross-section of poplar tension wood, on the basis of PCA and cluster analysis. This automatic method consists of five steps: (1) collection of the Raman imaging data set, (2) baseline correction, (3) PCA for specific spectral peaks, (4) cluster analysis of the PCA scores, and (5) results verification. Specifically, PCA was employed to reduce the number of variables in a huge data set while retaining as much as possible of the variation present in the Raman spectral data set. Cluster analysis was used to systematically assign the obtained spectra to different groups based on similarities in PCA scores. In fact, our method is achieved by spectral classification depending on the compositional differences existing between various cell wall layers. The results from our work show that cluster analysis is a useful tool to automatically segment a Raman imaging data set into distinct regions of similar chemical composition. On the basis of this method, we calculated the average spectra of various cell wall layers. These spectra were more accurate and representative than those obtained by the manual method since they contain the whole spectral data of the corresponding layer. In addition, the relationship between different characteristic peaks within the Raman spectra of wood samples was discussed. The correlation coefficients indicated that the peak at 1331 cm−1 in the Raman spectrum of

Table 3. Correlation Coefficients between the 11 Characteristic Peaks in Extracted Spectra

| wavenumber (cm−1) | 1095 (C, H) | 1123 (C, H) | 1163 (C) | 1275 (L) | 1331 (L, C) | 1378 (C) | 1460 (L, C) | 1603 (L) | 1656 (L) | 2889 (C, H) | 2940 (L, C, H) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1095 (C, H) | 1.0000 | 0.7551 | 0.3154 | 0.2401 | 0.2763 | 0.6348 | 0.3377 | 0.1464 | 0.1192 | 0.1939 | 0.4385 |
| 1123 (C, H) | | 1.0000 | 0.6036 | 0.1033 | 0.0301 | 0.8479 | 0.2380 | −0.0723 | −0.1546 | 0.7125 | 0.4126 |
| 1163 (C) | | | 1.0000 | 0.6162 | 0.4432 | 0.7217 | 0.5739 | 0.4495 | 0.2320 | 0.6468 | 0.6818 |
| 1275 (L) | | | | 1.0000 | 0.8921 | 0.3272 | 0.8101 | 0.9299 | 0.7792 | −0.0504 | 0.7709 |
| 1331 (L, C) | | | | | 1.0000 | 0.3111 | 0.8986 | 0.9646 | 0.9088 | −0.1464 | 0.8392 |
| 1378 (C) | | | | | | 1.0000 | 0.5171 | 0.2158 | 0.1603 | 0.7436 | 0.6561 |
| 1460 (L, C) | | | | | | | 1.0000 | 0.8422 | 0.7859 | 0.1304 | 0.9101 |
| 1603 (L) | | | | | | | | 1.0000 | 0.9149 | −0.1931 | 0.7684 |
| 1656 (L) | | | | | | | | | 1.0000 | −0.2789 | 0.6879 |
| 2889 (C, H) | | | | | | | | | | 1.0000 | 0.3345 |
| 2940 (L, C, H) | | | | | | | | | | | 1.0000 |

C = cellulose. H = hemicelluloses. L = lignin.
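Correlation coefficients of the kind listed in Table 3 can be computed from per-spectrum peak intensities as sketched below. The sketch assumes the standard Pearson correlation coefficient (the text says only "correlation coefficients"), and the function names and the toy intensity values are hypothetical; they are not taken from the measured data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every pair of peak labels to the correlation of their intensities.

    columns : dict mapping a peak label to its intensities over all spectra
    """
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    # Made-up intensities for three peaks over four spectra (illustration only).
    toy = {
        1331: [1.0, 2.0, 3.0, 4.0],
        1603: [2.0, 4.0, 6.0, 8.5],
        2889: [4.0, 3.0, 2.0, 1.0],
    }
    for pair, r in correlation_matrix(toy).items():
        print(pair, round(r, 4))
```

The result is symmetric with ones on the diagonal, so only the upper triangle needs to be reported, as in Table 3.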


wood samples might be more closely related to lignin than to cellulose. Certainly, the application of this novel method for automatically identifying the spectra of different wood cell wall layers in a Raman imaging data set should not be limited to Raman spectral analysis. This method will bring new insights into studying the morphology and topochemistry of wood cell walls.

ASSOCIATED CONTENT

Supporting Information

Additional samples analyzed by this method. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Phone/fax: +86-10-62337993.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the financial support from the National Science Foundation for Distinguished Young Scholars of China (Grant 31225005), the Chinese Ministry of Education (Grant 113014A), and the Foundation for the Advisor of Beijing Excellent Doctoral Dissertation (Grant 20131002201).

REFERENCES

(1) Batonneau, Y.; Laureyns, J.; Merlin, J. C.; Brémard, C. Anal. Chim. Acta 2001, 446, 23−27.
(2) Gierlinger, N.; Schwanninger, M. Spectroscopy 2007, 21, 69−89.
(3) Côté, W. A.; Day, A. C.; Timell, T. E. Wood Sci. Technol. 1969, 3, 257−271.
(4) Saavedra Flores, E. I.; de Souza Neto, E. A.; Pearce, C. Comput. Mater. Sci. 2011, 50, 1202−1211.
(5) Gierlinger, N.; Schwanninger, M. Plant Physiol. 2006, 140, 1246−1254.
(6) Ji, Z.; Ma, J. F.; Zhang, Z. H.; Xu, F.; Sun, R. C. Ind. Crops Prod. 2013, 47, 212−217.
(7) Zhang, X.; Ma, J.; Zhe, J.; Yang, G. H.; Zhou, X.; Xu, F. Microsc. Res. Technol. 2014, 77, 609−618.
(8) Malinowski, E. R. In Factor Analysis in Chemistry; Wiley: New York, 1991; pp 98.
(9) Shinzawa, H.; Awa, K.; Kanematsu, W.; Ozaki, Y. J. Raman Spectrosc. 2009, 40, 1720.
(10) Jiang, J. H.; Ozaki, Y.; Kleimann, M.; Siesler, H. W. Chemom. Intell. Lab. Syst. 2004, 70, 83−92.
(11) Lin, W. Q.; Jiang, J. H.; Yang, H. F.; Ozaki, Y.; Shen, G. L.; Yu, R. Q. Anal. Chem. 2006, 78, 6003−6011.
(12) Ryder, G. J. Forensic Sci. 2002, 47, 275−284.
(13) Zhang, L.; Small, G. W. Appl. Spectrosc. 2003, 56, 1082−1093.
(14) Yamamoto, H. J. Wood Sci. 2004, 50, 197−208.
(15) Geladi, P.; Grahn, H. Multivariate Image Analysis; Wiley: New York, 1997.
(16) Dubey, K.; Yadava, V. Opt. Laser. Eng. 2008, 46, 124−132.
(17) Çaydaş, U.; Hasçalık, A.; Ekici, S. Expert Syst. Appl. 2009, 36, 6135−6139.
(18) Ketchen, D. J.; Shook, C. L. Strategic Manage. J. 1996, 17, 441−458.
(19) Walczak, B. Wavelets in Chemistry; Elsevier: New York, 2000.
(20) Huang, J.; Esbensen, K. H. Chemom. Intell. Lab. Syst. 2000, 54, 1−19.
(21) Gibney, M. J.; Walsh, M. C. Proc. Nutr. Soc. 2013, 72, 219−225.
(22) Theodoridis, S.; Koutroumbas, K. Pattern Recognition; Academic Press: New York, 1999.
(23) Zhang, Z. M.; Chen, S.; Liang, Y. Z. Analyst 2010, 135, 1138.
(24) Wiley, J. H.; Atalla, R. H. Carbohydr. Res. 1987, 160, 113−129.
(25) Agarwal, U. P.; Ralph, S. A. Appl. Spectrosc. 1997, 51, 1648−1655.
(26) Leger, M. L.; Ryder, A. G. Appl. Spectrosc. 2006, 60, 182−193.
(27) Carron, K.; Cox, R. Anal. Chem. 2010, 82, 3419−3425.
(28) Zhang, L.; Henson, M. J.; Sekulic, S. S. Anal. Chim. Acta 2005, 545, 262−278.