Anal. Chem. 2008, 80, 1474-1481
Biomarker Profiling and Reproducibility Study of MALDI-MS Measurements of Escherichia coli by Analysis of Variance-Principal Component Analysis
Ping Chen, Yao Lu, and Peter B. Harrington*
Center for Intelligent Chemical Instrumentation, Department of Chemistry and Biochemistry, Clippinger Laboratories, Ohio University, Athens, Ohio 45701-2979
Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has proved useful for the characterization of bacteria and the detection of biomarkers. A key challenge for MALDI-MS measurements of bacteria is overcoming the relatively large variability in peak intensities. A soft tool combining analysis of variance and principal component analysis (ANOVA-PCA) (Harrington, P. D.; Vieira, N. E.; Chen, P.; Espinoza, J.; Nien, J. K.; Romero, R.; Yergey, A. L. Chemom. Intell. Lab. Syst. 2006, 82, 283-293. Harrington, P. D.; Vieira, N. E.; Espinoza, J.; Nien, J. K.; Romero, R.; Yergey, A. L. Anal. Chim. Acta 2005, 544, 118-127) was applied to investigate the effects of the experimental factors associated with MALDI-MS studies of microorganisms. The variance of the measurements was partitioned with ANOVA, and the variance of each target factor, combined with the residual error, was subjected to PCA to provide an easy-to-understand statistical test. The statistical significance of these factors can be visualized with 95% Hotelling T² confidence intervals. ANOVA-PCA facilitates the detection of biomarkers because it removes from the measurements the variance corresponding to other experimental factors, which might otherwise be mistaken for a biomarker. Four strains of Escherichia coli at four different growth ages were used to study the reproducibility of MALDI-MS measurements, and ANOVA-PCA was used to disclose potential biomarker proteins associated with different growth stages.

* To whom correspondence should be addressed. Tel: 740-994-0265. Fax: 740-593-0148.
(1) Harrington, P. D.; Vieira, N. E.; Chen, P.; Espinoza, J.; Nien, J. K.; Romero, R.; Yergey, A. L. Chemom. Intell. Lab. Syst. 2006, 82, 283-293.
(2) Harrington, P. D.; Vieira, N. E.; Espinoza, J.; Nien, J. K.; Romero, R.; Yergey, A. L. Anal. Chim. Acta 2005, 544, 118-127.
(3) Lay, J. O. TrAC, Trends Anal. Chem. 2000, 19, 507-516.
(4) Lay, J. O. Mass Spectrom. Rev. 2001, 20, 172-194.
(5) Fenselau, C.; Demirev, P. A. Mass Spectrom. Rev. 2001, 20, 157-171.
1474 Analytical Chemistry, Vol. 80, No. 5, March 1, 2008
10.1021/ac7018798 CCC: $40.75 © 2008 American Chemical Society
Published on Web 01/30/2008

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) has been widely used to investigate microorganisms.3-5 The capability of MALDI for profiling proteins in complex biological mixtures makes it a promising alternative technique for the identification and characterization of microorganisms.3-7 The feasibility of this method has been demonstrated by obtaining unique mass spectra of whole bacterial cells, and whole-cell measurement reduces sample preparation time dramatically. Because proteins constitute more than 50% of the dry mass of a bacterial cell,5 MALDI-MS affords a sensitive measurement with greater informing power than other biological methods, since the protein profile is obtained across a wide mass range. Unfortunately, MALDI-MS measurements of microorganisms suffer from low reproducibility of peak intensities, because many experimental and biological factors influence the spectra.3-5

Experimental factors that strongly affect the spectra of bacteria include the matrix/solvent system, sample preparation method, extraction solvent, sample concentration, and desalting technique.8-14 The effects of these factors have been systematically investigated by different research groups, and reproducible results can be obtained if these factors are rigorously controlled.8,9,12,15,16 The methods used to study the effects of experimental factors depend primarily on visual comparison, which is subjective and prone to mistaking random fluctuations for causal factors.

Another problem for MALDI-MS measurements of bacteria is that bacteria respond rapidly to environmental changes. One such biological factor is the growth age of the bacteria. Research by Arnold et al.17 has demonstrated that mass spectra change dramatically and consistently over time, and a 3-month project conducted by Saenz et al.12 showed that bacteria also have different patterns of variation in their protein profiles over time. Previous studies17 used only one strain of bacteria and compared the mass spectra at different times to find biomarker proteins corresponding to the age of the bacteria. However, confounded variances in the measurements might prevent the detection of biomarker proteins related to growth age. It is therefore critical to remove the variance contributed by other experimental factors before biomarkers are sought.

(6) Krishnamurthy, T. Rapid Commun. Mass Spectrom. 1996, 10, 1992-1996.
(7) Vaidyanathan, S.; Kell, D. B.; Goodacre, R. Abstracts of Papers of the American Chemical Society 2002, 224, U199-U199.
(8) Domin, M. A.; Welham, K. J.; Ashton, D. S. Rapid Commun. Mass Spectrom. 1999, 13, 222-226.
(9) Evason, D. J.; Claydon, M. A.; Gordon, D. B. Rapid Commun. Mass Spectrom. 2000, 14, 669-672.
(10) Gantt, S. L.; Valentine, N. B.; Saenz, A. J.; Kingsley, M. T.; Wahl, K. L. J. Am. Soc. Mass Spectrom. 1999, 10, 1131-1137.
(11) Holland, R. D.; Rafii, F.; Heinze, T. M.; Sutherland, J. B.; Voorhees, K. J.; Lay, J. O. Rapid Commun. Mass Spectrom. 2000, 14, 911-917.
(12) Saenz, A. J.; Petersen, C. E.; Valentine, N. B.; Gantt, S. L.; Jarman, K. H.; Kingsley, M. T.; Wahl, K. L. Rapid Commun. Mass Spectrom. 1999, 13, 1580-1585.
(13) Wunschel, S. C.; Jarman, K. H.; Petersen, C. E.; Valentine, N. B.; Wahl, K. L.; Schauki, D.; Jackman, J.; Nelson, C. P.; White, E. J. Am. Soc. Mass Spectrom. 2005, 16, 456-462.
(14) Demirev, P. A.; Lin, J. S.; Pineda, F. J.; Fenselau, C. Anal. Chem. 2001, 73, 4566-4573.
(15) Williams, T. L.; Andrzejewski, D.; Lay, J. O.; Musser, S. M. J. Am. Soc. Mass Spectrom. 2003, 14, 342-351.
(16) Xu, M.; Voorhees, K. J.; Hadfield, T. L. Talanta 2003, 59, 577-589.

For complex studies, variances pertaining to the experimental hypothesis may be confounded with variations from other experimental sources. A useful statistical tool for separating sources of variation is analysis of variance (ANOVA). The significance of a factor is obtained by comparing the variation between the levels of the target factor with the residual error, which serves as a reference for the inherent precision of the experiment. Systematically designed experiments are required to decompose the total variance into components corresponding to the different factors. ANOVA is a univariate method, so it cannot be applied directly to underdetermined data such as MALDI-MS spectra, which have far more variables than samples. ANOVA-principal component analysis (PCA) was devised as a simple method for scientists that addresses this problem. ANOVA-PCA explores the relationships among multivariate data objects and separates correlated variances.1,2 It has been applied to detect biomarkers for premature delivery from MALDI-MS measurements of amniotic fluid.1,2 ANOVA-PCA combines the statistical advantages of ANOVA with the advantages of PCA for studying covariation among variables.
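As a minimal sketch of the univariate comparison just described, the plain-Python function below computes the one-way ANOVA F ratio for a single response variable (for example, the intensity of one peak) measured at several levels of one factor. The function name `one_way_f_ratio` and the toy intensities are hypothetical illustrations, not the authors' code; converting F into a p-value would additionally require the F distribution, which the Python standard library does not provide.

```python
from statistics import fmean


def one_way_f_ratio(groups):
    """One-way ANOVA for a single response variable.

    groups: list of lists; each inner list holds the replicate measurements
    of one factor level (e.g., one peak's intensity at one growth age).
    Returns (F, df_between, df_within).
    """
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    level_means = [fmean(g) for g in groups]

    # Between-level sum of squares: spread of the level means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, level_means))
    # Residual (within-level) sum of squares: replicate scatter about each level mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, level_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical intensities of one peak at two growth ages, three replicates each.
print(one_way_f_ratio([[10.0, 11.0, 12.0], [14.0, 15.0, 16.0]]))  # (24.0, 1, 4)
```

Running such a test channel by channel would ignore the strong correlation among neighboring m/z values, which is the limitation that motivates the multivariate ANOVA-PCA approach.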
Following the general linear model, the data set of response variables is decomposed into additive matrices, each of which characterizes either a single factor of the experimental design or the residual error. For a two-factor model with interaction, the original data matrix can be decomposed as
$$\mathbf{D} = \mathbf{GM} + \mathbf{M}_{\alpha} + \mathbf{M}_{\beta} + \mathbf{IM}_{\alpha\beta} + \mathbf{PE} \qquad (1)$$

for which α and β represent two experimental factors that have l and k levels, respectively. All matrices have the same dimensionality but differ in maximal rank, which follows the degrees of freedom of ANOVA. The data matrix D has m rows and n columns, with each spectrum comprising a row. Every row of the grand mean matrix GM is the average measurement obtained from all the spectra in the data set. The factor α effect matrix Mα has l different average objects stored in its rows, arranged according to the experimental design. The factor β effect matrix Mβ has k different averages in its rows, likewise arranged according to the experimental design. The interaction effect matrix IMαβ has l × k different average spectra in its rows. The matrix PE is obtained by subtracting all of the effect matrices from the data matrix D; it is the pure error term, which comprises the unmodeled variations. A PCA score plot of a composite matrix of an effect and the pure error allows an easy comparison between the factor levels and the residual error. Hotelling's T² statistics are used to construct 95% confidence ellipses around the mean of each level so that the significance of the factor levels can be visualized. If a factor is significant, its variation will be large compared with the pure error, the first principal component will characterize this variation, and the confidence ellipses about the clusters will be well separated; the scores of the spectra will then cluster by factor level and align along the first principal component. The variable loadings define the orientation of the principal components with respect to the original coordinate system, so they are directly amenable to interpretation as mass spectra.

In this work, the reproducibility of MALDI-MS measurements of bacteria was investigated with ANOVA-PCA with respect to the effects of cell growth age, sample preparation, and short-term instrument drift. ANOVA-PCA was used to determine the significant peaks that correlate with experimental factors such as age and strain. It can be viewed as an inverse approach to the classification that has prevailed in the literature.

(17) Arnold, R. J.; Karty, J. A.; Ellington, A. D.; Reilly, J. P. Anal. Chem. 1999, 71, 1990-1996.

Table 1. Microorganisms Used in This Project, Rehydrating Broth, and Growing Mediaᵃ
microorganism (biosafety level) | ATCC no. | rehydrating broth
E. coli (1) | 15223 | NB
E. coli O157:H7 (2) | 43895 | TSB
E. coli O55:K59 (B5):H6 (2) | 12014 | NB
E. coli (1) | 14948 | NB
ᵃ Culturing agar was nutrient agar for all strains. NB, nutrient broth; TSB, tryptic soy broth.

EXPERIMENTAL SECTION
Materials. Freeze-dried stock cultures of bacteria were purchased from the American Type Culture Collection (ATCC). Table 1 summarizes the Escherichia coli strains that were studied and the conditions used for bacterial culture and growth. (Caution: E. coli serotypes O157:H7 (ATCC 43895) and O55:K59 (B5):H6 (ATCC 12014) are Biosafety level 2 (BL2) organisms and should be handled with extreme care following established protocols and procedures.) All microbiological procedures were carried out in a certified BL2 cabinet using proper sterilization procedures. Nutrient broth was purchased from Difco (Sparks, MD). Nutrient agar, tryptic soy broth, sinapinic acid (SA), equine cytochrome c, equine myoglobin, and bovine trypsinogen were purchased from Sigma Aldrich (St. Louis, MO).
Acetonitrile (ACN, HPLC grade) was purchased from Fisher Scientific (Fairlawn, NJ). Ionate trifluoroacetic acid (TFA, HPLC grade) was purchased from Pierce Biotechnology (Rockford, IL). Distilled water was purified to a resistivity of 18.2 MΩ·cm with a Millipore Milli-Q plus system (Billerica, MA).

Cell Growth. Cells were rehydrated by dissolving the bacterial pellet in 5 mL of tryptic soy or nutrient broth that had previously been autoclaved at 121 °C for 15 min. The bacteria were allowed to rehydrate and grow in the infusion for 24 h at 37 °C. Subsequently, the bacteria were streaked onto Petri dishes containing sterilized nutrient agar, and the plates were transferred to an oven maintained at 37 °C for incubation. After 24, 48, 72, and 120 h, one plate of each strain was removed from incubation and stored in a refrigerator at 3 °C. After 120 h, all the bacterial suspensions were prepared on the same day and were analyzed within 3 days. E. coli was chosen for this project because some strains are virulent while others are innocuous, and it would be useful to discriminate the pathogenic strains of E. coli.18 Another reason is that, in previous studies, the mass spectral profiles of E. coli showed greater changes over growth time than those of other species.12

Table 2. Decomposition of Data Set D Obtained from the Experimentᵃ
factor 1 (strain) \ factor 2 (growth age) | 24 h | 48 h | 72 h | 120 h | mean of strains (SM)
E. coli 12014 | IA1 | IA2 | IA3 | IA4 | mean of 12014
E. coli 14948 | IA5 | IA6 | IA7 | IA8 | mean of 14948
E. coli 15223 | IA9 | IA10 | IA11 | IA12 | mean of 15223
E. coli 43895 | IA13 | IA14 | IA15 | IA16 | mean of 43895
mean of age (AM) | mean of 24 h | mean of 48 h | mean of 72 h | mean of 120 h | grand mean (GM)
ᵃ The data set was decomposed as D = GM + SM + AM + IM + DY + PE. GM is the average of all spectra. SM consists of the averages obtained for each strain. AM consists of the averages obtained for each growth age. IM consists of the averages (IA1-IA16) of the individual samples after the variances of strains and growth ages are removed. DY consists of the averages of the three different days. PE is the residual. Note that all matrices have the same dimensionality.

Sample Preparation. Suspensions of pure bacterial cultures were used. A few colonies were removed from the agar plate with a sterilized inoculating loop, filling approximately half of the loop with bacteria, and transferred to a microcentrifuge tube. The bacterial cells were suspended in 1 mL of 40:60 ACN/TFA (0.1%, v/v). The suspensions were vortex mixed at intermediate speed for 2 min until a cloudy, homogeneous solution was observed. All the bacterial suspensions were prepared on the same day. SA matrix was prepared on each day of data acquisition at a concentration of 10 mg mL-1 by dissolving 10 mg of SA in 1 mL of 40:60 ACN/TFA (0.1%, v/v). All solutions were homogenized on a vortex mixer for 2 min before analysis. A two-layer method was used for matrix/analyte sample preparation: a 1-µL aliquot of bacterial suspension was placed on the MALDI target plate and air-dried, and a 1-µL aliquot of SA matrix was then deposited on top. The samples were air-dried before analysis.

MALDI-MS Analysis. All mass spectra were acquired with an M@LDI LR time-of-flight mass spectrometer (Micromass, Hertfordshire, UK). The instrument was operated in positive-ion linear mode.
The following parameters were used for MALDI spectral acquisition: accelerating voltage 15 kV, pulse voltage 1.6 kV, laser firing rate 5 Hz, 10 laser shots per summed spectrum, time-lag focusing 499 ns, matrix suppression 2 kDa, microchannel plate detector 1.8 kV, and sample period 0.5 ns. The instrument was calibrated with a protein mixture containing cytochrome c (MW 12 360), myoglobin (MW 16 951), and trypsinogen (MW 23 980). Seven aliquots of each suspension, a total of 112 sample/matrix spots, were deposited randomly onto two target plates on the same day, and the mass spectra were acquired over three consecutive days. To evaluate short-term instrument reproducibility, aliquots from one sample were analyzed on each of the 3 days. During data acquisition, the target plates were moved manually beneath the laser beam to avoid sample depletion at any particular position within a spot. For each replicate, i.e., well in the 96-well plates, 60-100 mass spectral scans were acquired, typically with 2 scans from the same location in the well. Once the signal was depleted, a new scan position was selected manually. Each mass spectral scan was the average spectrum of 10 single laser shots from the same location.

(18) Ochoa, M. L.; Harrington, P. B. Anal. Chem. 2005, 77, 5258-5267.

Data Preprocessing. All calculations were performed with MATLAB R14 (The MathWorks, Inc., Natick, MA) using in-house scripts. The raw intensity spectra were converted into tab-delimited ASCII text files using Databridge for use with Masslynx, and the unprocessed spectra were read directly into MATLAB as ASCII files. Preprocessing consisted of data compression with a wavelet transform, followed by polynomial baseline correction and a mass/charge alignment that maximized the correlation with the average spectrum.

The spectra were compressed by wavelet transformation to reduce computation time. The theory of the wavelet transform and its applications in chemistry can be found elsewhere.19 The acquired mass spectra comprised 120 079 data points and were compressed to 3753 data points by discarding the detail wavelet coefficients of the first 6 levels using the pyramid algorithm. The wavelet coefficients were generated with the function FWT-PBS in WaveLab,20 using the biorthogonal Villasenor wavelet.

Baseline correction was accomplished by polynomial fitting. The compressed spectra were baseline corrected using an iterative 12th-order polynomial fit to an abscissa scaled to the range [-1, 1]. The iteration stopped when the number of fitted points fell below 10% of the number of points in the spectrum or when the median of the absolute values of the residuals converged.
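The iterative baseline fit can be sketched in plain Python as follows. This is a hedged illustration, not the authors' MATLAB script: the rule that points lying above the current fit are treated as peaks and dropped before refitting is our assumption (the text gives only the stopping criteria), and the Legendre basis is a swapped-in choice that keeps a 12th-order fit on the [-1, 1] abscissa well conditioned. The helper names `legendre_row`, `solve`, and `iterative_baseline` are hypothetical.

```python
from math import exp
from statistics import median


def legendre_row(x, order):
    """Legendre polynomials P0..P_order at x (x assumed scaled to [-1, 1])."""
    p = [1.0, x]
    for n in range(1, order):
        # Bonnet recurrence: (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: order + 1]


def solve(a, b):
    """Solve the small dense system a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef


def iterative_baseline(y, order=12, min_fraction=0.10, tol=1e-6, max_iter=200):
    """Return (corrected, baseline) for one spectrum y (list of floats)."""
    n = len(y)
    rows = [legendre_row(-1.0 + 2.0 * i / (n - 1), order) for i in range(n)]
    k = order + 1
    keep = list(range(n))            # indices currently used in the fit
    prev_mad = None
    for _ in range(max_iter):
        # Least-squares fit on the kept points via the normal equations.
        ata = [[sum(rows[i][p] * rows[i][q] for i in keep) for q in range(k)] for p in range(k)]
        aty = [sum(rows[i][p] * y[i] for i in keep) for p in range(k)]
        coef = solve(ata, aty)
        baseline = [sum(c * v for c, v in zip(coef, row)) for row in rows]
        mad = median([abs(y[i] - baseline[i]) for i in keep])
        # Assumption: points lying above the current fit are peaks and are dropped.
        new_keep = [i for i in keep if y[i] <= baseline[i]]
        if len(new_keep) < min_fraction * n:
            break                    # fewer than 10% of the points would remain
        if prev_mad is not None and abs(prev_mad - mad) <= tol * prev_mad:
            break                    # median absolute residual has converged
        prev_mad, keep = mad, new_keep
    return [yi - bi for yi, bi in zip(y, baseline)], baseline


# Hypothetical spectrum: linear baseline plus one Gaussian peak centred at index 100.
spectrum = [5.0 + 3.0 * i / 200 + 50.0 * exp(-((i - 100) / 5.0) ** 2) for i in range(201)]
corrected, baseline = iterative_baseline(spectrum, order=2)
```

With `order=12` the normal-equation matrix is only 13 × 13, so each refit stays cheap even for a few thousand compressed points; the two `break` statements correspond to the 10% and median-convergence stopping rules stated above.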
All the mass spectra were aligned to correct for minor shifts during data acquisition by interpolating all mass spectral peaks according to a quadratic model. The quadratic model was obtained by maximizing the correlation between each single scan and the average spectrum of the data set.

ANOVA-PCA. Before ANOVA-PCA, all the spectra were normalized to unit length. Normalization corrects for variations in sample concentration and sensitivity among samples, so that the pattern recognition algorithms used the relative peak intensities to identify or discriminate the bacteria.

(19) Walczak, B., Ed. Wavelets in Chemistry; Elsevier Science B.V.: Amsterdam, 2000.
(20) Donoho, D.; Duncan, M.; Huo, X.; Levi-Tsabari, O. http://www-stat.stanford.edu/∼wavelab.

The decomposition of the measurement matrix is given in Table 2, in which all subset matrices have the same dimensionality as the matrix D. The grand mean matrix GM is the average of all spectra; it has a maximum rank of unity, and each of its rows is the average spectrum. The next matrix, SM, describes the experimental hypothesis, the four strains of E. coli, and comprises the four averages obtained for each strain. The matrix AM consists of the four averages obtained for each growth age. The interaction matrix IM comprises the 16 averages of the individual samples IA1-IA16, and its variance characterizes the interaction between strain and age. The matrix DY characterizes instrument drift; it comprises the three daily averages of the measurements obtained after the variances from the other factors were removed. The residual pure error matrix PE characterizes all the variation that does not correspond to any factor in the experiment and represents the inherent precision of the experimental measurement. PE therefore serves as the reference for measuring the significance of the experimental variations.

Figure 1. MALDI-MS spectra of four strains of E. coli showing the mass range 3000-18500 Da. Each spectrum is plotted as relative intensity versus m/z. The labels above each peak give the ion's molecular weight (top) and absolute intensity (bottom) to facilitate interspectral comparisons. Each spectrum is an average of 60-80 summed scans, shown after baseline correction and smoothing.

RESULTS AND DISCUSSION
MALDI-MS Spectra of E. coli. Spectra of the four strains of E. coli used in this study are given in Figure 1. All of the spectra were obtained from the strains at a growth age of 24 h and were averages of 60-100 scans. The averaged spectra were baseline corrected with a 12th-order polynomial and smoothed with a Savitzky-Golay filter (second-order polynomial, window size 7, applied twice). All of this processing was completed with Masslynx 4.0, provided with the instrument.
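The smoothing step (second-order polynomial, 7-point window, applied twice) can be illustrated with the standard 7-point quadratic Savitzky-Golay weights (-2, 3, 6, 7, 6, 3, -2)/21. The sketch below is not Masslynx's implementation; in particular, copying the three points at each end through unsmoothed is an assumption, since the text does not say how the spectrum edges were handled, and the function names `savgol7` and `smooth_twice` are hypothetical.

```python
# Integer weights of the 7-point quadratic Savitzky-Golay smoother; the smoothed
# value at the window centre is sum(weight * y) / 21.
SG7_WEIGHTS = (-2, 3, 6, 7, 6, 3, -2)
SG7_NORM = 21


def savgol7(y):
    """One pass of 7-point quadratic Savitzky-Golay smoothing.

    Assumption: the first and last 3 points, where the window does not fit,
    are copied through unchanged.
    """
    half = len(SG7_WEIGHTS) // 2
    out = list(y)
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]
        out[i] = sum(w * v for w, v in zip(SG7_WEIGHTS, window)) / SG7_NORM
    return out


def smooth_twice(y):
    """Two successive passes, mirroring 'window size 7, twice' in the text."""
    return savgol7(savgol7(y))


# A quadratic is reproduced exactly, while alternating point-to-point noise is damped.
quadratic = [float(i * i) for i in range(15)]
alternating = [(-1.0) ** i for i in range(20)]
print(smooth_twice(quadratic) == quadratic)  # True
```

Because these weights reproduce any quadratic exactly, locally parabolic peak tops pass through unchanged, whereas an alternating ±1 pattern is scaled by 5/21 on each pass.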
These spectra reveal that most of the peaks appear in more than one strain of E. coli. The proteins corresponding to these peaks have been reported in the SwissProt/TrEMBL database21 and in other research reports.18,22-24 Nevertheless, the four strains gave different MALDI-MS profiles: the most intense peak of each strain appears at a different mass-to-charge ratio, and the relative peak intensities differ among strains. These mass fingerprints should therefore be useful for the discrimination and characterization of bacteria.

(21) The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB), http://us.expasy.org/srs/2006.
(22) Dai, Y.; Li, L.; Roser, D. C.; Long, S. R. Rapid Commun. Mass Spectrom. 1999, 13, 73-79.
(23) Arnold, R. J.; Reilly, J. P. Anal. Biochem. 1999, 269, 105-112.
(24) Holland, R. D.; Duffy, C. R.; Rafii, F.; Sutherland, J. B.; Heinze, T. M.; Holder, C. L.; Voorhees, K. J.; Lay, J. O. Anal. Chem. 1999, 71, 3226-3230.

Total Variance and Residual. A principal component score plot of the data set D is given in Figure 2. The first two principal components account for 37% of the total variance. The first principal component separates strain ATCC 14948 from the other three strains, which partially overlap. E. coli 14948 has a smaller cluster than the other three strains and grew more slowly than they did. The score plot of the total variance serves as a reference. For ANOVA-PCA, it is also important to analyze the residuals of the measurements and to ensure that they contain no significant variation from the experimental factors. A principal component score plot of the residuals is given in Figure 3. In this plot, the ellipses of the four E. coli strains have similar sizes, and the single large cluster in the center may indicate that no other significant variance from the experimental factors remains.
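The residual matrix PE examined above is what remains after every effect matrix in Table 2 has been subtracted. A minimal plain-Python sketch of that sequential subtraction follows, assuming a balanced design (equal numbers of replicates in every strain × age cell and on every day) and rows already normalized to unit length. The function names and label-list inputs are hypothetical; this illustrates the decomposition rather than reproducing the authors' in-house script.

```python
from math import sqrt


def normalize_rows(d):
    """Scale each spectrum (row) to unit Euclidean length."""
    out = []
    for row in d:
        norm = sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out


def level_means(d, labels):
    """Effect matrix: every row is replaced by the mean of the rows sharing its label."""
    sums, counts = {}, {}
    for row, label in zip(d, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return [[s / counts[label] for s in sums[label]] for label in labels]


def subtract(a, b):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def anova_decompose(d, strain, age, day):
    """Peel off GM, SM, AM, IM and DY in turn; what is left is PE.

    strain, age, day: one label per row of d. Assumes a balanced design.
    """
    steps = [
        ("GM", ["all"] * len(d)),        # grand mean
        ("SM", strain),                  # strain means
        ("AM", age),                     # growth-age means
        ("IM", list(zip(strain, age))),  # strain x age cell means (interaction)
        ("DY", day),                     # day means (instrument drift)
    ]
    effects = {}
    residual = d
    for name, labels in steps:
        effects[name] = level_means(residual, labels)
        residual = subtract(residual, effects[name])
    effects["PE"] = residual             # pure error
    return effects
```

For example, `anova_decompose(normalize_rows(spectra), strain, age, day)`, with `spectra` and the label lists as hypothetical inputs, would return matrices that sum back to the normalized D; each one other than GM has columns that sum to zero, and `SM` or `AM` can be added to `PE` to form a test matrix.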
Figure 2. Principal component score plot of the data set D for the four strains of E. coli. The first two principal components account for 37% of the total variation. Each letter represents a single scan. The 95% confidence intervals are calculated around each strain.
Figure 3. Principal component score plot of the residual matrix PE, showing no difference among the strains. Each letter represents a single scan. The 95% confidence intervals are calculated around each strain.
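A test matrix such as SM + PE or AM + PE is then submitted to PCA, and the percentage of its total variance carried by each component is what the figure captions report. The source does not state which PCA algorithm was used; the sketch below substitutes power iteration with deflation so that it runs with the Python standard library alone. The names `add` and `pca_power` are hypothetical, and the 95% Hotelling T² ellipses, which require an F-distribution quantile, are not computed here.

```python
import random
from math import sqrt


def add(a, b):
    """Element-wise sum of two equally sized matrices, e.g. SM + PE."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def pca_power(x, n_components=2, iters=500, seed=0):
    """Leading principal components of x (rows = spectra) by power iteration.

    Returns (scores, loadings, percent_variance), one entry per component.
    """
    n_rows, n_cols = len(x), len(x[0])
    col_means = [sum(r[j] for r in x) / n_rows for j in range(n_cols)]
    resid = [[r[j] - col_means[j] for j in range(n_cols)] for r in x]
    total_ss = sum(v * v for r in resid for v in r)
    rng = random.Random(seed)
    scores, loadings, percent = [], [], []
    for _ in range(n_components):
        p = [rng.random() for _ in range(n_cols)]
        for _ in range(iters):
            t = [sum(a * b for a, b in zip(r, p)) for r in resid]      # t = X p
            p = [sum(t[i] * resid[i][j] for i in range(n_rows))
                 for j in range(n_cols)]                                # p = X^T t
            norm = sqrt(sum(v * v for v in p))
            if norm == 0.0:          # no variance left to explain
                break
            p = [v / norm for v in p]
        t = [sum(a * b for a, b in zip(r, p)) for r in resid]
        # Deflate: remove the variance captured by this component before the next one.
        resid = [[r[j] - t[i] * p[j] for j in range(n_cols)] for i, r in enumerate(resid)]
        scores.append(t)
        loadings.append(p)
        percent.append(100.0 * sum(v * v for v in t) / total_ss)
    return scores, loadings, percent
```

Scores from the first two components would be plotted with one point per scan; because the sign of each loading vector is arbitrary, clusters may appear mirrored relative to another PCA implementation.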
Effect of Experimental Factors. The variance in the matrix IM characterizes the interaction of strain and growth time. In the score plot of the interaction test matrix (IM + PE), the 7 replicates of each bacterial suspension are treated as a group, so that differences among the 16 samples may be revealed. The score plot in Figure 4 shows some differences among these samples: E. coli 14948 at 24 h differs from all of the others. This agrees with the observation that, during the first 24 h, E. coli 14948 grew more slowly than the other three strains; the reason for this difference is unknown. As stated above, all the spectra were collected over 3 days. ANOVA-PCA showed that instrument drift during the 3-day experiment was not significant (score plot not shown).

Difference in Strains. The score plot of the matrix (SM + PE) is given in Figure 5. All four strains form tighter clusters once the variances from growth age and the other experimental factors are removed. The first two principal components account for 50% of the total variance of the matrix. The cluster size of strain 14948 is now similar to those of the other three strains, and strain 12014 no longer overlaps with strains 15223 and 43895. The resolution between every pair of clusters was measured using the projected difference resolution method.25 The projected resolution obtained for the closest pair of clusters using just three components was 2.3, well above the baseline separation value of 1.5. This metric reveals that, in a three-dimensional score plot, all clusters would be resolved with respect to their confidence intervals, and it indicates that a robust multivariate classifier should be able to identify the strains from their mass spectra.

Figure 4. Score plot of the interaction test matrix (IM + PE). The first two principal components account for 17% of the total variation. The 16 labels correspond to the 16 different samples; each strain has 4 samples corresponding to different growth times. Each letter represents the principal component scores of a single scan. The 95% confidence interval is given around each cluster of scores.
Figure 5. Score plot of the matrix (SM + PE) showing the variance among strains after the effect of growth age is removed. The first two principal components account for 50% of the total variance of the matrix (SM + PE). The four labels represent the four strains of E. coli. Each letter represents a single scan. The 95% confidence intervals are calculated around each strain.
Figure 6. Principal component score plot of the matrix (AM + PE) showing the effect of growth age. The four labels represent the growth times of 24, 48, 72, and 120 h. Each letter represents a single scan. The 95% confidence intervals are calculated for each growth age.

Effect of Growth Ages. The ANOVA-PCA score plot of growth time (AM + PE) is given in Figure 6. The four growth age groups are aligned along the first principal component in order of increasing growth age. The 24-h group is well separated from the other three groups, whereas the 48-, 72-, and 120-h groups form one large cluster. This suggests that growth beyond 48 h has a much smaller effect on the spectra. There is an obvious trend: the scores of the measurements increase from negative to positive with increasing growth age. ANOVA-PCA thus not only disclosed the significance of growth age but also revealed the relationship among its four levels.

(25) Cao, L. OHIO, Athens, 2004.

Figure 7. (Top) Average mass spectrum for growth periods longer than 24 h. (Middle) Average of 20 bootstrap Latin partition loadings with 95% confidence intervals (red and green). Peaks outside the interval are significant; positive peaks are characteristic of older bacteria and negative peaks of young bacteria. (Bottom) Average mass spectrum of the bacteria grown for 24 h.

Biomarker Candidate Detection. The average and 95% confidence interval of the variable loadings of the first principal component for age versus residual error, obtained from 20 bootstraps, are given in Figure 7 (middle). The bootstrap analysis was performed to validate that the peaks are significant and do not result from matrix effects or fluctuations of adduct ions; peaks outside the 95% confidence interval are statistically significant. The average spectra of bacteria grown for 24 h and for 48 h or longer are given in the bottom and top plots, respectively. The scores in Figure 6 indicate that the age effect is significant, especially for the youngest bacteria. Figure 7 gives the correlations of the peak intensities that increased or decreased with bacterial age: negative loadings occur for the lower-mass proteins and positive loadings for the higher-mass proteins. Because the scores of the spectra are projections of the spectra onto the first principal component, the magnitudes of the peaks in the first principal component may reveal how the protein profiles vary with growth age. The fact that samples with shorter growth ages have negative scores and samples with longer growth times have positive scores indicates that low-mass proteins have higher concentrations at shorter growth times and high-mass proteins have higher concentrations at longer growth times. These trends in Figures 6 and 7 illustrate the effects of the different growth stages of bacteria on the protein profile. Cells grow faster at early ages and assemble more ribosomes; the resulting abundance of low-molecular-weight proteins produces more intense peaks at lower masses. During the later growth stages, bacteria are limited by nutrients, and fewer low-mass ribosomal proteins are needed for protein synthesis. The decline in the concentrations of low-mass ribosomal proteins in the mixture may cause the intensities of the high-mass peaks to increase. These results agree with the observations of other research groups.12,17

MALDI-MS provides the molecular weights of the proteins, which is not enough information to identify a protein. Protein identifications were therefore tentative because of the presence of adduct ions, post-translational modifications, matrix effects, and mass errors.5,14 Key peaks in the variable loadings were searched using the Sequence Retrieval System module of the Expert Protein Analysis System and validated against the literature.21 Some of the proteins found in this project (m/z 5096, 6411, and 7272) were also reported as ribosomal proteins in other studies.5,17,26

CONCLUSION
MALDI-MS was used to investigate four strains of E. coli. All MALDI-MS spectra were compressed with a wavelet transform to reduce computation time. Combined with experimental design, ANOVA-PCA proved to be a useful tool for studying the effects of experimental and biological factors on MALDI-MS measurements of whole-cell bacteria. ANOVA-PCA indicated that growth age is an important factor in investigating bacteria with mass spectrometry: spectra from the first 24 h differed significantly from the spectra obtained at longer growth times. The differences in protein abundance with age may be explained by the increased production of low-mass ribosomal proteins that are needed in the early stages of bacterial growth. The interaction analysis showed that strain 14948 had a different growth pattern from the other strains. This project demonstrates that the growth age of bacteria is a significant experimental factor that affects the reproducibility of MALDI-MS measurements and therefore must be controlled. Short-term instrument drift had no significant effect on the reproducibility of the MALDI-MS spectra of E. coli.

(26) Jones, J. J.; Stump, M. J.; Fleming, R. C.; Lay, J. O.; Wilkins, C. L. Anal. Chem. 2003, 75, 1340-1347.
ACKNOWLEDGMENT
The Research Corporation is thanked for the Research Opportunity Award. The NIH/NIMD is thanked for the summer faculty fellowship. Preshious Rearden, Mariela Ochoa, Xiaobo Xun, Zhangfeng Xu, and Weiying Lu are thanked for their helpful comments and suggestions. The Center for Chemical Instrumentation and the Department of Chemistry and Biochemistry at Ohio University, and the U.S. Army ECBC GeoCenter, are acknowledged for their financial support.
Received for review September 6, 2007. Accepted December 7, 2007.
AC7018798