Langmuir 2001, 17, 4649-4660
Characterization of Adsorbed Protein Films by Time-of-Flight Secondary Ion Mass Spectrometry with Principal Component Analysis

M. S. Wagner† and David G. Castner*,†,‡

National ESCA and Surface Analysis Center for Biomedical Problems, Departments of Chemical Engineering and Bioengineering, University of Washington, Box 351750, Seattle, Washington 98195

Received August 21, 2000. In Final Form: April 6, 2001

Characterization of the adsorbed protein film that forms upon implantation of a biomedical device is a long-standing interest in biomaterials research. Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is a powerful method for the characterization of adsorbed proteins on biomaterial surfaces due to its chemical specificity and surface sensitivity. However, the SIMS fragmentation patterns for proteins are quite complex due to the heterogeneity of the protein sequence. Therefore, the multivariate analysis technique principal components analysis (PCA) was used to obtain a more detailed interpretation of the protein SIMS spectra. This study utilizes single component adsorbed protein films on three model substrates and multivariate analysis of the ToF-SIMS data to determine the identity of protein films. Furthermore, ToF-SIMS and PCA were used to give insight into the composition of a 1% bovine plasma protein film. The single component spectra from 13 different proteins were readily distinguishable using PCA. The major component of the 1% bovine plasma film was found to shift from fibrinogen to γ-globulins over the course of 2 h, in agreement with the current literature. This study shows how the combination of ToF-SIMS and PCA provides new insights into the composition of adsorbed protein films on biomaterial surfaces.
Introduction

Upon implantation of a biomedical device, a layer of adsorbed protein is immediately deposited onto the surface of the biomaterial.1 The adsorbed protein layer can affect device performance and longevity by a variety of means such as bacterial infection,2-4 the formation of calcium-containing deposits,5 device toxicity,6 tissue integration,7 and animal cell adhesion.8-10 For example, plasma proteins such as collagen,11 fibrinogen,12 fibronectin,13 and vitronectin14 have been related to bacterial adhesion and growth in long-term biomaterial implants. Characterization of the adsorbed protein film provides important information for the rational design of biomaterial surfaces

* Address correspondence to this author at the Department of Chemical Engineering, University of Washington, Box 351750, Seattle, WA 98195-1750. Phone: (206) 543-8094. FAX: (206) 543-3778. E-mail:
[email protected]. † Department of Chemical Engineering. ‡ Department of Bioengineering. (1) Baier, R. E.; Dutton, R. C. J. Biomed. Mater. Res. 1969, 3, 191. (2) An, Y. H.; Friedman, R. J. J. Biomed. Mater. Res. 1998, 43, 338. (3) Bryers, J. D. Colloids Surf. B 1994, 2, 9. (4) Rudnicka, W.; Sadowska, B.; Ljungh, A.; Rozalska, B. FEMS Immunol. Med. Microbiol. 1997, 19, 7. (5) Vasin, S. L.; Rosanova, I. B.; Sevastianov, V. I. J. Biomed. Mater. Res. 1998, 39, 491. (6) Ertel, S. I.; Ratner, B. D.; Kaul, A.; Schway, M. B.; Horbett, T. A. J. Biomed. Mater. Res. 1994, 28, 667. (7) Gristina, A. G. Science 1987, 237, 1588. (8) Horbett, T. A. Colloids Surf. B 1994, 2, 225. (9) Thomas, C. H.; McFarland, C. D.; Jenkins, M. L.; Rezania, A.; Steele, J. G.; Healy, K. E. J. Biomed. Mater. Res. 1997, 37, 81. (10) Kao, W. J.; Hubbel, J. A.; Anderson, J. M. J. Mater. Sci.: Mater. Med. 1999, 10, 601. (11) Montanaro, L.; Arciola, C. R.; Baldassarri, L.; Borsetti, E. Biomaterials 1999, 20, 1945. (12) McDevitt, D.; Manavaty, T.; House-Pompeo, K.; Bell, E.; Turner, N.; McIntire, L.; Foster, T.; Hook, M. Eur. J. Biochem. 1997, 247, 416. (13) Yu, J.-L.; Mansson, R.; Flock, J.-I.; Ljungh, A. FEMS Immunol. Med. Microbiol. 1997, 19, 247. (14) Lundberg, F.; Lea, T.; Ljungh, A. Infect. Immun. 1997, 65, 897.
that resist protein adsorption15-21 or specifically adsorb proteins of interest.22-26 Several surface analysis techniques, such as atomic force microscopy (AFM),27 ellipsometry,28 Fourier transform infrared attenuated total reflectance (FTIR/ATR),29 and electron spectroscopy for chemical analysis (ESCA),30 have been used to characterize the adsorbed protein layer at the biomaterial interface.31,32 However, characterization of the adsorbed protein film is not an easy task. ESCA has been used to detect protein on surfaces and determine surface coverage.33 However, since the atomic composition of proteins does not significantly differ from protein to protein, ESCA is not a useful tool for distinguishing (15) Lopez, G. P.; Ratner, B. D.; Tidwell, C. D.; Haycox, C. L.; Rapoza, R. J.; Horbett, T. A. J. Biomed. Mater. Res. 1992, 26, 415. (16) Sheu, M.-S.; Hoffman, A. S.; Terlingen, J. G. A.; Feijen, J. Clin. Mater. 1993, 13, 41. (17) Sheu, M.-S.; Hoffman, A. S.; Ratner, B. D.; Feijen, J.; Harris, J. M. J. Adhes. Sci. Technol. 1993, 7, 1065. (18) Morra, M.; Occhiello, E.; Garbassi, F. Clin. Mater. 1993, 14, 255. (19) Deng, L.; Mrksich, M.; Whitesides, G. M. J. Am. Chem. Soc. 1996, 118, 5136. (20) Favia, P.; d’Agostine, R. Surf. Coat. Technol. 1998, 98, 1102. (21) Harder, P.; Grunze, M.; Dahint, R.; Whitesides, G. M.; Laibinis, P. E. J. Phys. Chem. B 1998, 102, 426. (22) Boeckl, M.; Baas, T.; Fujita, A.; Hwang, K.-O.; Bramblett, A. L.; Ratner, B. D.; Rogers, J. W.; Sasaki, T. Biopolymers 1998, 47, 185. (23) Bohnert, J. L.; Fowler, B. C.; Horbett, T. A.; Hoffman, A. S. J. Biomater. Sci. Polym. Ed. 1990, 1, 279. (24) Kiaei, D.; Hoffman, A. S.; Horbett, T. A. J. Biomater. Sci. Polym. Ed. 1992, 4, 35. (25) Ratner, B. D.; Boland, T.; Johnston, E. E.; Tidwell, C. D. Thin Films and Surfaces for Bioactivity and Biomedical Applications; Materials Research Society: Boston, MA, 1995; p 195. (26) Shi, H.; Tsai, W.-B.; Garrison, M. D.; Ferrari, S.; Ratner, B. D. Nature 1999, 398, 593. 
(27) Siedlecki, C. A.; Marchant, R. E. Biomaterials 1998, 19, 441. (28) Elwing, H. Biomaterials 1998, 19, 397. (29) Chittur, K. K. Biomaterials 1998, 19, 357. (30) Ratner, B. D.; Castner, D. G. Colloids Surf. B 1994, 2, 333. (31) Ratner, B. D. Surf. Interface Anal. 1995, 23, 521. (32) Chittur, K. K. Biomaterials 1998, 19, 301. (33) Paynter, R. W.; Ratner, B. D.; Horbett, T. A.; Thomas, H. R. J. Colloid Interface Sci. 1984, 191, 233.
10.1021/la001209t CCC: $20.00 © 2001 American Chemical Society Published on Web 06/21/2001
Langmuir, Vol. 17, No. 15, 2001
different proteins.34 Likewise, AFM lacks the chemical specificity to identify adsorbed proteins unless the AFM tip is functionalized with an antibody to the protein of interest.35 Secondary ion mass spectrometry (SIMS), however, provides the specific molecular information useful for distinguishing between different proteins on a biomaterial surface.36 SIMS is useful for the characterization of adsorbed protein films due to its chemical specificity and surface sensitivity. Due to its basis in mass spectrometry, SIMS yields information about the molecular structure of the surface. Furthermore, at an appropriately low primary ion dose in SIMS (called “static SIMS”), only the outermost 10-20 Å of the surface chemistry is sampled. With the introduction of the time-of-flight mass analyzer for SIMS (ToF-SIMS), the mass range of surface MS techniques has become theoretically limitless with mass resolutions (m/∆m) approaching 10 000.37 ToF-SIMS has been used in the characterization of biomaterial surfaces,38-40 biomolecules in tissue,41 and adsorbed protein films.42,43 SIMS has also been used to generate images of cells44 and pharmaceutical distributions in tissue.45,46 However, gaining useful information (such as the difference between two adsorbed protein films) is difficult due to the large number of peaks in the ToF-SIMS spectrum. In cases where there is an absence of unique peaks for different samples, the relative intensities of the peaks in the SIMS spectra are important for distinguishing the spectra.47 Multivariate analysis (MVA) techniques provide an excellent means for gaining useful information from large data sets. In particular, principal component analysis (PCA) can aid in the interpretation of ToF-SIMS spectra by capitalizing on the differences from spectrum to spectrum. Since all proteins consist of the same 20 amino acids, ToF-SIMS spectra of adsorbed protein films cannot be readily differentiated by the presence or absence of unique peaks. 
The relative intensities of the amino acid peaks in the protein ToF-SIMS spectra contain the relevant information for differentiating the protein spectra. PCA reduces the multidimensional aspect of ToF-SIMS spectra into two or three dimensions so that the differences between the spectra can be clearly recognized. The use of PCA of ToF-SIMS data has been previously explored by Ohrlund et al.48 for solid phase extraction
(34) Paynter, R. W.; Ratner, B. D. In Surface and Interfacial Aspects of Biomedical Polymers; Andrade, J. D., Ed.; Plenum Press: New York, 1985. (35) Chowdhury, P. B.; Luckham, P. F. Colloids Surf. A 1998, 143, 53. (36) Mantus, D. S.; Ratner, B. D.; Carlson, B. A.; Moulder, J. F. Anal. Chem. 1993, 65, 1431. (37) Benninghoven, A.; Hagenhoff, B.; Niehuis, E. Anal. Chem. 1993, 65, 630A. (38) Ratner, B. D.; Chilkoti, A.; Castner, D. G. Clin. Mater. 1992, 11, 25. (39) Ratner, B. D.; Tyler, B. J.; Chilkoti, A. Clin. Mater. 1993, 13, 71. (40) Leonard, D.; Mathieu, H. J. Fresenius J. Anal. Chem. 1999, 365, 3. (41) John, C. M.; Odom, R. W. Int. J. Mass Spectrom. Ion Processes 1997, 161, 47. (42) Ratner, B. D.; Tidwell, C. D.; Meyer, K.; Castner, D. G.; Golledge, S.; Hagenhoff, B.; Benninghoven, A. Fifth World Biomaterials Conference; Toronto, Canada, 1996; p 577. (43) Ratner, B. D.; Tidwell, C. D.; Castner, D. G.; Golledge, S.; Meyer, K.; Hagenhoff, B.; Benninghoven, A. Polym. Prepr. (Am. Chem. Soc., Div. Polym. Chem.) 1996, 37, 843. (44) Colliver, T. L.; Brummel, C. L.; Pacholski, M. L.; Swanek, F. D.; Ewing, A. G.; Winograd, N. Anal. Chem. 1997, 69, 2225. (45) Clerc, J.; Fourre, C.; Fragu, P. Cell Biol. Inter. 1997, 21, 619. (46) Fourre, C.; Clerc, J.; Fragu, P. J. Anal. At. Spectrom. 1997, 12, 1105. (47) Kargacin, M. E.; Kowalski, B. R. Anal. Chem. 1986, 58, 2300. (48) Ohrlund, A.; Hjertson, L.; Jacobsson, S. P. Surf. Interface Anal. 1997, 25, 105.
stationary phases and by Vanden Eynde and Bertrand49,50 for polystyrene surfaces. This work extends the work of Lhoest et al.51 into the ToF-SIMS analysis of several adsorbed protein films by inclusion of more replicates for each protein to address spectral reproducibility, examination of the negative ion ToF-SIMS spectra, and the use of PCA as a tool for ToF-SIMS spectral interpretation and classification. This study utilized ToF-SIMS in conjunction with PCA to differentiate several different single component adsorbed protein films. The reproducibility of the ToF-SIMS spectra of the protein films was examined. The possibility of using PCA based methods for spectrum classification was explored. Furthermore, PCA of the single component ToF-SIMS spectra was used to provide insight into the composition of adsorbed plasma protein films.

Experimental Methods

Protein Adsorption. All single component protein adsorption experiments were performed for 2 h in citrate phosphate buffered saline with sodium azide and sodium iodide (CPBSzI; pH = 7.4 except for collagen, where pH ≈ 1)52 at 37 °C using a total protein solution concentration of 100 µg/mL. The substrates used in the protein adsorption experiments were mica (SPI Supplies, West Chester, PA), poly(tetrafluoroethylene), PTFE, (Berghof/America, Concord, CA), and silicon wafer (Silicon Sense, Inc., Nashua, NH). All substrates were cut into approximately 1 cm² squares. The mica was freshly cleaved immediately before protein adsorption. The PTFE and silicon samples were ultrasonically cleaned sequentially in methylene chloride, acetone, and methanol. The PTFE and silicon substrates were then dried and stored under nitrogen until use. After protein adsorption, the samples were rinsed twice in stirred CPBSzI buffer to remove loosely bound protein and three times in stirred deionized water to remove buffer salts. The protein samples were then dried and stored under nitrogen until ToF-SIMS analysis.
Bovine serum albumin (A-7638), chicken serum albumin (A-3686), porcine serum albumin (A-1173), human serum albumin (A-3782), bovine collagen Type I (C-9791), bovine cytochrome c (C-2037), bovine plasma fibrinogen (F-8630), bovine plasma fibronectin (F-4759), bovine γ-globulins (G-5009), bovine hemoglobin (H-2500), bovine immunoglobulin G (I-5506), bovine lactoferrin (L-4765), chicken egg white lysozyme (L-6876), horse heart myoglobin (M-1882), papain (P-4762), and bovine transferrin (T-1408) were purchased from Sigma Chemical (St. Louis, MO) and were used as received. Protein adsorption onto freshly cleaved mica from 1% bovine plasma was performed for 5, 15, 30, 60, 90, and 120 min at 37 °C in CPBSzI buffer. Lyophilized bovine plasma (P-4639) was purchased from Sigma Chemical (St. Louis, MO), reconstituted with deionized water, and diluted to 1% using CPBSzI buffer. These samples were rinsed and stored as described above.

ToF-SIMS Analysis. ToF-SIMS analysis of all adsorbed protein films was conducted on a PHI Model 7200 Reflectron time-of-flight secondary ion mass spectrometer (Physical Electronics, Eden Prairie, MN) using an 8 keV Cs+ primary ion source. Positive and negative ion ToF-SIMS spectra were acquired from m/z 0 to 200 over an area of 200 µm × 200 µm while maintaining a primary ion dose less than 10¹² ions/cm² to ensure static SIMS conditions.53 A pulsed low-energy electron flood gun was used for charge neutralization for all the substrates used in this study. The mass resolution (m/∆m) at the C4H8N+ (m/z = 70) and C2H- (m/z = 25) peaks was typically above 4000 in the positive ion and negative ion spectra, respectively. Positive and negative ion ToF-SIMS spectra were calibrated to the CH3+, C2H3+, C3H5+,
(49) Vanden-Eynde, X.; Bertrand, P. Surf. Interface Anal. 1997, 25, 878. (50) Vanden-Eynde, X.; Bertrand, P. Appl. Surf. Sci. 1999, 141, 1. (51) Lhoest, J.-B.; Wagner, M. S.; Tidwell C. D.; Castner D. G. J. Biomed. Mater. Res., in press. (52) Horbett, T.
A. In Techniques of Biocompatibility Testing; Williams, D. F., Ed.; CRC Press: Boca Raton, FL, 1986. (53) Marletta, G.; Catalano, S. M.; Pignataro, S. Surf. Interface Anal. 1990, 16, 407.
Table 1. Positive Ion and Negative Ion Peaks Selected for Multivariate Analysis of Adsorbed Protein Films^a

(a) Positive Ion Peaks
source                      fragment
alanine (Ala, A)            44: C2H6N+
arginine (Arg, R)           43: CH3N2+, 73: C2H7N3+, 100: C4H10N3+, 101: C4H11N3+, 112: C5H8N3+, 127: C5H11N4+
asparagine (Asn, N)         70: C3H4NO+, 87: C3H7N2O+, 88: C3H6NO2+, 98: C4H4NO2+
aspartic acid (Asp, D)      88: C3H6NO2+
cysteine (Cys, C)           76: C2H6NS+
glutamine (Gln, Q)          84: C4H6NO+
glutamic acid (Glu, E)      84: C4H6NO+, 102: C4H8NO2+
glycine (Gly, G)            30: CH4N+
histidine (His, H)          81: C4H5N2+, 82: C4H6N2+, 110: C5H8N3+
isoleucine (Ile, I)         86: C5H12N+
leucine (Leu, L)            86: C5H12N+
lysine (Lys, K)             84: C5H10N+
methionine (Met, M)         61: C2H5S+
phenylalanine (Phe, F)      120: C8H10N+, 131: C9H8O+
proline (Pro, P)            68: C4H6N+, 70: C4H8N+
serine (Ser, S)             60: C2H6NO+, 71: C3H3O2+
threonine (Thr, T)          69: C4H5O+, 74: C3H8NO+
tryptophan (Trp, W)         130: C9H8N+, 159: C10H11N+, 170: C11H8NO+
tyrosine (Tyr, Y)           107: C7H7O+, 136: C8H10NO+
valine (Val, V)             72: C4H10N+, 83: C5H7O+

(b) Negative Ion Peaks
source                      fragment
peptide backbone            26: CN-
cysteine (Cys, C)           32: S-
cysteine (Cys, C)           33: SH-
peptide backbone            42: CNO-

^a Peaks 69 and 131 were omitted from the protein on PTFE data due to overlap with fluorocarbon peaks (CF3+ and C3F5+, respectively).
and C7H7+ peaks and CH-, CN-, and CNO- peaks, respectively, before further analysis.

Data Analysis. Principal component analysis of the ToF-SIMS spectra was performed using the PLS_Toolbox v. 2.0 (Eigenvector Research, Manson, WA) for MATLAB (The MathWorks, Inc., Natick, MA). Further data treatment before principal component analysis is described below.
Data Treatment

Data Preprocessing. Before multivariate analysis, several peaks were selected from each spectrum (Table 1). These peaks were chosen based on SIMS studies of amino acid homopolymers36,54 and plasma desorption mass spectrometry of free amino acids.55 Several of these peaks were previously reported by Lhoest et al.51 ToF-SIMS spectra in which the intensity of the sodium ion peak was greater than 1% of the total intensity of the selected protein peaks were discarded due to the matrix effects of the sodium ion on the SIMS fragmentation process. The peaks were then normalized to their total intensity to correct for differences in total secondary ion yield from spectrum to spectrum, which can result largely from instrumental drift. A typical data set X with m samples and n peaks then can be written as a matrix with m rows and n columns. Before PCA, the data set was also mean-centered, which involves subtracting a row vector of column means $\bar{x}$ from the original data set X to generate a mean-centered data set $\bar{X}$. Mean-centering centers the data set at the origin, so
(54) Bartiaux, S. Undergraduate Thesis, Unite de Chimie des interfaces, Faculte des Sciences Agronomiques, Universite Catholique de Louvain, Louvain-la-Neuve, 1995. (55) Bouchonnet, S.; Denhez, J.-P.; Hoppilliard, Y.; Mauriac, C. Anal. Chem. 1992, 64, 743.
that the variance in the data set is due to differences in sample variances instead of differences in sample means.56

Principal Component Analysis (PCA). A ToF-SIMS spectrum with n peaks can be visualized as a point in n-dimensional space. Since it is difficult to comprehend more than three dimensions, it is useful to reduce the dimensionality of the multidimensional spectral space. Multivariate analysis methods such as principal components analysis (PCA) can reduce the dimensionality of multidimensional space while retaining a large amount of the original information in the data set. PCA operates by obtaining the eigenvectors and eigenvalues of the variance-covariance matrix (S) of the original data set. Calculation of the eigenvalues and eigenvectors can be accomplished readily via the singular value decomposition of S,57 which is the method employed in this paper. The mean-centered data matrix $\bar{X}$ is therefore reduced to the sum of a cross-product of two smaller matrices P and T and a residual matrix E:58
$$\bar{X} = PT^{T} + E \qquad (1)$$
where P is the matrix of scores and T is the matrix of the eigenvectors ($t_i$) called loadings. The cross-product $PT^{T}$ contains most of the original variance in X with the remaining variance (mostly noise) relegated to the residual matrix E. The scores give the relationships between the samples in the new axis system, while the loadings give the relationship between the old variables (i.e., SIMS peaks) and the new variables (i.e., axes) called principal components. These principal components (PCs) are linear combinations of all of the original variables, and therefore capture more information than any one of the original variables. Furthermore, the PCs are orthogonal, meaning that, unlike the SIMS peaks, they are uncorrelated. Typically, fewer PCs than original variables are required to capture the salient information in a particular data set, making the detection of patterns in the data set more straightforward. Excellent descriptions of PCA can be found in refs 57 and 58.

Statistical Limits for the Scores Plots. Since the data sets in this paper are made up of several different groups of samples, statistical limits for each group would be useful. Each group of samples (i.e., the samples generated from one particular protein) is made up of replicates of the same protein, so the scores can be assumed to follow a normal distribution. The t distribution can then be used to calculate confidence limits for the scores of a particular group of proteins on a given PC.59 The confidence limits for a subgroup on a cross-plot of two PCs can be determined by performing PCA on the mean-centered set of the subgroup scores and using the eigenvalues to determine the confidence limits for the scores on the secondary PCs. These confidence limits are the major and minor axes of a confidence ellipse that can be drawn around the scores. The loadings can then be used to rotate the ellipse and the scores from the secondary PCs back into the original PC plot.
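The preprocessing, decomposition, and confidence-ellipse steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the PLS_Toolbox implementation used in the study: the function names (`preprocess`, `pca_from_covariance`, `confidence_ellipse`), the synthetic intensities, and the caller-supplied critical value `t_crit` are assumptions introduced here. Following the notation of eq 1, `scores` plays the role of P and `loadings` the role of T. The ellipse half-axes are taken as `t_crit` times the square root of each sub-PCA eigenvalue, which is one plausible reading of the procedure described above.

```python
import numpy as np


def preprocess(intensities):
    """Normalize each spectrum to its total intensity, then mean-center the columns."""
    X = np.asarray(intensities, dtype=float)
    X = X / X.sum(axis=1, keepdims=True)      # rows = spectra, columns = peaks
    return X - X.mean(axis=0)                 # subtract the row vector of column means


def pca_from_covariance(Xc, n_pcs):
    """PCA via singular value decomposition of the variance-covariance matrix S."""
    m = Xc.shape[0]
    S = Xc.T @ Xc / (m - 1)
    U, eigenvalues, _ = np.linalg.svd(S)      # S is symmetric PSD: U holds its eigenvectors
    loadings = U[:, :n_pcs]                   # T in eq 1 (one column per PC)
    scores = Xc @ loadings                    # P in eq 1 (one row per spectrum)
    residual = Xc - scores @ loadings.T       # E in eq 1
    return scores, loadings, eigenvalues, residual


def confidence_ellipse(group_scores, t_crit, n_points=100):
    """Outline of a confidence ellipse for one group's scores on two PCs.

    t_crit is a caller-supplied critical value of the t distribution
    (assumption: half-axis = t_crit * sqrt(eigenvalue of the sub-PCA)).
    """
    G = np.asarray(group_scores, dtype=float)
    center = G.mean(axis=0)
    cov = np.cov(G - center, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # secondary PCs of the group
    half_axes = t_crit * np.sqrt(eigvals)
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    unit = np.column_stack([np.cos(theta), np.sin(theta)])
    # Scale in the secondary-PC frame, then rotate back into the original PC plot.
    return center + (unit * half_axes) @ eigvecs.T, half_axes, eigvecs


# Synthetic stand-in data: 12 "spectra" with 6 "peaks" (not real ToF-SIMS data).
rng = np.random.default_rng(0)
raw = rng.uniform(1.0, 100.0, size=(12, 6))
Xc = preprocess(raw)
scores, loadings, eigenvalues, residual = pca_from_covariance(Xc, n_pcs=2)

# 2.571 is the two-sided 95% t value for 5 degrees of freedom (6 replicates).
outline, half_axes, axes_dirs = confidence_ellipse(scores[:6], t_crit=2.571)
```

Decomposing S rather than the data matrix itself mirrors the description in the text; in this sketch the variance of the scores on each PC equals the corresponding eigenvalue, which is what the ellipse construction relies on.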
(56) Vandeginste, B. G. M.; Massart, D. L.; Buydens, L. M. C.; deJong, S.; Lewi, P. J.; Smeyers-Verbeke, J. Handbook of Chemometrics and Qualimetrics: Part B; Elsevier Science Publishers B. V.: Amsterdam, 1998. (57) Jackson, J. E. A User’s Guide to Principal Components; John Wiley & Sons: New York, 1991. (58) Wold, S.; Esbensen, K.; Geladi, P. Chemom. Intell. Lab. Syst. 1987, 2, 37. (59) Wise, B. M.; Gallagher, N. B. PLS_Toolbox Version 2.0 Manual; Eigenvector Research: Manson, WA, 1998.

Figure 1. (a) Positive and negative ion ToF-SIMS spectra of bovine hemoglobin adsorbed onto mica from a 100 µg/mL single component protein solution. The major secondary ions from the fragmentation of the protein are labeled. (b) Positive and negative ion ToF-SIMS spectra of horse heart myoglobin adsorbed onto mica from a 100 µg/mL single component protein solution. The amino acids that correspond to the peaks labeled in (a) are given.

While the confidence limit ellipses show the bounds of the groups in two dimensions, a multivariate generalization of the t distribution is required to test for outliers in the data set since the outliers can occur on any of the retained PCs. Statistical analysis of the resultant scores for a particular group of samples (e.g., spectra from one protein) may improve the clustering of the samples in the PC scores plots by removing outlier samples. One such statistic is Hotelling’s T2 statistic as given by Jackson.57 The value of the T2 statistic is therefore dependent on the number of PCs retained in the PC model. This quantity was originally put forward by Hotelling as the multivariate generalization of the Student’s t distribution.60 This
statistic can be viewed as the distance in PC space to the multivariate mean (for a mean-centered data set, this is the origin). Confidence limits based on the F distribution can be calculated.57 The T2 statistic can therefore be used to detect “outliers” from the multivariate mean. Cross-Validation of PC Models. One of the advantages of PCA is that the analyst can easily determine how well “unknown” data fits within a particular data set by obtaining a PC model and projecting the new data into (60) Hotelling, H. Ann. Math. Stat. 1931, 2, 360.
Figure 2. Scores plot from PCA of the negative ion spectra of proteins adsorbed onto mica from 100 µg/mL single component protein solutions.
that model. The difficulty with creating a PC model is the determination of how many PCs should be retained to develop the model. Since PCA decomposes the data set into a few information-rich dimensions (i.e., the PCs), enough of these PCs need to be retained to capture the relevant information about the data. However, since PCA also functions to remove noise from the data set, the analyst must avoid retaining too many PCs to avoid the incorporation of noise into the model. One method for determining the number of PCs to be retained in a PC model is cross-validation. Cross-validation for PCA was developed by Wold61 and later improved upon by Eastment and Krzanowski.62 This method involves leaving one sample out of the data set, calculating a PC model, retaining a certain number of principal components, and quantitatively determining how well the left-out sample fits the model (i.e., how well the score of the sample is predicted). The most straightforward description of cross-validation error (PREdiction Sum of Squares, PRESS) is given by Jackson.57 Eastment and Krzanowski proposed a stopping rule based on the PRESS called the W statistic.62 The W statistic represents the increase in predictive information supplied by the Mth PC, divided by the average information in each of the remaining PCs.62 Typically, a PC is retained if its W statistic is greater than 1. While this method for choosing the number of PCs to retain in a PC model does not employ any statistical tests, it is more quantitative than many other methods employed to determine the number of PCs retained in a PC model.

Results

ToF-SIMS Spectra of Adsorbed Protein Films. Figure 1a shows typical positive and negative ion spectra of bovine hemoglobin adsorbed onto mica from a 100 µg/mL hemoglobin solution. In the positive ion spectrum, it is easy to identify the peaks listed in Table 1a. Several of
(61) Wold, S. Technometrics 1978, 20, 397. (62) Eastment, H. T.; Krzanowski, W. J. Technometrics 1982, 24, 73.
these peaks are labeled in Figure 1a. In the negative spectrum, only peaks at m/z 26 (CN-) and m/z 42 (CNO-) are labeled due to their significant intensity. These peaks appear in all of the spectra of amino acid homopolymers and can be assigned to the poly(amide) protein backbone. Typical positive and negative ion spectra of horse heart myoglobin adsorbed to mica from a 100 µg/mL myoglobin solution are shown in Figure 1b. The amino acids corresponding to the peaks labeled in Figure 1a are shown in Figure 1b. The complexity of these spectra makes it difficult to distinguish between the positive and negative ion spectra of bovine hemoglobin and the corresponding spectra of horse heart myoglobin. This process becomes even more complicated when several different protein spectra are compared.

PCA of Negative Ion ToF-SIMS Spectra. PCA of the negative ion ToF-SIMS spectra provides little insight into the differences between adsorbed protein films. A scores plot of the negative ion ToF-SIMS spectra for several proteins adsorbed onto mica from 100 µg/mL single component protein solutions is shown in Figure 2. The first two PCs capture 99% of the variance in the data set, which is not surprising since only four peaks are used from the negative ion spectra. While some clustering of protein spectra is evident, the groups are not well-defined. Separation of the proteins on PC 1 is not very clear. The CN- and CNO- peaks load highly into the first PC, with the CN- peak loading positively and the CNO- peak loading negatively. The S- and SH- peaks load positively into the second PC while the CN- and CNO- peaks load negatively. The second PC separates the proteins by the amount of cysteine present in that protein (data not shown). While this provides some insight into the composition of the adsorbed protein film, there is not enough information in the negative ion spectra to completely differentiate the proteins by type.

PCA of Positive Ion ToF-SIMS Spectra. Protein Films on Mica.
PCA of the positive ion ToF-SIMS spectra for several proteins adsorbed onto mica from single
Figure 3. (a) Scores plot from PCA of the positive ion spectra of proteins adsorbed onto mica from 100 µg/mL single component protein solutions. Note the excellent separation of several of the protein spectra shown in this figure. Ellipses drawn around each of the groups represent the 95% confidence limit for that group on PCs 1 and 2. (b) Loadings plot for the first two PCs from PCA of positive ion spectra of proteins adsorbed onto mica from 100 µg/mL single component protein solutions. The loadings are ordered in increasing mass. These loadings show how the original SIMS peaks are related to the location of the spectra on the scores plot.
component 100 µg/mL protein solutions can readily distinguish between protein spectra. The scores plot for the first two PCs is shown in Figure 3a. The first two PCs capture 72% of the total variance in the data set. This plot shows several distinct groups pertaining to each of the proteins in the data set. The corresponding loadings plots for PCs 1 and 2 are shown in Figure 3b. The loadings plot shows that the C4H8N+, C5H10N+, C5H8N3+, and C8H10N+ peaks contribute highly to the separation of the spectra on PC 1. Likewise, the CH4N+, C4H10N+, and C5H10N+ peaks contribute highly to the separation on PC 2. The
loadings reveal that the SIMS spectra reflect the bulk amino acid composition of the adsorbed proteins. Figure 4a shows that the scores on PC 1 are positively correlated to the relative amounts of lysine, histidine, and phenylalanine in the proteins and negatively correlated to the relative amount of proline in the proteins. This relationship corresponds to the positive loadings of the lysine, histidine, and phenylalanine peaks and the negative loading of the proline peak. Figure 4b shows that the scores on PC 2 do not correspond as well to any particular amino acid. This is primarily due to the lower amount of variation
Figure 4. Scores of the spectra on the PCs, reflecting their bulk amino acid composition for peaks with large loadings. (a) The scores on PC 1 reflect the bulk amino acid composition of several proteins. (b) The scores on PC 2 do not reflect the bulk amino acid composition as closely as PC 1 due to the lower amount of variation captured by PC 2.
captured by PC 2. Furthermore, since the scores on the PCs are a multivariate combination of several peaks, the scores do not exactly track the abundance of individual amino acids in a protein. PCA can even distinguish between ToF-SIMS spectra of one protein from different species. Figure 5a shows the scores plot for the first two PCs of albumin from four different species adsorbed to mica from 100 µg/mL single component solutions. The chicken and porcine albumin spectra are significantly different from the bovine and human albumin. The bulk amino acid compositions for these proteins are very similar for most of the amino acids (Table 2). The loadings plot (Figure 5b) shows that the peaks at m/z 84 (from lysine) and m/z 86 (from leucine and isoleucine) contribute significantly to the differences in the spectra. It is these amino acids that differ the most significantly in the bulk compositions of the different albumins shown in Table 2. Furthermore, the isoleucine-to-leucine ratio changes significantly in the bulk composition of the different albumins. This changing ratio may affect the secondary ion yield of the peak at m/z 86 and hence affect the separation of the different albumin spectra on the scores plot. The remaining amino acids have very similar proportions in each of the albumins studied.

Protein Films on PTFE. PCA of ToF-SIMS spectra for several proteins adsorbed onto PTFE from single component 100 µg/mL protein solutions shows that six of the protein clusters are well separated, but several protein clusters overlap in the cross-plot of PCs 1 and 2 (Figure 6a). When PC 3 is plotted against PC 1, however, the separation of the groups improves so that seven groups
are now well separated (Figure 6b). By examining the scores of the protein spectra on PC 2, the spread of the data within each protein cluster is apparent. In the third PC, the spread of the spectra within each group is significantly reduced. As with the protein adsorbed to mica, the loadings (shown in Figure 6c) reflect the amino acids that change most significantly in the bulk composition of the proteins. By performing PCA on the spectra that overlap in the PC 1 vs 2 cross-plot, the different groups readily become apparent (Figure 6d). The spread in the data on PC 2 is of some concern, especially when identifying unknowns. A method of outlier detection and removal would be helpful in decreasing the within-group scatter. The T2 statistics and 95% T2 limits were calculated for each of the groups of proteins shown in Figure 6a. The T2 statistic is different from the 95% confidence ellipse drawn around each group, which only describes variation in two dimensions. The T2 statistic is a multivariate statistical test and can be used to detect outliers in any of the retained PCs. Since the T2 statistic depends on the number of PCs retained in the PC model, cross-validation by the leave-one-out method as discussed above was used to determine the appropriate number of PCs to retain in each PC model. Enough PCs were retained to maintain a W statistic greater than 1. Of the spectra shown in Figure 6a, only four spectra were found to be outliers (T2 > 95% T2 limit) from their respective groups. One BSA spectrum and one lysozyme spectrum were found to be outliers due to overlap of the C4H4NO2+ peak with the C5F2+ peak at m/z 98. Another BSA spectrum was found to be an outlier due to poor signal at the higher end of the mass range (probably due to poor charge neutralization of the sample). Finally, one myoglobin sample was found to be an outlier due to slight contamination with poly(dimethylsiloxane) (PDMS), as evidenced by the silicon-containing peaks at m/z 73 and m/z 147. 
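The outlier screen described above can be illustrated with a short NumPy sketch. It is not the PLS_Toolbox routine used in the study; the function names, the synthetic data, and the deliberately planted outlier are assumptions introduced here. The limit uses the commonly quoted form T2_lim = k(n - 1)/(n - k) × F(k, n - k), where n is the number of samples and k the number of retained PCs, and the F critical value is supplied by the caller from a table or statistics package rather than computed.

```python
import numpy as np


def pca_scores(X, n_pcs):
    """Mean-center X and return the scores and eigenvalues of the first n_pcs PCs."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)          # variance-covariance matrix
    U, eigenvalues, _ = np.linalg.svd(S)
    return Xc @ U[:, :n_pcs], eigenvalues[:n_pcs]


def hotelling_t2(scores, eigenvalues):
    """T2 of each sample: squared scores scaled by the PC eigenvalues, summed over PCs."""
    return np.sum(scores ** 2 / eigenvalues, axis=1)


def t2_limit(n_samples, n_pcs, f_crit):
    """T2 limit from an F critical value with (n_pcs, n_samples - n_pcs) degrees of freedom."""
    return n_pcs * (n_samples - 1) / (n_samples - n_pcs) * f_crit


# Synthetic group of 20 replicate "spectra" with 5 "peaks"; the last one is shifted
# on purpose so that it behaves like a contaminated sample (not real data).
rng = np.random.default_rng(0)
group = rng.normal(size=(20, 5))
group[-1] += 25.0

n_pcs = 2
scores, eigenvalues = pca_scores(group, n_pcs)
t2 = hotelling_t2(scores, eigenvalues)
limit = t2_limit(len(group), n_pcs, f_crit=3.55)   # F(0.95; 2, 18) from a standard table
outliers = np.flatnonzero(t2 > limit)              # indices of samples with T2 above the limit
```

Because T2 sums over every retained PC, the result changes with `n_pcs`, which is why the number of PCs has to be fixed (here arbitrarily at two; in the study by cross-validation) before the test is applied.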
These samples have been noted in Figure 6a. Exclusion of these samples does not significantly change the relationship of the remaining samples on the first three PCs.

PCA for Classification of ToF-SIMS Spectra. PCA can also be used to classify ToF-SIMS spectra of proteins of unknown composition across different substrates. A PC model was developed using the first seven PCs from PCA of the positive ion spectra of proteins adsorbed to mica from 100 µg/mL single component solutions. The first seven PCs were retained to maintain a W statistic greater than 1 as described above. The amino acid peaks overlapping with fluorocarbon peaks (m/z 69 and 131) were omitted from the model data set when modeling the spectra of proteins on PTFE. SIMS spectra of proteins adsorbed onto PTFE and silicon wafer from 100 µg/mL single component protein solutions were then projected onto the PC model. In both cases, the structure of the data set on the first two PCs (Figure 7a,b) was very similar to that of the data set for proteins on mica (Figure 3a).

ToF-SIMS spectra from a time series adsorption of 1% bovine plasma onto mica were projected onto the first seven PCs of the PCA model built from the pure protein spectra on mica shown in Figure 3a. The first seven PCs were chosen to maintain a W statistic greater than 1. The scores on the first two PCs of the plasma protein spectra and several of the spectra of the pure proteins adsorbed to mica are shown in Figure 8. All of the protein spectra shown in Figure 3a were used to make this PC model, and the locations of the scores of the model spectra are the same in both Figure 3a and Figure 8. Projection of the spectra of the plasma protein film onto the PC model developed using pure protein films shows that the major
Figure 5. (a) Scores plot from PCA of the positive ion spectra of four types of serum albumin adsorbed from 100 µg/mL single component albumin solutions. This scores plot shows that ToF-SIMS is sensitive enough to distinguish very similar adsorbed protein films. The ellipses drawn around each of the groups represent the 95% confidence limit for that group on PCs 1 and 2. (b) Loadings plot for the first two PCs. The loadings are ordered in increasing mass.
component of the plasma protein film appears to shift away from fibrinogen as adsorption time increases.

Discussion

ToF-SIMS spectra of adsorbed protein films are very complex and contain peaks from each of the amino acids. Traditional analysis of SIMS spectra often involves selection of a few peaks and tracking their changes in relative intensity across several spectra.63 This approach requires previous knowledge about the data set, which may not be available. Analysis of every combination of peaks in the spectrum quickly becomes impossible due to the great number of combinations available to the analyst. Therefore, multivariate analysis methods must be employed to analyze the SIMS spectra.

(63) Tidwell, C. D.; Castner, D. G.; Golledge, S. L.; Ratner, B. D.; Meyer, K.; Hagenhoff, B.; Benninghoven, A. Surf. Interface Anal. 2001, in press.

PCA can readily distinguish between protein spectra on a wide variety of substrates using the positive ion ToF-SIMS spectra. While the negative ion spectra show some separation of the protein spectra, they give little physical insight into the composition of the protein film. Since the CN⁻ and CNO⁻ peaks load highly into the first PC and these peaks are present in all of the amino acids, no insight
Figure 6. Scores plots from PCA of the positive ion spectra of proteins adsorbed to PTFE from 100 µg/mL single component protein solutions. (a) PCs 1 and 2 (six groups well separated). (b) PCs 1 and 3 (seven groups well separated). (c) Loadings plot for the first three PCs. The loadings are ordered in increasing mass. (d) Scores plot from PCA of the positive ion spectra of the overlapping proteins in (a, b). Since PCA captures the directions of greatest variation in the data set, these protein spectra can now be distinguished. The outlier spectra as determined by the T² statistic are noted in (a). The ellipses drawn around each of the groups represent the 95% confidence limit of that group on the PCs shown in that plot.

Table 2. Comparison of the Bulk Amino Acid Compositions (%) of Four Serum Albuminsᵃ

                  bovine     chicken    human      porcine
                  serum      serum      serum      serum
                  albumin    albumin    albumin    albumin
MW                66 357     67 113     66 396     66 628
alanine           8.06       7.26       10.60      8.42
arginine          3.95       4.39       4.10       4.47
asparagine        2.40       2.70       2.91       2.23
aspartic acid     6.86       6.93       6.15       6.36
cysteine          6.00       5.91       5.98       6.01
glutamine         3.43       5.74       3.42       3.44
glutamic acid     10.12      8.45       10.60      10.48
glycine           2.74       4.05       2.05       2.75
histidine         2.92       2.20       2.74       3.09
isoleucine        2.40       5.57       1.37       3.95
leucine           10.46      6.76       10.43      10.65
lysine            10.12      7.94       10.09      9.79
methionine        0.69       3.21       1.03       0.00
phenylalanine     4.63       4.90       5.30       4.98
proline           4.80       4.39       4.10       5.15
serine            4.80       6.42       4.10       3.95
threonine         5.66       3.72       4.79       4.47
tryptophan        0.34       0.00       0.17       0.34
tyrosine          3.43       3.38       3.08       3.78
valine            6.17       6.08       7.01       5.67

ᵃ Note the high degree of similarity between the human and bovine serum albumins, while the chicken serum albumin is most dissimilar from the other three proteins.
into the amino acid composition of the adsorbed protein is gained from examination of the negative ion spectra. The second PC gives information on the relative amount of cysteine, but the limited number of peaks selected from the negative ion spectra does not give information on all of the amino acids present in the protein. Ongoing work in our laboratory is attempting to assign other negative ions to particular amino acids to increase the amount of usable information in the negative ion spectra.

(64) Derue, C.; Gibouin, D.; Lefebvre, F.; Rasser, B.; Robin, A.; LeSceller, L.; Verdus, M. C.; Demarty, M.; Thellier, M.; Ripoll, C. J. Trace Microprobe Technol. 1999, 17, 451.
(65) Heald, S. M. In X-ray Absorption: Principles, Applications, Techniques of EXAFS, SEXAFS and XANES; Koningsberger, D. C., Prins, R., Eds.; John Wiley & Sons: New York, 1988.
(66) Ohta, T. J. Electron Spectrosc. Relat. Phenom. 1998, 92, 131.

PCA of the positive ion spectra of the proteins on mica showed significant separation between groups. Since the organization of the spectra on the new PC axes reflects the bulk amino acid composition of the proteins, the proteins most likely denature on the surface under vacuum conditions. Cold-stage SIMS may help maintain the hydrated structure of the protein on the surface during analysis.44,64 The within-group scatter could also be due to different degrees of denaturation of the protein on the surface. However, since these spectra were acquired over a 200 µm × 200 µm area, long-range order on the surface would be necessary to see differences in orientation or extent of denaturation of the proteins on the surface. The order of the proteins on the surface could be investigated by techniques such as near edge X-ray absorption fine structure (NEXAFS),65,66 a synchrotron radiation surface analysis technique that has been used to probe the structure of alkanethiol self-assembled
Figure 7. Projection of the positive ion spectra of proteins adsorbed onto (a) PTFE and (b) silicon wafer from 100 µg/mL single component protein solutions onto a PC model generated using the first seven PCs from PCA of the spectra of proteins on mica. Note that the arrangement of the protein spectra on these plots is very similar to the arrangement of the spectra in Figure 3. The percent variance captured by PCs 1 and 2 for (a) differs slightly from that in Figure 3 because the peaks at m/z 69 and 131 were omitted owing to their overlap with fluorocarbon peaks.
monolayers on gold.67,68 Since all of the substrates used in this study were insulating to some degree, the scatter in the data could also result from differences in charge neutralization efficiency across the spectra. Furthermore, the separation between groups seems to break down near the origin of the scores plots due to similarities
between the protein spectra. Other methods of multivariate analysis, such as linear discriminant analysis (LDA),69 are useful for minimizing the within-group scatter while maximizing the between-group differences. These multivariate methods will be the subject of a future report.70
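As an illustration of the idea behind LDA, the two-group Fisher discriminant finds the direction that maximizes between-group separation relative to within-group scatter. The sketch below is a generic textbook formulation, not the method of the future report cited above; the function name, the small `ridge` term added for numerical stability, and the assumption that each group is supplied as a matrix of scores or peak intensities (rows = spectra) are all illustrative choices.

```python
import numpy as np


def fisher_direction(X0, X1, ridge=1e-8):
    """Unit vector maximizing between-group over within-group scatter.

    X0, X1: arrays of shape (n_spectra, n_variables), one per group.
    ridge:  hypothetical stabilizer added to the diagonal of the scatter matrix.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw = Sw + ridge * np.eye(Sw.shape[0])
    # Fisher's solution for two groups: w is proportional to Sw^-1 (m1 - m0).
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting each spectrum onto the returned vector (`X @ w`) gives a single discriminant score per spectrum. Unlike a principal component, this direction is chosen using the group labels.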
(67) Dannenberger, O.; Weiss, K.; Himmel, H.-J.; Jager, B.; Buck, M.; Woll, C. Thin Solid Films 1997, 307, 183.
(68) Fischer, D.; Marti, A.; Hahner, G. J. Vac. Sci. Technol. A 1997, 15, 2173.
(69) Mallet, Y.; Coomans, D.; deVel, O. Chemom. Intell. Lab. Syst. 1996, 35, 157.
(70) Wagner, M. S.; Graham, D. J.; Tyler, B. J.; Castner, D. G. Manuscript in preparation.
Figure 8. Projection of the positive ion spectra of a time series of 1% bovine plasma adsorbed onto mica onto a model generated using the first seven PCs from PCA of the spectra of proteins adsorbed onto mica. All of the spectra shown in Figure 3 were used to generate this model, but several spectra have been removed from this plot for easier viewing. Note that the composition of the plasma protein film appears to shift away from mostly fibrinogen as adsorption time increases. The ellipse drawn around each of the model groups represents the 95% confidence limit for that group on PCs 1 and 2.
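The projection used for Figures 7 and 8 can be illustrated with a short sketch. It assumes that model and unknown spectra are arranged as rows of peak-intensity matrices sharing the same peak list; the function names, the `mz` array, and the default of seven PCs as a keyword argument are hypothetical and do not reproduce the authors' software. The key point is that new spectra are centered with the mean of the model data set, not their own mean, before being multiplied by the model loadings.

```python
import numpy as np


def drop_peaks(X, mz, omit):
    """Remove columns whose nominal m/z is listed in `omit` (e.g., 69 and 131)."""
    keep = ~np.isin(mz, omit)
    return X[:, keep], mz[keep]


def fit_pc_model(X_model, n_pcs=7):
    """Build a PC model from mean-centered model spectra (rows = spectra)."""
    mean = X_model.mean(axis=0)
    # Rows of Vt are the loadings of the model data set.
    _, _, Vt = np.linalg.svd(X_model - mean, full_matrices=False)
    return mean, Vt[:n_pcs]


def project(X_new, mean, loadings):
    """Scores of new spectra on the model PCs, centered with the model mean."""
    return (X_new - mean) @ loadings.T
```

Because the model spectra themselves are projected with the same mean and loadings, their scores do not move when unknown spectra are added to the plot. This is consistent with the model spectra occupying the same positions in Figure 3a and Figure 8.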
It should be noted that mean-centering (as opposed to autoscaling) the data set makes the loadings plot easier to interpret, because a few peaks carry large loadings instead of several peaks loading more or less equally. Autoscaling results in each peak carrying the same weight (i.e., variance) in the data set, which may not be the case with SIMS data. Certainly, peaks with a greater variance will contain more information in the SIMS spectrum, so scaling the variance of each peak to be unity (as is done in autoscaling) is probably not appropriate.

The proteins with the largest differences in amino acid composition are the farthest apart on the scores plot because the principal components capture the directions of the greatest variation in the data set. This fact may make separation of similar SIMS spectra difficult when proteins that are very different are included in the data set. This overlap can be corrected by removing the very different proteins and only including the overlapping proteins in the data set. PCA can then readily distinguish between the remaining protein spectra.

Figure 5 shows the separation of the albumin spectra for several species. Since proteins are highly conserved both in structure and in sequence from species to species, it is expected that the ToF-SIMS spectra would be very similar. However, PCA of the ToF-SIMS data can readily distinguish between three of the proteins present in the data set. When all the albumin spectra are included in the larger data set of proteins on mica shown in Figure 3, the albumin spectra from the different species overlap with bovine serum albumin (data not shown).

ToF-SIMS and PCA can also readily distinguish between proteins on substrates with high matrix effects,71 such as PTFE. The within-group scatter is slightly higher than in the data set of the proteins on mica, which may be due to patchy coverage of the protein on the PTFE.33 Several spectra around the origin of the scores plot in Figure 6a overlap due to the similarity of these spectra. However, when PCA is performed on these spectra alone, the group separation is markedly improved. This fact makes PCA a good method for the identification of different proteins on a given substrate.

(71) Thompson, P. M. Anal. Chem. 1991, 63, 2447.

The detection of outliers using the T² method was also useful for finding spectra within a large data set that suffered from contamination or poor spectral quality. The T² method in combination with cross-validation creates a powerful, high-throughput method for screening spectra for quality. Of course, this creates an environment where “majority rules” since the T² statistic is a measure of distance from the multivariate mean. Therefore, data quality must be ensured by examining representative spectra from each group for contamination or degradation in spectral quality (due to poor charge neutralization, for example).

When a model is created with the spectra from proteins on mica and the spectra from proteins on silicon or PTFE are projected onto it, the same organization of the protein spectra on the scores plot is apparent. This suggests that a PC-based method of multivariate analysis may be useful for the classification of protein SIMS spectra between substrates. One such method is soft independent modeling of class analogies (SIMCA), which consists of several PC models and classifies unknown spectra into one group, several groups, or none at all.72 Multivariate analysis methods such as SIMCA may be useful in the quantitative classification of unknown ToF-SIMS spectra.

(72) Wold, S. Pattern Recognition 1976, 8, 127.

The similarity of the spectra between substrates also suggests that the substrate effect on the protein orientation, denaturation, secondary ion yields (matrix effects), etc. may not be important under the experimental conditions used in this study. Matrix effects may become
more important as the surface concentration is decreased, making comparisons of protein spectra across substrates more difficult.

ToF-SIMS and PCA can also provide insight into complex adsorbed protein films such as 1% bovine plasma. The PC model suggests that the composition of the adsorbed protein film from 1% bovine plasma shifts away from mostly fibrinogen as adsorption time increases. This finding is supported by the work of Brash and co-workers73,74 for a variety of substrates. These authors demonstrated that fibrinogen is the first protein to adsorb to artificial surfaces and is later replaced by other proteins in dilute plasma solutions. This phenomenon has been called the Vroman effect by Horbett.75 Time-of-flight SIMS and PCA provide a straightforward means for the investigation of complex adsorbed protein films.

Conclusions

Static time-of-flight SIMS is a powerful technique for the investigation of adsorbed protein films on biomaterial surfaces. We have demonstrated the following:

(a) Principal component analysis of the ToF-SIMS data can readily separate the protein spectra, even for spectra that are very similar. SIMS may even be sensitive to the orientation or conformation of the protein on the surface.31,42,63,76

(b) Principal component analysis assists in the analysis of ToF-SIMS spectra by identification of peaks that vary the most in a sample set, lending insight into the changes in the surface chemistry of the samples.

(c) Multivariate statistical limits for the spectra can be generated in the PC space which are useful for the determination of outliers in a data set.

(d) ToF-SIMS and PCA can give insight into the composition of complex adsorbed protein films (e.g., 1% bovine plasma protein film).

(e) Principal component based multivariate analysis methods may be used for classification of unknown spectra.

The quantification of the relative surface composition of complex adsorbed protein films and the classification of unknown spectra70 will be the subject of future reports. In summary, static ToF-SIMS in conjunction with PCA provides a useful method for investigating the composition of adsorbed protein films.

Acknowledgment. This research was supported by NIH Grant RR-01296 from the National Center for Research Resources.

LA001209T
(73) Uniyal, S.; Brash, J. L. Thromb. Haemostasis 1982, 47, 285.
(74) Brash, J. L.; tenHove, P. Thromb. Haemostasis 1984, 51, 326.
(75) Horbett, T. A. Thromb. Haemostasis 1984, 51, 174.
(76) Lhoest, J.-B.; Detrait, E.; Aguilar, P. v. d. B. d.; Bertrand, P. J. Biomed. Mater. Res. 1998, 41, 95.