Limitations and Pitfalls in Protein Identification by Mass

2005, 16, 482.e Peng, W. P.; Cai, Y.; Chang, H. C. Mass Spectrom. ... 4. Limitations of PI Due to Data Processing and Data Mining .... from tandem mas...
2 downloads 0 Views 545KB Size
Chem. Rev. 2007, 107, 3568−3584

3568

Limitations and Pitfalls in Protein Identification by Mass Spectrometry Gert Lubec* and Leila Afjehi-Sadat Medical University of Vienna, Department of Pediatrics, Waehringer Guertel 18, A-1090 Vienna, Austria Received October 31, 2006

Contents 1. Introduction 2. Limitations of Peptide Identification (PI) by Sample Preparation 2.1. Spot Picking 2.2. In-Gel Protein Digestion 2.2.1. Selection of the Protease(s) or Chemical Agents 2.2.2. Nonspecific Cleavage and Missed Cleavages 2.3. Matrix as Limiting Factor 2.4. The Target (Sample Support) as a Factor for PI 2.5. Contaminants Hampering the PI Process 3. Limitations of PI by Instrumentation 3.1. The Role of Calibration for PI 3.2. The Factor “LASER” for PI 3.3. Importance of Maintenance 3.4. Selecting the Mass Spectrometry Method 3.4.1. Limitations of MALDI Technology 3.4.2. Limitations of ESI Technology 4. Limitations of PI Due to Data Processing and Data Mining 4.1. Spectra Quality 4.2. Peptide Identification 4.2.1. PI by Databases 4.2.2. PI by “Database-Independent” Strategies 4.3. Database ErrorssA Major Unsolved Problem for PI and PrI 5. Validation of Peptide and Protein Identification 5.1. Scoring for MS/MS Peptide Identification 5.2. Use of Randomized Databases for the Validation of PI 6. Limitations of PI by Protein Properties 6.1. Short and Very Short Proteins 6.2. Hydrophobic Proteins 6.3. The Isobaric Amino Acids Problem 6.4. The Problem of Low Complexity Regions of Proteins 6.5. Hypothetical Proteins, Proteins with Unknown Function, and Unknown Proteins 6.6. Protein Modifications: Artifacts and Post-translational Modifications 6.6.1. Chemical Modifications 6.6.2. Post-translational Modifications 6.7. Splice Variants/Isoforms * Telephone: +43-1-40400-3215. Fax: [email protected].

3568 3569 3569 3569 3569 3570 3570 3570 3571 3571 3571 3572 3572 3572 3572 3572 3573 3573 3573 3573 3574 3574 3575 3575 3575 3576 3576 3576 3576 3577 3578 3578 3579 3579 3579

+43-1-40400-3200. E-mail:

7. 8. 9. 10.

Protein Identification Conclusion Acknowledgments References

3580 3580 3581 3581

1. Introduction The advent of proteomics techniques for protein identification (PrI) represented a major step forward in protein chemistry,1,2 and indeed, a legion of laboratories are now using different methods of mass spectrometry for peptide identification (PI). Many thousands of proteomics articles are published annually, but not all proteins from these reports have been unambiguously identified. Criteria for reliable PrI became gradually more and more stringent, and a significant part of PrIs from the pastsand unfortunately also some present PrIssare not reliable.3 In this review, limitations and pitfalls are seen from two different viewpoints: that of a user and that of an editorial board member and reviewer of a top proteomics journal. Herein, problems, limitations, and shortcomings are addressed, and most mistakes were made also in our laboratory in the beginning of the proteomics area at different levels, instrumental and data mining. By and by, knowledge exponentially increased, and literature about PrI is abundant. While a host of publications praises and proposes the use and applications of mass spectrometry for PI, there is not too much information on the limitations and pitfalls. The wide use of mass spectrometry techniques is hampered by several factors: one factor is the limited experience of some investigators in mass spectrometry per se;4 a second factor is poor bioinformatic know-how, i.e., data mining (this a major factor for non- or misidentifications); and a third factor is low abundance proteins.5,6 A multitude of problems linked to PrI is being addressed herein, and these range from selecting a MS method to selecting an appropriate database. This review is not designed to address all open questions of MS technologies or for troubleshooting but to indicate some potential weaknesses of PrI. It is written to provide information on the reliability of PrI and finally on validation of the identification process for users of MS. It may serve to enable scientists to critically read publications in the field of proteomics and to probably avoid some mistakes and pitfalls. The article may be useful for the peer reviewing process to test the validity and confidence of identification methods. At this point, a controversy can be mentioned: that is, the fact that results from automated methods are very often and correctly not considered appropriate in favor of human experience. This may enable new discoveries and disprove erroneous existing data. On the other hand, human ap-

10.1021/cr068213f CCC: $65.00 © 2007 American Chemical Society Published on Web 07/24/2007

Pitfalls in Protein Identification by Mass Spectrometry

Gert Lubec is a Full Professor at the Medical University of Vienna and has been working in the field of protein chemistry for two decades. With the advent of proteomics, he focused on neuroproteomics and carried out basic analytical methodology as well as applied work in animal models of several central nervous system disorders, such as Down’s Syndrome. He is currently involved in the investigation of proteins linked to cognitive function in rodents.

Chemical Reviews, 2007, Vol. 107, No. 8 3569

the gel, the better are the identification results, although proteins from very oldsi.e. many year oldsgels have been (not unambiguously) identified. Manual spot picking can be used for individual samples only, due to the time factor. The general disadvantages of manual analyses, including a higher need for personal mixup of samples and keratin contamination from human skin and hair (in the experience of our laboratory, the increased time for manual spot picking leads to a higher probability of contamination; unpublished observation)9 that in turn would hamper PI, should be considered. Sample to sample contamination is also higher in manual spot picking (http:// www.shimadzu-biotech.net/pages/products/2/xcise.php). Automated spot picking is a must in high-throughput MALDI-TOF or MALDI-TOF/TOF analysis10 to make identification reliable. The use of the automated spot picker Proteineer SP (Bruker Daltonics, Bremen, Germany) is a valuable tool in our laboratory when used with disposable tips and gives good results with the exception of the following limitations: due to the picker, we are losing