Let the Data Flow! - American Chemical Society

Editor-in-Chief

William S. Hancock

Barnett Institute and Department of Chemistry, Northeastern University, Boston, MA 02115
617-373-4881; fax 617-373-2855
[email protected]

Associate Editors
Joshua LaBaer, Harvard Medical School
György Marko-Varga, AstraZeneca and Lund University
Martin McIntosh, Fred Hutchinson Cancer Research Center

Consulting Editor
Jeremy K. Nicholson, Imperial College London

Editorial Advisory Board
Ruedi H. Aebersold, ETH Hönggerberg
Rolf Apweiler, European Bioinformatics Institute
Ronald Beavis, Manitoba Centre for Proteomics
Rainer Bischoff, University of Groningen
Dolores Cahill, University College Dublin
Thomas P. Conrads, University of Pittsburgh Cancer Center
Thomas E. Fehniger, AstraZeneca
Catherine Fenselau, University of Maryland
Daniel Figeys, University of Ottawa
Craig Gelfand, BD Diagnostics
Brian Haab, Van Andel Institute
Sam Hanash, Fred Hutchinson Cancer Research Center
Albert Heck, Utrecht University
Stanley Hefta, Bristol-Myers Squibb
Denis Hochstrasser, University of Geneva
Elaine Holmes, Imperial College London
Michael J. Hubbard, University of Melbourne
Donald F. Hunt, University of Virginia
Barry L. Karger, Northeastern University
Joachim Klose, Charité-University Medicine Berlin
Setsuko Komatsu, National Institute of Agrobiological Sciences
David M. Lubman, University of Michigan
Matthias Mann, Max Planck Institute of Biochemistry
David Muddiman, North Carolina State University
Robert F. Murphy, Carnegie Mellon University
Gilbert S. Omenn, University of Michigan
Nicolle Packer, Proteome Systems Limited
Akhilesh Pandey, Johns Hopkins University
Peipei Ping, University of California, Los Angeles
Henry Rodriguez, National Cancer Institute
Michael Snyder, Yale University
Clifford H. Spiegelman, Texas A&M University
Hanno Steen, Children’s Hospital Boston
Timothy D. Veenstra, SAIC-Frederick, National Cancer Institute
Scot R. Weinberger, Molecular Sensing, Inc.
Susan T. Weintraub, University of Texas Health Science Center
John R. Yates, III, The Scripps Research Institute

editorial

Let the Data Flow!

Proteomics studies in general and biomarker research in particular generate large volumes of data. To reach into the depths of complex biological samples, multidimensional separations combined with high-resolution MS are required, and these often result in a few gigabytes of raw data per hour. Modern HPLC and MS equipment allows us to design ever-more-powerful combinations, but the sheer amount of data threatens to overwhelm our capacity to extract meaningful information. Thus, after having done our utmost to generate all these data, we engage in the rather counterintuitive activity of trying to reduce the raw data to a minimal amount. Certainly, high-resolution data are important, but so is proper data reduction. Without appropriate methods for data reduction, we may lose what we gained in the first place.

So how should raw data be processed to conserve as much information as possible? How should statistical approaches deal with the imbalance between the amount of data (or variables) and the number of measured samples? To be honest, there is no golden path through this maze, but maybe a few sensible suggestions could help.

Because mass spectrometers have the habit of acquiring data even when no analytes are present, the raw data files contain many zeros. Therefore, removing zeros is often the first step in data reduction. The next step could be to discriminate signals from artifacts. To do this properly, it is important to have a good estimate of the intensity and variability of the baseline noise. In my laboratory, we have opted to acquire data in profile mode instead of centroid mode.

Variability in the data from different analyses is the next difficulty that must be overcome. It is critical to discriminate between analytical and biological variability.
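The first two reduction steps described above (removing zeros, then using a baseline noise estimate to separate signals from artifacts) could be sketched roughly as follows. This is an illustration only, not the procedure used in the author's laboratory: the function names are hypothetical, the noise estimate based on the median and the median absolute deviation (MAD) is one common robust choice among several, and the cutoff factor `k = 3` is an assumed value that would need tuning for real profile-mode data.

```python
from statistics import median


def drop_zeros(mz, intensity):
    """Keep only (m/z, intensity) pairs whose intensity is non-zero."""
    return [(m, i) for m, i in zip(mz, intensity) if i > 0]


def baseline_noise(intensities):
    """Estimate the baseline level and its spread.

    Assumption: most points are noise, so the median approximates the
    baseline and the scaled MAD approximates its standard deviation.
    """
    level = median(intensities)
    mad = median([abs(i - level) for i in intensities])
    return level, 1.4826 * mad  # 1.4826 scales MAD to sigma for Gaussian noise


def pick_signals(points, k=3.0):
    """Return the points whose intensity exceeds baseline + k * noise."""
    level, sigma = baseline_noise([i for _, i in points])
    threshold = level + k * sigma
    return [(m, i) for m, i in points if i > threshold]


# Toy profile-mode scan (made-up numbers): mostly zeros and low-level noise,
# with a single clear peak at m/z 500.5.
mz = [500.0, 500.1, 500.2, 500.3, 500.4, 500.5, 500.6, 500.7, 500.8, 500.9]
intensity = [0, 12, 0, 10, 11, 250, 9, 0, 13, 10]

nonzero = drop_zeros(mz, intensity)
signals = pick_signals(nonzero)
```

In this toy scan, three of the ten points are dropped as zeros and only the point at m/z 500.5 survives the threshold. A robust estimator is used here because a plain mean and standard deviation would be inflated by the very peaks one is trying to detect.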
Analytical variability should be determined by standard procedures of method validation (e.g., repeated analyses and quantitative parameters based on standard addition experiments, such as linear range, limit of detection, and recovery). Even when stable-isotope labeling approaches are used, it is important not to forget about the basics of analytical method validation (e.g., what if the method is used outside its linear range?). To control biological variability, a good experimental design is important.

To summarize, I think that most of us have learned how to let the data flow. In addition, we are learning to keep analytical variability in check and try our best in collaboration with our biology or clinical colleagues to control biological variability. However, we are only starting to understand how we should channel the data flow so that ultimately the information we gather will lead to relevant findings, such as biomarkers for a given disease. That makes it an exciting time to be in this field of research.

478 Journal of Proteome Research • Vol. 7, No. 2, 2008

RAINER BISCHOFF
Department of Analytical Biochemistry, Centre for Pharmacy
University of Groningen (The Netherlands)

© 2008 American Chemical Society