Article pubs.acs.org/jpr
Standardization and Utilization of Biobank Resources in Clinical Protein Science with Examples of Emerging Applications
György Marko-Varga,*,†,‡ Ákos Végvári,† Charlotte Welinder,§ Henrik Lindberg,† Melinda Rezeli,† Goutham Edula,∥ Katrin J. Svensson,§ Mattias Belting,§ Thomas Laurell,† and Thomas E. Fehniger†,⊥
† Clinical Protein Science & Imaging, Biomedical Center, Department of Measurement Technology and Industrial Electrical Engineering, Lund University, BMC C13, SE-221 84 Lund, Sweden
‡ First Department of Surgery, Tokyo Medical University, 6-7-1 Nishishinjiku Shinjiku-ku, Tokyo, 160-0023 Japan
§ Department of Oncology, Clinical Sciences, Lund University and Skåne University Hospital, Barngatan 2B, SE-221 85 Lund, Sweden
∥ Respiratory & Inflammation Therapy Area, AstraZeneca R&D Lund, Sweden, 21 00 Lund, Sweden
⊥ Institute of Clinical Medicine, Tallinn University of Technology, Akadeemia tee 15, 12618 Tallinn, Estonia
ABSTRACT: Biobanks are a major resource to access and measure biological constituents that can be used to monitor the status of health and disease, both in unique individual samples and within populations. Most “omic” activities rely on access to these collections of stored samples to provide the basis for establishing the ranges and frequencies of expression. Furthermore, information about the relative abundance and form of protein constituents found in stored samples provides an important historical index for comparative studies of inherited, epidemic, and developing disease. Standardization of sample quality, form, and analysis is an important unmet need and a requirement for gaining the full benefit from collected samples. Coupled to this standard is the provision of annotation describing clinical status and metadata of measurements of clinical phenotype that characterize the sample. Today we have not yet achieved consensus on how to collect, manage, and build biobank archives in order to reach goals where these efforts are translated into value for the patient. Several initiatives (OBBR, ISBER, BBMRI) that disseminate best practice examples for biobanking are expected to play an important role in preserving the integrity of biosamples stored for periods of one or several decades.
These developments will be of great value and importance to programs such as the Chromosome Human Protein Project (C-HPP) that will associate protein expression in healthy and disease states with genetic foci along each of the human chromosomes.
KEYWORDS: biobank, healthcare, disease, biomarkers, protein, Human Proteome Project, biological specimen banks, biological specimen banks/ethics, biological specimen banks/legislation and jurisprudence, biomedical research/ethics, ethical review, government regulation, humans, informed consent, tissue donors, ethics
1. INTRODUCTION
Millions of clinical samples are obtained every day for use in diagnostic tests that support clinical decision-making. Clinical samples can also be archived into repositories for use in future studies investigating the root causes of disease using genetic, genomic, proteomic, and metabonomic approaches. Today it is estimated that over one billion clinical samples are assembled into so-called biobanks, also known as biospecimen resources, and stored throughout the world.1 Implied but not visible within these sample number estimates are the costs in resources (material, financial, and human) required to establish and maintain biobanks as indexed by the Rand Corporation and the National Institutes of Health (Bethesda).2,3 In 2009, TIME Magazine highlighted the importance of biobanks as one of the “10 Ideas Changing the World Right Now”.4 Three years later, we are still developing our understanding of how best to utilize the existing resources and how to develop them further.
© 2012 American Chemical Society
Received: February 26, 2012
Published: May 18, 2012
dx.doi.org/10.1021/pr300185k | J. Proteome Res. 2012, 11, 5124−5134
Journal of Proteome Research
We, the contributing authors of this report, represent the collective perspective of both end users and administrators of banked human clinical samples. Together we share experience in this activity from positions in academic research, commercial health care, the drug development industry, and government regulating agencies. We have taken up this subject because we strongly believe that there is a need to increase awareness within the clinical proteomics community of the infrastructure buildup of biobanking, and also to drive home the message of the great need for standardization in the content of biobanks, namely, the intrinsic value of the samples themselves for use in any study. We will raise several points of discussion regarding the state of standard practice within any one institution or between multiple similar collections that represent either locally or globally placed biobanks.
Clinical proteomics has a rich history in providing diagnostic tools and diagnostic markers into clinical practice. An example is the development by Carl-Bertil Laurell and his co-workers of both the technology of electrophoretic separation of serum proteins and the identification of α-1-antitrypsin as an important marker of inherited disease.5 While many hospital analyses measuring clinically relevant proteins are performed on fresh samples that are then discarded, other applications of clinical proteomics rely upon collected samples held in biobanks. The products of many clinical proteomic studies are descriptive profiles of protein abundance, form, and activity. This census-taking format is often reported in academic journals as statistically associated spectra of expression or associations with biological processes, often in the context of disease presentation. But just how reliable are these reports? Clinical proteomics is not the only field that needs to ask this question. The same prerequisites for standardization and quality also pertain to every “omics” study utilizing clinical samples, be that genomic analysis, transcription analysis, metabolome analysis, or protein analysis. Regardless of the aims of the research study, whether academic, clinically related, or commercially driven, and irrespective of the location of the biobank, the clinical proteomics community has a secure stake in maintaining the level of quality in sample handling, sample measurement, and sample readout. The results of clinical sample testing are used for many purposes. They provide clinicians with levels of dynamic measurement used to assess clinical behavior. They provide academic researchers with access to the variable protein constituents of biological processes.
They provide commercial biotechnology and drug development industries with markers of health and disease used to support the invention of new medicines for providing more effective health care. They provide sentinel government regulating agencies with the historical record of the previous status of health and disease in whole populations in local, national, or global constellations. However, in the final analysis, if we cannot guarantee that the samples are comparable because the methods used to process, store, and analyze the samples do not provide equivalence, then we cannot provide assurance that the measurement is reliable or that the results can be related to one another within the limits of accuracy demanded by the end user. Since the sample arising from each study subject is acquired, processed, and stored in time and place, it is important to establish criteria ensuring that the samples grouped together in any study represent an equal chance both for discovery and for being discovered. It is at this point where the need for standardization in process and measurement cannot be overstated. We believe that such standardization is not in place today. We also believe that it is the responsibility of those using and depending upon access to stored clinical samples to be both proactive and interactive in establishing these standards.
Table 1. Biobank Repositories of Clinical Samples
location: local; national; multinational; global
focus: single disease; complex disease; inherited disease; environmental; rare diseases; population-based studies; drug/clinical trials
management: single investigator; institutional; multicenter; commercial
governance: institutional review board; informed consent; national law; international law
stored biospecimen: RNA; DNA; proteins; peptides; lipids; metabolites; fluids; cells; tissue; organs; body
storage conditions: short-term −20 °C; long-term −80 °C; long-term −260 °C; room temperature
Figure 1. Schematic overview of the biobank work-flow and processing steps of samples.

2. BIOSPECIMEN BANKING TODAY
Biospecimen resources are found in many forms and serve many purposes (Table 1). In rough terms, we can divide biobanks into two categories: (1) samples collected before the year 2000 that are, for the most part, manually processed samples stored frozen at −20 or −80 °C in freezers located within general laboratories or as tissue block collections in pathology laboratories, and (2) samples collected in the past decade and processed using automated sample handling followed by cold storage in dedicated facilities at deep freezing temperatures (−80 and −260 °C). Most biobanks are funded publicly, and sample storage often becomes a question of budget resources. A biobank may be a small-scale collection of a few samples from selected observation points made by an individual investigator. Alternatively, the collections may be part of large studies sponsored by a university hospital or part of a multicenter national study that assembles and shares samples within consortia of investigators studying different aspects of a disease process. Commercial biobanks are also growing in number, especially in areas where pharmaceutical development is focused. Publicly funded support to develop the infrastructure of large biospecimen collections held at multiple sites in multinational studies is also becoming more common today. Some of these repositories are firmly embedded in our systems of normal care, such as blood banks, organ banks used in transplantation, and sperm and ova banks used in fertilization.
Beyond their use in clinical care, these collections also support research activities, such as the investigation of disease mechanisms in both rare and common diseases, measurements of biomarkers operating during complex diseases, measurement of markers for risk of disease in population-based surveillance studies, and the monitoring in clinical trials of the effects of drugs on eventual outcome.2,6−9 This latter activity within clinical trials has gained much attention as new treatment paradigms such as personalized medicine are put into standard practice.7,8,10−12 As recently argued by leaders of the National Institutes of Health (NIH) and the Food and Drug Administration (FDA), “The success of personalized medicine depends on having accurate diagnostic tests that identify patients who can benefit from targeted therapies”.10 The search for biomarkers for use in personalized medicine therapies is a highly active area, bridging disease areas, clinical specialties, and drug development, but it is highly dependent upon gaining access to clinical samples of high quality and in statistically relevant numbers for study comparisons.12−15 To support this contention, one only needs to perform a PubMed search on the term “personalized medicine” to identify the more than 45 000 related citations, including over 3000 review articles.
The ethical and legal regulations governing the use of biobanked samples are determined by the public law in place at the location of sampling and at the site of analysis. These governing rules pertain to both the academic and commercial use of the samples. Paramount to this point are the voluntary informed consent of the subject, giving permission for the specific use of these samples, and the approval of an institutional review board guaranteeing the safety of the subject in obtaining the sample and in the use of these samples. Globally, the ethical use of clinical samples is covered by the United Nations Universal Declaration of Human Rights and by the World Medical Association's Declaration of Helsinki (latest amendment 2008), which describes the Ethical Principles for Medical Research Involving Human Subjects (http://www.wma.net/en/30publications/10policies/b3/index.html). Throughout the European Community, both the Council of Europe and individual countries (Denmark, Estonia, Finland, France, Germany, Iceland, Norway, Sweden, and the United Kingdom) have enacted regulations governing biobanks (http://conventions.coe.int/Treaty/Commun/QueVoulezVous.asp?NT=146&CM=8&DF=11/02/2012&CL=ENG).16,17 State-enacted public law is also in place in Australia, Canada, and Latvia, with legislation pending in many more countries. The procedure whereby ethical approvals are given may vary significantly from one country to another. In some countries there are clear rules and regulations whereby permission for a given study is granted. In others, ethical boards in various regions of a nation might judge the application differently, taking various considerations into account. The ethical aspects regarding the patient are more similar across countries: the patient is informed about why samples will be stored and used, with appropriate information regarding the purpose of, and the reasons for, the consent that the patient gives. The use of the samples is also described in most cases, and above all, the patient or participant in a study producing biobank samples is always given the right to withdraw from participation. In that case, all samples and data generated from the participant are required to be removed.
Part of the process of managing biospecimen collections is also the dissemination of best practice guidelines for collecting, storing, and annotating individual samples into collections.18 For example, in the USA, the National Cancer Institute has established the Office of Biorepositories and Biospecimen Research (OBBR) to disseminate information and recommendations to investigators on life cycle management of collected samples, with a focus on the biospecimen as an object of investigation.19 Similarly, the International Society for Biological and Environmental Repositories (ISBER) provides a network of support for the planning and maintenance of sample collections.20 The European Commission sponsored initiative Biobanking and Biomolecular Resources Research Infrastructure (BBMRI) operates as a pan-European syndicate providing large-scale national standardization protocols to 280 biobanks in more than 30 member countries (http://www.bbmri.eu).
Since each sample is one of a kind, a unique and finite holder of biological information, each study needs to assemble collections of these samples that have statistical power in order to discover measurements useful for comparisons of normal to abnormal or healthy to diseased. Commonly represented in these repositories are plasma and serum, blood cells and products for transfusion, living cells, biopsies and resected tissue, as well as urine and specialized fluids obtained in clinical procedures such as lavages, sputum, and synovial fluids. It is likely that every form of clinical sample from every human organ is found somewhere in a biobank repository. The actual form of the biological material being stored, from DNA to whole organs, is determined by the purpose outlined specifically by study design. The conditions
for sample preparation and storage should be optimized to maintain the integrity of the sample in a state that allows artifact-free analysis and thus require standardization for all samples in a study. In particular, the storage temperature is a critical factor in preserving the structure and function of the stored analytes present at the time of sampling. The current infrastructure and clinical investments being made in global biobanks are expected to benefit present as well as future medical research. The major challenge facing future health care institutions is efficiency, and biobanking is expected to improve cost-effectiveness as these new investments form a globally integrated biobank infrastructure.

Table 2. Increasing the Utility of Biobanked Samples
ideal model | current practice | unmet needs | solutions adding value
Preserving the integrity of samples | Variety of in-house protocols varying from no to continuous diligence | Methods for judging sample fitness in test assays | SOP guidelines for method and processing time
Internal analysis standards | No standard practice | Identification of landmark indices of sample integrity | Spiking samples with panels of degradation markers
Automated sample processing | Variety of in-house practice | Large scale sample processing platforms | Rapid robotic processing
Stable sample storage | Variety of in-house practice ranging from −20 °C to −210 °C | Standardization of cold storage conditions | Adoption of minimal storage temperature (−80 °C) and no refreezing
Clinical annotation of samples | Variety of in-house practice; no current standard; often minimal at best | Standardization of terms | Adoption of naming conventions (SNOMED, others)
Clinical phenotyping of patients | Variety of in-house practice; no current standard; often minimal at best | Sufficient clinical description to allow groupings | Partnered involvement of clinicians in sample/description; advancements in hierarchical disease phenotyping crossing over clinical disciplines

3. CLINICAL PROTEIN SCIENCE AND BIOBANKS IN THE BEST-CASE MODEL
From their descriptions on paper, biobanks would appear to be a natural resource for studies of clinical protein expression. But how could clinical protein science best utilize the resources contained in the biospecimen repositories? What dimensions of scope in terms of ethics, logistics, infrastructure, sample cohort size, sample integrity, etc. are necessary and required? In general, there are milestones that need to be documented in the life cycle of every sample, including sample collection and processing, aliquoting, storage, and eventual use upon demand (Figure 1). In modeling a best-case scenario, we could imagine that the specimen collections we were interested in studying contained samples that represented qualitative or quantitative changes in protein/peptide expression patterns that were associated with observed changes in health status. This means that the samples would need levels of descriptive annotation that described the clinical condition from which the sample was generated. This description would include clinical measurements of disease phenotype in the form of metadata obtained about the subject, such as structural measurements (medical imaging, pathology) or functional measurements with normalized indices (by gender, age, genetic background) for comparison. The samples would exist in the statistically required numbers that would allow power comparisons. The samples would be collected and processed equivalently immediately after sampling, using automated methods of sample separation, aliquoting, and storage that preserved the integrity of structure and function that were present at the time of sampling. Quality assurance would demand that each sample meet fit-for-purpose rule-in/rule-out criteria for inclusion in the study, using sentinel internal standards. The acquisition of sufficient numbers of study samples could take months or years to complete. This would necessitate a standardized method for long-term cold storage that included internal standards to measure spontaneous sample decay over time. When sufficient numbers of samples were obtained, analysis would be done using standardized sample preparation and eventual sample separation protocols. The protein or peptide expression patterns would then be measured using modern mass spectrometry based methodologies, which provide sensitivity and precise identification versus the internal standards. In the best case, an individual study subject would be represented by several samples acquired longitudinally over the period of clinical observation. This would allow trends of expression to be modeled within subject groups and between subject groups. Long-term follow-up annotation regarding the response to treatment and the clinical outcome of treatment would be available for study. These latter sets of information could be supported by pathology reports or medical imaging. In the end, a completed picture of the expression profile of proteins and peptides associated with phenotypes of disease presentation could be used to advance the selection of biomarkers for further study. All of this information would be held in a database that included all pertinent information collected during the sample lifetime. This would ensure that generations of future research projects could utilize and build upon the foundations of understanding achieved at each level of study.
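The requirement in the best-case model that samples exist in statistically required numbers can be made concrete with a rough sample-size estimate. The sketch below is illustrative only and is not taken from any study cited here. It assumes a two-sided comparison of the mean abundance of a single protein between two equally sized groups and uses the standard normal approximation; the function name `samples_per_group` and its default values (alpha = 0.05, power = 0.80) are hypothetical choices made for this example.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size: float,
                      alpha: float = 0.05,
                      power: float = 0.80) -> int:
    """Approximate number of samples needed in each of two groups.

    effect_size is the standardized difference between group means
    (difference in means divided by the common standard deviation).
    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = standard_normal.inv_cdf(power)          # desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, from a subtle to a pronounced difference.
    for d in (0.25, 0.5, 1.0):
        print(f"effect size {d}: {samples_per_group(d)} samples per group")
```

Under these assumptions, a standardized difference of 0.5 calls for roughly 63 samples per group, and halving the effect size roughly quadruples that number. This is one reason why subtle expression differences are hard to study without access to large, comparable collections. A real study design would also need to account for multiple testing across many proteins, which this sketch ignores.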
4. ADDRESSING THE UNMET NEEDS
Today, for most biobanked samples held globally, we are a long way from having achieved the level of best practice suggested in this contrived example. But the points outlined above do play and will continue to play a determining role in the future use of biobanks by the proteomics community. For example, a recent review of dedicated biobanks by the OBBR of NCI pointedly addressed the artifacts found in samples due to sample handling before storage and the variability in terms of assigning molecular signatures of disease to improperly handled samples.21 Despite the inherent problems in collecting, analyzing, and comparing samples in small or large-scale format, we believe that biobanks represent a very important resource for the proteomics community. Furthermore, we need to recognize that the same problems encountered in the analysis of stored samples need to be addressed in all clinical proteomics studies. For example, any study that claims to assign a biomarker to a biological process would have to address the questions of quality assurance, uniform sample collection and processing criteria, groupings of sample cohorts by clinical status, analysis sensitivity and specificity, and problems with “over-interpretation” of the data from both a statistical and clinical perspective.22 We can highlight six areas of sample lifecycle management that require the attention of the clinical proteomics community, along with possible ways of addressing these unmet needs (Table 2).

4.1. Preserving the Integrity of Samples
Each sample represents a snapshot of the biological processes in place at the time of sampling. As such, the biological content of the sample determines the context in which that sample could be used in a study. Many proteins have relatively short half-lives at 37 °C and are often present in samples along with degradative proteolytic enzymes. The first challenge for maintaining a tissue sample repository is the establishment of quality standards to ensure that samples are equivalent in terms of form (fresh, frozen, wax block, sections on slides, etc.), physical integrity (processing and storage), and availability for use (identity labels, aliquot size). Standard operating procedures (SOPs) should define the exact actions performed at each step in a sample life cycle to minimize both preanalytic variability and storage artifact. A number of supportive initiatives are underway to provide uniform recommendations in sample processing and sample storage.18−20,23,24 Standardization should also ensure that measures are in place for quality control that examine samples periodically. This would also include controls for user-initiated mistakes in labeling, processing, and storage of the sample as well as mistakes in data entry, including metadata.23 Together, sample integrity and quality control are critical issues in the evaluation of the usefulness of any tissue repository.

4.2. Internal Analysis Standards
[...] sample must meet in order to be included into a study. Samples must be validated as being fit for purpose using objective measurements that are matched and pertinent to the study objectives. In order to be able to rule in or rule out the use of a sample in a study, we should require calibration of the sample versus internal standards. For proteomics this requires both an understanding of the dynamic range of analytes present in a sample volume and a means of measuring artifacts due to deficient sample handling. The process of proteolytic breakdown over time needs to be better understood in stored samples. One means of monitoring sample degeneration in samples intended for mass spectrometry measurement is to include within samples a known concentration of a labeled peptide.25 In addition to the aforementioned international standardization activities,18−20 the SPIDIA project (http://www.spidia.eu), funded by the European Union within the Seventh Framework Programme (FP7), addresses the “Standardisation and improvement of preanalytical procedures for in vitro diagnostics”. SPIDIA is notable because it aligns, in a modern context, academic institutions, international organizations, and commercial life sciences companies that share the common goal of developing procedures to improve sample quality.
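The idea of spiking a sample with a known concentration of a labeled peptide, and later using its recovery to rule a sample in or out of a study, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions and is not a published protocol: the class `SpikeReading`, the functions, the first-order decay model, and the 80% recovery threshold are all hypothetical and would need to be replaced by criteria validated for the assay in question.

```python
from dataclasses import dataclass
from math import log


@dataclass(frozen=True)
class SpikeReading:
    """One measurement of a labeled peptide spiked in at collection (hypothetical record)."""
    sample_id: str
    spiked_amount: float    # amount added at collection, e.g. in fmol
    measured_amount: float  # amount recovered at analysis, same unit
    storage_days: float     # time between spiking and analysis


def recovery(reading: SpikeReading) -> float:
    """Fraction of the spiked peptide that is still measurable."""
    if reading.spiked_amount <= 0:
        raise ValueError("spiked_amount must be positive")
    return reading.measured_amount / reading.spiked_amount


def decay_rate_per_day(reading: SpikeReading) -> float:
    """Apparent decay constant, assuming simple first-order loss (an assumption)."""
    fraction = recovery(reading)
    if fraction <= 0 or reading.storage_days <= 0:
        raise ValueError("need a positive recovery and a positive storage time")
    return -log(fraction) / reading.storage_days


def fit_for_purpose(reading: SpikeReading, min_recovery: float = 0.8) -> bool:
    """Rule a sample in (True) or out (False) against a chosen recovery threshold."""
    return recovery(reading) >= min_recovery


if __name__ == "__main__":
    # Made-up readings for two samples stored for one year.
    readings = [
        SpikeReading("S-001", spiked_amount=100.0, measured_amount=90.0, storage_days=365.0),
        SpikeReading("S-002", spiked_amount=100.0, measured_amount=50.0, storage_days=365.0),
    ]
    for r in readings:
        verdict = "rule in" if fit_for_purpose(r) else "rule out"
        print(f"{r.sample_id}: recovery {recovery(r):.2f}, "
              f"decay {decay_rate_per_day(r):.6f} per day, {verdict}")
```

A single labeled peptide gives only one view of sample decay. Table 2 suggests panels of degradation markers, in which case the same check would be applied to each marker and the results combined according to the needs of the study.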
4.3. Automated Sample Processing
We are entering an era of increasing capacity for sample analysis, where data outputs in the form of gene, protein, peptide, and metabolite abundance are steadily increasing. Along with this increase in activity comes an increased demand for the availability of the “right” samples for study. This is typically seen as a demand for high-quality, well-characterized samples, with detailed clinical records matched to the sample being readily available. As technology platforms such as mass spectrometry become more powerful, miniaturization is also being developed, and large sample volumes are no longer a necessity for analysis. Today, typical sample analysis volumes of low microliters are common on MS platforms. This translates to almost an order of magnitude less in sample volume and is a welcome advancement from both the patients' and the clinicians' perspective. In order to maintain sample integrity and quality over time, the principle of single usage is also gaining acceptance. Single usage means that any given stored sample in a biobank archive will be thawed once and never refrozen. This practice ensures that sample integrity is not variably changed by repeated thawing and freezing cycles. Another practice that is gaining acceptance is high-density sample storage with small aliquot sample volumes. The previous standard of using 5−10 mL sample volume tubes is being replaced by smaller volumes of 500 to