Tutorial
EXPERIMENTAL DESIGN IN CLINICAL ‘OMICS BIOMARKER DISCOVERY Jenny Forshed J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00418 • Publication Date (Web): 02 Oct 2017
Journal of Proteome Research
EXPERIMENTAL DESIGN IN CLINICAL ‘OMICS BIOMARKER DISCOVERY Jenny Forshed* *Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden.
ABSTRACT This tutorial highlights some issues in experimental design for clinical ‘omics biomarker discovery: how to avoid bias and obtain quantities as close to the true values as possible from biochemical analyses, and how to select samples to improve the chances of answering the clinical question at issue. This includes the importance of defining the clinical aim and endpoint, knowing the variability in the results, randomization of samples, sample size, statistical power, and how to avoid confounding factors by including clinical data in the sample selection, i.e. how to avoid unpleasant surprises at the point of statistical analysis. The aim of this tutorial is to support translational clinical and pre-clinical biomarker candidate research and to improve the validity and potential of future biomarker candidate findings.
KEYWORDS Experimental design, translational research, clinical ‘omics, biomarker discovery, biomarker candidates.
INTRODUCTION A characteristic of ‘omics experiments is that they generate many variables/features (e.g. proteins or genes) but are generally expensive and time consuming, so only relatively few samples can be analyzed. These shortcomings of ‘omics data can be referred to as the “curse of dimensionality”.1 This poses special challenges for statistical data analysis and sample selection when aiming at biomarker candidate findings with sufficient power, specificity and generalizability.
Despite the discovery of many potential biomarkers with genomics and proteomics, very few have been developed into clinical applications.2-8 Some causes are clinical validation failure, non-optimized clinical translation and promotion despite nonpromising evidence. It has been claimed that conclusions drawn from biomarker discovery research are often false.4,9 One of the reasons that has been identified is the lack of a well-reasoned study- and experimental design.2-6 This was also recently discussed in Nature; among other things, experimental design and data quality must be more carefully examined by reviewers in the future.10 A well-reasoned experimental design, planned in concordance with the clinical study design11 before starting sample collection and biochemical analyses will hence improve the chances to develop biomarkers.
The following guidelines are meant to help in ‘omics experimental design. Each section in the document is summarized with a checklist to facilitate implementation.
EXPERIMENTAL DESIGN Experimental design can be defined as the protocol that defines the populations of interest, selects the individuals for the study from the populations, allocates them to treatment groups and arranges the experimental material in space and time.12 Experimental design in a clinical study hence includes several different levels. Clinical study design is of course highly integrated with experimental design, but it is thoroughly described elsewhere13 and is therefore mentioned only briefly here. This document focuses on the ‘omics experimental design, i.e. the planning around the analytical experiments. The following sections cover issues that I have identified as important in experimental design for clinical biomarker discovery in ‘omics in general, with some examples from proteomics analysis.
1. Define your research question Unclear aim and endpoint in a biomarker study can have several reasons. Either the aim is not clearly defined at start, or the aim changes as the study proceeds. A changed aim can be due to unexpected or unwanted results. If the variability in the study exceeds the effect, no conclusions about the initial hypothesis can be drawn. If then another hypothesis is formulated, based on the same data, that new hypothesis has to be included in the initial study design to avoid bias. The same holds for studies where the biomarker discovery comes as secondary or tertiary objective. Conclusions drawn from flexible study designs and data analysis increase the probability of obtaining a significant result but the results are often false.14
Submitting your study protocol and experimental design to a registry database such as the Open Science Framework (osf.io) or ClinicalTrials.gov encourages transparency and well-powered studies. It also reduces opportunities for selective reporting. Also include in the study protocol how your results should be validated. Sample selection for a validation cohort should be considered before the start of the study to improve biomarker development from potential findings.
Note that finding biomarker candidates connected to e.g. a diagnosis requires that there actually exists a “molecular diagnosis” corresponding to the clinical diagnosis. Molecular differences between groups of patients can be found only if the samples from the patients have general differences in molecular composition corresponding to the grouping and if your analytical method can detect them. It is hence important to also define your “molecular hypothesis” in your research question.
Checklist for outline of study protocol
Define your research question clearly. Include molecular biology hypothesis and statistical hypothesis.
Define the clinical context and clinical utility.
Define the clinical outcome: implementation and target population. How to measure future effects.
Define minimally acceptable performance (e.g. difference between groups), required power and specificity.
Describe and define subjects: cases and controls. Make inclusion and exclusion criteria.
Determine the true quantity of interest, depending on disease and sample matrix, and make sure the intended analytical method can reach your demands.
Make a plan for the statistical analysis: classification, quantification or time series.
Decide on early termination criterion of the study.
Plan for sample selection for a validation cohort.
Submit your study protocol and experimental design to a registry database.
2. Define and select samples Sampling Molecular data from chemical analysis will depend on the molecular type of interest, the analysis method and the quality of the samples. It is hence essential to control the pre-analytical variables unrelated to the studied clinical condition.15,16 Describe the procedure for sampling, sample collection, storage protocols, sample treatment, and assay procedures depending on the clinical question, the disease, and the biochemical analysis method that is going to be used. Further, decide on sample quality control, e.g. protein concentration measurement.15 These issues have to be determined and discussed within the translational research group, and standard operating procedures (SOPs) for each stage should be established16,17 as well as an informatics infrastructure.18 When a procedure for the sample handling is in place, measure the variability of the sampling procedures and the sample treatment. This gives control of the critical steps in the sample treatment and an assessment of the expected variation from the sample processing (see 4. ‘Omics experimental design: Repeated measurements).
Note that there is also variability in the clinical data. Diagnosis, age, treatment (patients’ compliance), relapse free period etc. are not exact measures and always suffer from uncertainty, known or unknown. “Hard” clinical data such as death are preferable as endpoints; nevertheless, all clinical data have to be integrated in the sample selection process.
Sample selection A totally random selection of samples from the population that we want to study is ideal. Then we can assume that the variability among the selected samples represents the biological variability in the population, and that the sample mean will, on average, agree with the true population mean (no bias).12 A sufficiently large number of samples will average out sample heterogeneity, resulting in a generalizable result (Figure 1).
[Figure 1: schematic in which samples are selected from a control population and a case population and passed on to biochemical analysis, giving observations for statistical analysis.]
Figure 1: Sample selection will give estimates for the whole population.
Unfortunately, it is often not possible to do a totally random selection of samples from the population covering a clinical cohort. It is also often impossible to select enough samples to even out bias, especially in ‘omics studies. That is why we have to control for confounding factors in the sample selection:
Exclude extreme or erroneous samples because they are not representative of the cohort that you want to study or because they include confounding information that will introduce bias. Exclude, for example, samples with uncertainty in response, diagnosis or other clinical data. Exclude samples that were collected differently or prepared with different methods. Samples with chemical differences, e.g. pH, protein concentration etc., which cannot be adjusted for should be excluded. Chemically extreme or erroneous samples are detected by quality control methods, as discussed above. Confounding factors, such as comorbidities, that cannot be adjusted for are detected in the clinical data (see below).
Include “normal population” variability, e.g. age differences, date of diagnosis, different living conditions, some types of medication, etc. The “normal population” will of course differ depending on the clinical question. When seeking a biomarker for disease progression, the sample population will include only people with different stages of the disease. Research on biomarkers for therapy resistance includes patients with and without successful therapy, but all other clinical variables must have the same distribution within the groups to avoid bias in the results. Biomarkers for acute diseases must be validated in a cohort from a general population including all possible variability etc. Collect as much clinical data as possible to be able to check for confounders.
Identify biases and confounding factors in samples. We aim to find biomarkers that have a causal relationship with the clinical status that we have defined in our hypothesis (Figure 2). Hence we want to avoid bias and mixed information; confounders such as age, gender, BMI etc. We want to avoid the situation where we find a biomarker that can statistically distinguish our clinical question, but at the same time could be e.g. a BMI marker. This mixed information is impossible to separate in the concluding statistical analysis afterwards, but can be identified beforehand from the clinical data as variables correlating with the clinical question.
[Figure 2: diagram relating a confounder, a mediator and a collider to the biomarker and the disease, which are linked by a causal relation.]
Figure 2: A theoretical relationship between biomarkers, confounders, mediators, colliders and disease status. (Figure with inspiration from Richiardi et al.19)
The risk of confounders and bias can be minimized by stratification or matching based on the samples’ clinical data. Matching samples so that all clinical parameters, except the one that covers the study hypothesis, are equally distributed between the groups will decrease the risk of study bias. However, since matching is not a totally random process, there is always a risk of introducing bias (that is not seen in the clinical data).
Some confounders can be identified already when defining the study, for example smoking as a confounder in lung cancer (LC) studies. One option is to stratify the endpoint statistical analysis into two groups and test smokers (LC vs non-LC) and non-smokers (LC vs non-LC) separately. Another is to match samples between the groups so that the LC group includes as many smokers as the non-LC group. A third is to study only smokers or only non-smokers. Unknown confounders can be detected only when extensive clinical data for the samples included in the study are collected.
Patients from different groups/cohorts can also be matched in pairs, for example patients who have the same diagnosis, age and gender and have received the same treatment, but who differ in prognostic outcome. We can then study the differences in prognostic outcome between samples pairwise, or compare on group level if they are different individuals and not compared with themselves as control. An advantage of a “pairing scheme” is that we will have the same spread of registered possible confounders between the study groups.
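A pairing scheme of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`sex`, `treatment`, `age`), the sample identifiers, the 5-year age tolerance and the greedy nearest-age strategy are all hypothetical choices, not a procedure taken from this tutorial. Patients without an acceptable partner are simply left out of the paired design.

```python
def match_pairs(poor_outcome, good_outcome, max_age_diff=5):
    """Greedily pair each poor-outcome patient with an unused good-outcome
    patient of the same sex and treatment and a similar age."""
    unused = list(good_outcome)
    pairs = []
    for patient in poor_outcome:
        candidates = [
            c for c in unused
            if c["sex"] == patient["sex"]
            and c["treatment"] == patient["treatment"]
            and abs(c["age"] - patient["age"]) <= max_age_diff
        ]
        if not candidates:
            continue  # no acceptable partner: patient stays unmatched
        best = min(candidates, key=lambda c: abs(c["age"] - patient["age"]))
        unused.remove(best)  # each partner is used at most once
        pairs.append((patient["id"], best["id"]))
    return pairs

# Hypothetical clinical records
poor_outcome = [
    {"id": "P1", "sex": "F", "treatment": "A", "age": 62},
    {"id": "P2", "sex": "M", "treatment": "A", "age": 55},
    {"id": "P3", "sex": "F", "treatment": "B", "age": 71},
]
good_outcome = [
    {"id": "G1", "sex": "F", "treatment": "A", "age": 60},
    {"id": "G2", "sex": "M", "treatment": "A", "age": 58},
    {"id": "G3", "sex": "M", "treatment": "A", "age": 40},
    {"id": "G4", "sex": "F", "treatment": "B", "age": 80},
]
pairs = match_pairs(poor_outcome, good_outcome)
print(pairs)
```

In this toy data set P3 remains unmatched because the only candidate with the same sex and treatment is 9 years older. Which variables to match on remains a clinical decision for the translational research group.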
It is important not to match for mediators (there might be exceptional cases) that are naturally connected to the clinical outcome that we are studying (Figure 2). For example tumor size is tightly connected to cancer progression and must be allowed to be a confounder among the selected samples if we are about to study cancer progression. Matching for mediators for the biomarker will undermine the discovery process and make case and control more similar. So, adjust for mediators only if the mediator is unwanted/unrelated. The clinical data can also include colliders, noncausal associations. Adjusting for colliders can be misleading and introduce bias. (See Figure 2.)
Note that it’s impossible to know from data only if a correlating factor/variable/clinical parameter is a collider, mediator or confounder. You have to have knowledge on the research question, the clinic, biology etc. and reason about the sample selection in the translational research group. It’s also impossible to know if there are confounders if they are not measured, hence all possible clinical data is important to include in the sampling process.
Checklist for sample selection and preparation
Use standard operating procedures (SOPs) for sampling and sample preparation.
Describe the procedure for the sample collection process.
Define storage protocols.
Determine the sample preparation.
Decide on necessary assay procedures.
Determine sample quality control procedures.
Make sure you have access to all possible clinical data.
Discuss possible confounders, mediators and colliders in the translational research group.
Check for mixed information by correlation analysis between your response variable and your clinical data.
Adjust for confounders by stratification or matching.
Do not adjust for mediators or colliders.
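The correlation check in the list above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the clinical variables (`age`, `bmi`), their values, the helper names and the 0.3 flagging threshold are all hypothetical, and a Pearson correlation against a 0/1 group label is just one simple choice of association measure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_possible_confounders(group, clinical, threshold=0.3):
    """Return clinical variables whose |r| with the group label exceeds
    the (hypothetical) threshold, together with the correlation."""
    flagged = {}
    for name, values in clinical.items():
        r = pearson(group, values)
        if abs(r) > threshold:
            flagged[name] = r
    return flagged

# Hypothetical data: 0 = control, 1 = case
group = [0, 0, 0, 0, 1, 1, 1, 1]
clinical = {
    "age": [54, 61, 58, 66, 57, 63, 55, 65],  # similar in both groups
    "bmi": [22, 24, 23, 25, 29, 31, 28, 33],  # clearly higher among cases
}
flagged = flag_possible_confounders(group, clinical)
print(flagged)
```

A flagged variable only shows that information is mixed. Whether it is a confounder, a mediator or a collider cannot be read from the correlation itself and has to be reasoned about with clinical and biological knowledge.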
3. Analytical methods Decide on which analytical methods to use. Find out about the method’s analytical validity characteristics in terms of performance, sensitivity, variability and sample requirements. How to measure analytical method performance is exemplified in a Eurachem guide by Magnusson and Ornemark.20 Examples of methods and their variability and limitations are also found in WHO’s Guidelines on Standard Operating Procedures for Clinical Chemistry.17 Are extra samples for calibration and standardization required? If the method variability is unknown for your type of samples, the variability has to be determined by replicate sample measurements (see 4. ‘Omics experimental design: Repeated measurements).
If biological/clinical validation with additional methods is included as an endpoint of the study, the validation method should be defined and decided upon. It’s also necessary to have figures on this method’s variability, sensitivity, specificity etc. Make sure that the validation method detects the same molecules: can you expect it to measure the potential biomarkers that were found?
When deciding upon analytical methods it should also be determined where to store the data. It is usually required for publication to save the data to a repository. Making raw data available will enhance opportunities for data aggregation and meta-analysis, and allow external checking of analyses and results. Also, making research materials available improves the quality of studies aimed at replicating your research findings.6
Checklist for the chosen analytical methods:
Method performance: limit of detection and quantification, sensitivity, specificity.
Repeatability/variance: is it known, or does it have to be determined?
Is the variance independent of signal intensity (homoscedastic), or is it heteroscedastic so that it has to be adjusted for in the data processing?
Required calibration and standardization: are extra samples needed for this?
4. ‘Omics experimental design The basics of experimental design were proposed by R. A. Fisher as early as 1926 and 1935.21-23 His concepts have been broadly adopted in the physical and social sciences and are still applicable: comparison, randomization, replication, blocking, orthogonality and factorial experiments. Below I have used some of his concepts to point out important parts of experimental design in ‘omics biomarker discovery.
Comparison What kind of comparison should be done? Is it a group comparison with predetermined groups? Is it an explorative study to find inherent and unknown groupings? Is it a time series study? Regardless, it is important to have a plan for the final statistical data analysis. The statistical model will have different requirements in terms of number of samples, number of replicates, number of time points, normality of data etc. What are the possibilities and limitations of the method? Can we detect only linear relationships, or is there a risk of over-fitting with the chosen method? Maybe a new methodology has to be developed to find what we are looking for. There are numerous statistical methods available. Choosing the right one is not easy, and the choice will differ from study to study. Nevertheless, it is of great importance to have the data preprocessing and the statistical analysis procedure outlined beforehand, to ensure that your experiments produce the right data to analyze.
Are we going to do a lot of statistical tests, perhaps thousands of t-tests, one for each detected protein? Then we need to decide upon a method for correction for multiple testing, e.g. Bonferroni, which is a conservative method with strong control that is suitable if few tests (≤ 100) are performed. In most cases, the calculation of the FDR (False Discovery Rate) by Benjamini and Hochberg is more suitable and less conservative.24 There are further methods that take the significance of the underlying measurements into account, as for RNA-seq data, where the number of reads is included as a measure of the reliability of the comparative statistics.25
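To make the difference between the two corrections concrete, the sketch below computes Bonferroni-adjusted p-values and Benjamini-Hochberg adjusted p-values (often called q-values) for a small, invented list of p-values. The function names and the example values are illustrative assumptions. In practice an established statistics package would normally be used rather than hand-written code.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented p-values from five hypothetical tests
pvals = [0.001, 0.008, 0.012, 0.041, 0.60]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

n_bonf = sum(p < 0.05 for p in bonf)
n_bh = sum(q < 0.05 for q in bh)
print(n_bonf, n_bh)
```

With these invented values, two tests pass a 0.05 threshold after Bonferroni correction and three pass after Benjamini-Hochberg, which illustrates why the latter is described as less conservative. Note that the two methods control different error rates (family-wise error rate versus false discovery rate).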
Regardless of whether you use e.g. a t-test or a multivariate supervised model (e.g. PLS), your statistical model should be validated with an external test set with samples from the same sample population. The test set samples have to be included in the initial design of experiments so that they can be run in the same batch of sample runs, which avoids analytical bias. If it is not possible to include a test set because of time or resource limitations, cross validation or bootstrapping7 should be planned for. An external test set or cross validation will estimate how valid your results are in a larger population.
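Reserving the test set already at the design stage can be sketched as below. The group names, sample identifiers, the 25 % test fraction and the fixed seed are hypothetical. The only point of the sketch is that the split is made per group and before any sample is run, so that test-set samples can go into the same batch as the others.

```python
import random

def reserve_test_set(sample_ids_by_group, test_fraction=0.25, seed=0):
    """Randomly reserve a fraction of each group as an external test set.

    Returns two dicts (training, test) keyed by group name.
    """
    rng = random.Random(seed)  # fixed seed so the design can be documented
    training, test = {}, {}
    for group, ids in sample_ids_by_group.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * test_fraction))
        test[group] = shuffled[:n_test]
        training[group] = shuffled[n_test:]
    return training, test

# Hypothetical cohort: 20 controls and 16 disease samples
cohort = {
    "control": [f"C{i}" for i in range(1, 21)],
    "disease": [f"D{i}" for i in range(1, 17)],
}
training, test = reserve_test_set(cohort)
print({g: len(v) for g, v in test.items()})
```

The test-set samples are then kept out of model building but are prepared and analyzed together with the training samples, in the same randomized run order.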
Accuracy and precision Typical tasks in a clinical ‘omics study are to find features that are significantly expressed compared to baseline expression, or to find features that are significantly differently expressed between different groups of patients. No ‘omics analysis method will, however, give us the “true” or accurate expression levels. This is due to the variability and inherent errors in the sample selection, sample treatment, analysis techniques, data analysis etc. (Figure 3).
Statistical methods for collecting, summarizing, analyzing and drawing conclusions from the observed data are numerous. It is important to remember that statistics can take variability into account but cannot reduce it, which is why we have to select and treat samples with the greatest possible care. Figure 3 shows examples of contributions to variability and bias in a proteomics experiment, which will affect the accuracy and precision of our results. The better the data quality, i.e. the better the data describe the information we search for, the easier it is to extract essential information by statistical analysis. Since variance, which affects the precision, and bias, which affects the accuracy, are both additive, it is most important to find the largest contributors and minimize those.
[Figure 3: fishbone diagram leading to quantitative accuracy and precision. Labels in the diagram: sample selection, patient selection, disease class, age, sample collection, sample preparation, homogenization, protein concentration determination, adjustment of protein concentration, labelling and sample transfer, isoelectric focusing, chromatographic separation, mass spectrometry detection, linear range, technical variation, ion suppression, feature detection, baseline correction, smoothing, peak detection, intensity measurements, peak overlap, normalization, data analysis (univariate, multivariate).]
Figure 3: Examples of factors in a proteomics study that add variability and bias on to protein quantification summarized in a fishbone diagram.
Randomization Randomization prevents bias from being introduced. This holds for both the sample collection (2. Define and select samples) and the experimental runs. There should be a different randomized order in sample preparation, sample analysis, etc. to prevent bias from being introduced during the experimental process (Figure 4). Check beforehand that no correlation exists between the different run orders.
[Figure 4: three panels plotting feature intensity (y) for Control and Disease samples. A. Consecutive run order. B. Randomized run order. C. Paired/blocked design.]
Figure 4. The importance of randomization and blocking. Figure with inspiration from Oberg and Vitek.12 A. Consecutive acquisition can introduce bias. The significant difference between group means is here a combination of differences between group means and run order bias, impossible to sort out with statistical analysis afterwards. B. Complete randomization removes the risk of introducing bias. The variance within each group, yC and yD, is here a combination of the biological difference and of the run-order variation. The group mean difference can then be assumed not to be confounded with run order bias. C. Paired design with a block size of two samples that are run in parallel. This design allows comparison of differences between individuals within the block. No bias from the run order or technical variability is introduced to the results.
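Drawing separate randomized orders for sample preparation and sample analysis, and checking that the two orders are not correlated, can be sketched as follows. The sample identifiers, the Spearman rank correlation as the check, the limit of 0.2 and the redraw-until-accepted strategy are all assumptions made for illustration. Other checks and limits may be equally reasonable.

```python
import random

def rank_correlation(order_a, order_b):
    """Spearman rank correlation between sample positions in two run orders
    (both orders must contain the same samples, each exactly once)."""
    pos_a = {s: i for i, s in enumerate(order_a)}
    pos_b = {s: i for i, s in enumerate(order_b)}
    n = len(order_a)
    d2 = sum((pos_a[s] - pos_b[s]) ** 2 for s in order_a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def draw_uncorrelated_orders(sample_ids, max_abs_rho=0.2, seed=1, max_tries=1000):
    """Draw independent random orders for preparation and analysis, redrawing
    until their rank correlation is within the chosen limit."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        prep = list(sample_ids)
        rng.shuffle(prep)
        analysis = list(sample_ids)
        rng.shuffle(analysis)
        if abs(rank_correlation(prep, analysis)) <= max_abs_rho:
            return prep, analysis
    raise RuntimeError("no acceptable pair of run orders found")

# Hypothetical study: 8 control and 8 disease samples
sample_ids = [f"C{i}" for i in range(1, 9)] + [f"D{i}" for i in range(1, 9)]
prep_order, analysis_order = draw_uncorrelated_orders(sample_ids)
print(prep_order)
print(analysis_order)
```

The same `rank_correlation` helper could be reused to check a run order against the order of sample collection, or against any other ordering that might carry a systematic trend.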
Repeated measurements Biochemical measurements always include variability. Replication of measurements can give information on the magnitude and source of variation and uncertainty. This leads to a better estimate of the reliability of the calculated effects, Figure 5.
[Figure 5: three panels plotting feature intensity (y) for Control and Disease, with group means yC and yD. A. Single measurements. B. Repeated measurements, significant difference. C. Repeated measurements, no significant difference.]
Figure 5: The value of repeated measurements. Figure with inspiration from Oberg and Vitek.12 A. One single measurement for each group. It is impossible to determine if the observed difference is true or just normal variation. B. Repeated measurements showing a small variance and hence a significant difference between Control and Disease. Each x represents the feature intensity (yi) from one individual; yC and yD are the average intensities for the groups. C. A large variance in the feature measurements indicates that the difference in mean intensities between the groups is not significant and may have occurred by random chance.
To estimate the repeatability of a sample run, one sample should be run at least in quadruplicate (n ≥ 4) to obtain a variance estimate close to the truth. This is typically done as a pilot study of one or a few (representative) samples in the study. For example, by running the instrumental analysis and the data analysis independently on one sample 5 times, we can estimate the variability that these procedures will add to the end result (Figure 3). An estimate of the reproducibility should also include sampling, sample preparation, different lab personnel, different analyzing labs etc. Hence, it is important to declare the experimental set-up when reporting repeatability and reproducibility measures. More on pre-analytical variability and how to estimate it is found in Mayer et al (Ch 22).1
Running samples in duplicate does not give an estimate of the variability but can be used as quality control. By comparing sample data from duplicates we can detect outliers. Use the mean, both, or one of the measurements in your calculations. If both measurements are used as single events in your statistical model, this should be corrected for in order to obtain correct statistics.
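The two uses of replicate runs described above can be sketched as follows: a repeatability estimate (standard deviation and coefficient of variation) from a pilot sample run five times, and a simple duplicate check for quality control. The intensity values, the sample identifiers and the 20 % relative-difference limit are invented for illustration, and the coefficient of variation is one common way to report repeatability rather than the only one.

```python
import statistics

def repeatability(replicates):
    """Mean, sample standard deviation and CV (%) of replicate measurements."""
    if len(replicates) < 4:
        raise ValueError("at least four replicates are needed for this estimate")
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample SD (n - 1 in the denominator)
    return mean, sd, 100 * sd / mean

def flag_duplicates(duplicates, max_rel_diff=0.2):
    """Return ids of samples whose two measurements differ by more than
    max_rel_diff relative to their mean (a hypothetical QC rule)."""
    flagged = []
    for sample_id, (a, b) in duplicates.items():
        if abs(a - b) / ((a + b) / 2) > max_rel_diff:
            flagged.append(sample_id)
    return flagged

# Hypothetical pilot: one sample analyzed five times
pilot = [102.0, 98.0, 105.0, 95.0, 100.0]
mean, sd, cv = repeatability(pilot)
print(f"mean={mean:.1f}, sd={sd:.2f}, CV={cv:.1f}%")

# Hypothetical duplicates used as quality control
duplicates = {"S1": (100.0, 104.0), "S2": (80.0, 120.0), "S3": (55.0, 52.0)}
suspect = flag_duplicates(duplicates)
print(suspect)
```

When reporting such figures, state which steps were repeated (instrument run only, or also sample preparation), since that determines whether the number describes repeatability or part of the reproducibility.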
A simple statistical model for a mixed effects analysis of variance of one class of samples can be described by the following expression:

$$y_{ijkl} = \bar{x}_i + m_{ij} + n_{ijk} + o_{ijkl} \qquad \text{[Eq. 1]}$$

where $y_{ijkl}$ is the observed feature intensity from sample $j$, consisting of the mean signal $\bar{x}_i$ of all measurements in the class $i$. $k$ is the index of the sample preparation, and $l$ is the measurement. $m_{ij}$ is the individual deviation, $n_{ijk}$ is the sample preparation deviation and $o_{ijkl}$ is the measurement deviation. $m_{ij}$, $n_{ijk}$ and $o_{ijkl}$ are all independent. Equation 1 translated into a three-dimensional space, with the measurement of three proteins, is shown in Figure 6.
[Figure 6: three-dimensional plot with the feature intensities of Protein A, Protein B and Protein C as axes, showing individual observations (x) together with the observation yijkl, the class mean and the deviations mij, nijk and oijkl.]
Figure 6 shows Equation 1 translated into three dimensions. See the text for notations.
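Because the three deviations in Equation 1 are independent, their variances add up in the variance of the observed intensities. The simulation below illustrates this for a single feature in one class. Normally distributed deviations, the standard deviations 3, 2 and 1, the class mean of 100 and the numbers of individuals, preparations and measurements are all assumptions chosen only for the illustration.

```python
import random
import statistics

def simulate_class(mean_signal, sd_individual, sd_prep, sd_measure,
                   n_individuals, n_preps, n_measurements, seed=0):
    """Simulate observations y = mean + m + n + o with independent,
    normally distributed deviations at three nested levels."""
    rng = random.Random(seed)
    observations = []
    for _ in range(n_individuals):
        m_dev = rng.gauss(0, sd_individual)        # individual deviation
        for _ in range(n_preps):
            n_dev = rng.gauss(0, sd_prep)          # sample preparation deviation
            for _ in range(n_measurements):
                o_dev = rng.gauss(0, sd_measure)   # measurement deviation
                observations.append(mean_signal + m_dev + n_dev + o_dev)
    return observations

y = simulate_class(100.0, 3.0, 2.0, 1.0,
                   n_individuals=2000, n_preps=2, n_measurements=2)
observed_var = statistics.variance(y)
expected_var = 3.0 ** 2 + 2.0 ** 2 + 1.0 ** 2  # sum of the three variances
print(f"observed {observed_var:.2f}, expected about {expected_var:.2f}")
```

In this invented setting the biological (individual) variance dominates, so adding more technical replicates would improve precision less than adding more individuals, in line with the advice to find and minimize the largest contributor.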
Pooling Combining subjects from one group into a single pool undermines the usefulness of an experiment and is not recommended. One single measurement per group (as in Figure 5A) makes it impossible to assess the variability and hence to get a valid inference of group differences. Creating several pools from a set of samples and viewing each pool as an independent biological replicate must be based on the assumption that the sources of bias are small and can be neglected. Still, feature quantification from pooled samples is prone to multiple sources of bias from sample preparation and analysis. It is also difficult to detect outlying or contaminated samples, which will affect the pool without being detectable.12
Sample size
Experiments are replicated at different levels. The first level of replication is the selection of several samples from a patient population to estimate the mean level of a biomarker that is intended to be general for the whole population (Figure 1). The more samples that are randomly selected from a representative population, the better the estimates of the levels and of the biological variability in that population, i.e. generalization and statistics improve, which leads to conclusions closer to the truth.
So, how many samples/replicates are needed? The answer is always: “It depends”. It depends on the study aim, the variability among samples, the level of generalization aimed for and the variability of the analytical method. It is technically possible to calculate a t-test on 3 + 3 samples or to do a multivariate analysis of 5 + 5. This might be enough for technical replicates at some level, but with clinical samples it will not give generalizable results, sufficient statistical power or an adequate significance level. A study with low statistical power (because of small sample size, small effects, or both; Figure 7) has a reduced chance of detecting a true effect, and a lower probability that a statistically significant finding reflects a true effect, even if the required threshold of significance (e.g. p < 0.05) is met. The consequences of low statistical power are overestimation of effect sizes and low reproducibility of results.6
Figure 7: The impact of sample size (n) and effect size (d) on the power of a statistical model. Panel A: impact of sample size on power. Panel B: impact of effect size on power. The horizontal axis of both panels is expression. H0 (null distribution) and HA (effect) are assumed to be normally distributed with σ = 1, and the significance level is α = 0.05. Figure inspired by Krzywinski and Altman.26 A. Increasing the number of samples, n, decreases the spread of the distribution of sample averages in proportion to 1/√n and increases the power (grey area). B. Power (grey area) increases with d, making larger effects easier to detect.
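The relationships in Figure 7 can be reproduced numerically. The sketch below assumes a one-sided z-test on the mean of n samples with known σ, with H0 centred at 0 and HA centred at d; this simple setting, the function name and the example values are assumptions made for illustration and are not part of the tutorial.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(d, n, sigma=1.0, alpha=0.05):
    """Power of a one-sided z-test on the mean of n samples.

    Under H0 the sample mean is N(0, sigma^2/n); under HA it is N(d, sigma^2/n).
    """
    se = sigma / sqrt(n)                            # spread shrinks as 1/sqrt(n)
    critical = NormalDist().inv_cdf(1 - alpha) * se  # rejection threshold under H0
    return 1 - NormalDist(mu=d, sigma=se).cdf(critical)

# Panel A: power grows with sample size at a fixed effect size.
for n in (1, 4, 16):
    print("n =", n, "power =", round(power_one_sided(d=1.0, n=n), 3))

# Panel B: power grows with effect size at a fixed sample size.
for d in (0.5, 1.0, 2.0):
    print("d =", d, "power =", round(power_one_sided(d=d, n=4), 3))
```

With d = 0 the function returns α, the false positive rate, which is a useful sanity check on any power calculation.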
For estimating the required sample size in a comparison study we need to define:
• Data variability: variance (σ²)
• Effect size/fold change required to define a difference (d)
• Type I error rate, α = significance level = False Positive Rate (FPR) = p(rejecting H0 | H0 is true)
• Type II error rate, β = 1 − power = False Negative Rate (FNR); power = sensitivity = p(rejecting H0 | HA is true)
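For continuous variables with α = 0.05 (5% significance level) and β = 0.20 (80% power) we can use “Lehr’s equation”, or “the rule of 16”, to calculate the sample size. The sample size per group (n_group) can then be estimated by:27,28

$$ n_{group} = 16 \, \frac{\sigma^2}{d^2} $$
[Eq. 2]

where σ² is the variance of the data and d is the difference to be detected, as defined above.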
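As a worked example of the rule of 16, the sketch below assumes that Equation 2 has the usual form of Lehr's rule, n_group = 16 σ²/d², and compares it with the two-sample normal approximation from which the constant 16 is rounded, 2 (z_{1−α/2} + z_{1−β})² σ²/d². The function names and the example values of σ and d are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def lehr_n_per_group(sigma, d):
    """Rule of 16: n per group for two-sided alpha = 0.05 and 80% power."""
    return ceil(16 * sigma**2 / d**2)

def normal_approx_n_per_group(sigma, d, alpha=0.05, power=0.80):
    """Two-sample normal approximation that the rule of 16 rounds off."""
    z = NormalDist()
    factor = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return ceil(factor * sigma**2 / d**2)

# Detecting a difference of half a standard deviation.
print(lehr_n_per_group(sigma=1.0, d=0.5))
print(normal_approx_n_per_group(sigma=1.0, d=0.5))
```

The second function makes the fixed assumptions behind the constant visible: for other values of α or power, the factor 16 no longer applies and the general expression should be used.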
For more flexible sample size assessment in ‘omics studies, use for example the MD Anderson sample size calculator29 or WebPower,30 where the study parameters can be filled in to calculate the required sample size. Keep in mind that if multiple tests are done, the significance level (α) used in the sample size calculation has to be adjusted for multiple testing to control the FDR31 (mentioned above). Once you have your experimental results, you can also calculate the power of your experiment using the same web resources.
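To show how strongly multiple testing affects the required sample size, the sketch below divides α by the number of tests before applying the two-sample normal approximation. Note the assumption: this is a Bonferroni adjustment, which controls the family-wise error rate and is more conservative than FDR control; it is used here only as a simple stand-in, and it is not the method of the cited calculators. The function name and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sigma, d, alpha=0.05, power=0.80, n_tests=1):
    """Per-group sample size with a Bonferroni-adjusted significance level."""
    z = NormalDist()
    alpha_adj = alpha / n_tests          # assumption: Bonferroni, not FDR
    factor = 2 * (z.inv_cdf(1 - alpha_adj / 2) + z.inv_cdf(power)) ** 2
    return ceil(factor * sigma**2 / d**2)

# Required n per group when testing 1, 100 or 10000 features (sigma = d = 1).
for n_tests in (1, 100, 10000):
    print(n_tests, n_per_group(sigma=1.0, d=1.0, n_tests=n_tests))
```

An FDR-based calculation would generally require fewer samples than this bound, so the numbers printed here are best read as an upper reference point.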
If you have limited resources, combining data through collaboration is a way to increase the total sample size, the power and the reliability of your findings. If your study is underpowered for some reason (e.g. limited time or funding), make this clear and note the limitations in the interpretation of your results. Further, if the intended analyses produce null findings and you move on to explore your data in other ways, say so. Null findings locked in file drawers bias the literature, whereas exploratory analyses are only useful and valid if you acknowledge the caveats and limitations.6
Blocking
Blocking means grouping experimental units to reduce known sources of variation. For example, stable isotope labeling methods (TMT, iTRAQ, etc.32) in proteomics experiments allow several samples to be analyzed together in one block, which prevents bias from being introduced between those samples during analysis. If more than one block is used in your experiment, include the same internal standard sample in all blocks so that samples can be compared between blocks. The internal standard is preferably a mix of all samples included in the study, so that it contains all molecules that can be detected. If several blocks are used in case/control studies, distribute equal numbers of cases and controls in each block to avoid introducing bias. If the numbers of cases and controls differ, I recommend selecting representative samples to get equal group sizes. Multiple samples from the same individual are preferably placed in different blocks. Additionally, randomize the samples from the different clinical groups between blocks as much as possible to avoid bias.
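The blocking advice above can be sketched as a small randomization routine. The function name, the sample labels and the fixed seed are hypothetical. The sketch balances cases and controls over the blocks and adds the same internal standard to each block, but it does not handle repeated samples from the same individual, which would need an additional constraint.

```python
import random

def assign_blocks(cases, controls, n_blocks, seed=0):
    """Randomly distribute cases and controls evenly over blocks.

    Each group is shuffled separately and dealt round-robin, so every block
    receives (almost) the same number of cases and of controls.
    """
    rng = random.Random(seed)
    blocks = [["internal_standard"] for _ in range(n_blocks)]  # same reference in every block
    for group in (cases, controls):
        shuffled = list(group)
        rng.shuffle(shuffled)
        for position, sample in enumerate(shuffled):
            blocks[position % n_blocks].append(sample)
    for block in blocks:
        rng.shuffle(block)   # randomize the order (e.g. label channel) within the block
    return blocks

cases = [f"case_{i}" for i in range(1, 9)]
controls = [f"ctrl_{i}" for i in range(1, 9)]
blocks = assign_blocks(cases, controls, n_blocks=2)
for number, block in enumerate(blocks, start=1):
    print(number, block)
```

Recording the seed together with the resulting layout makes the randomization reproducible and documents that the assignment was made independently of the clinical groups.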
Checklist for the ‘omics experimental design
• Have a detailed plan for the statistical analysis of your resulting data.
• Decide on a method for multiple testing correction.
• Plan for validation of your statistical test/model with an external test set or by cross-validation.
• Have a scheme for your experimental procedure. Sampling order, sample preparation order and run order should be randomized and independent.
• Outline the contributions to variability and bias, for example with a fishbone diagram.
• Find out the repeatability and the reproducibility of your method, either from earlier experiments or the literature, or plan for new repeated experiments.
• Identify the largest contributors to variance and bias; measure and minimize those.
• Calculate the sample size required to reach the intended power and significance level.
• Make sure you have the right samples, and enough of them, for your planned statistical analysis.
CONCLUDING REMARKS
The clinical biomarker discovery research field is highly translational and requires collaboration between several disciplines, involving clinicians, bioinformaticians, analytical chemists and biologists. To improve the chances of discovering accurate biomarkers, there are some key issues that have to be thought through. In this tutorial, I have tried to highlight some aspects that can improve experimental design in clinical biomarker discovery. In summary: The clinical aim must be carefully determined. Formulate a “molecular hypothesis” and make sure the analysis method is able to quantify the essential molecules at the expected level. The primary objective should be defined considering sampling and the performance of the prospective analytical and statistical methods. Can the available methods answer the clinical question? Several levels of validation also have to be outlined: sample quality control, validation of the analysis method, data quality control, validation of statistical models, and validation of the biomarker candidates found with other molecular analyses. The samples required for all levels of validation have to be considered when outlining the project. By considering the checklists in this Tutorial, I hope clinical biomarker discovery research will improve and result in new clinical biomarker development.
AUTHOR INFORMATION
Corresponding Author
*Phone: +46 703 50 54 68. E-mail: jenny.forshed@ki.se
Funding sources
This work was performed with a grant from VINNOVA (The Swedish Governmental Agency for Innovation Systems), 2011-01332.
The author declares no competing financial interest.
ACKNOWLEDGEMENTS
John Lövrot and AnnSofi Sandberg at Karolinska Institutet are kindly acknowledged for valuable suggestions and proofreading. The anonymous referees are acknowledged for valuable comments on the paper.
REFERENCES
(1) Mayer, B. Bioinformatics for omics data: methods and protocols; Walker, J. M., Ed.; Humana Press, 2011.
(2) Sallam, R. M. A. B. C. Proteomics in cancer biomarkers discovery: Challenges and applications. Disease Markers 2015, No. 321370, 1–12.
(3) Pepe, M. S.; Feng, Z.; Janes, H.; Bossuyt, P. M.; Potter, J. D. Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: standards for study design. JNCI Journal of the National Cancer Institute 2008, 100 (20), 1432–1438.
(4) Ioannidis, J. P. A. Biomarker failures. Clinical Chemistry 2013, 59 (1), 202–204.
(5) Tan, H. T. A.; Lee, Y. H. B.; Chung, M. C. M. A. C. Cancer proteomics. Mass Spectrom. Rev. 2012, 31 (5), 583–605.
(6) Button, K. S.; Ioannidis, J. P. A.; Mokrysz, C.; Nosek, B. A.; Flint, J.; Robinson, E. S. J.; Munafò, M. R. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 2013, 14 (5), 365–376.
(7) McDermott, J. E.; Wang, J.; Mitchell, H.; Webb-Robertson, B.-J.; Hafen, R.; Ramey, J.; Rodland, K. D. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data. Expert Opin Med Diagn 2013, 7 (1), 37–51.
(8) Drucker, E.; Krapfenbauer, K. Pitfalls and limitations in translation from biomarker discovery to clinical utility in predictive and personalised medicine. EPMA J 2013, 4 (7), 1–10.
(9) Ioannidis, J. P. A. Why Most Published Research Findings Are False. Plos Med 2005, 2 (8), e124.
(10) Kaelin, W. G. Publish houses of brick, not mansions of straw. Nature 2017, 545 (7655), 387.
(11) McDowell, I. Study designs. Retrieved from http://www.med.uottawa.ca/SIM/data/Study_Designs_e.htm.
(12) Oberg, A. L.; Vitek, O. Statistical design of quantitative mass spectrometry-based proteomic experiments. Journal of Proteome Research 2009, 8 (5), 2144–2156.
(13) Hulley, S. B.; Cummings, S. R.; Browner, W. S.; Grady, D.; Newman, T. B. Designing clinical research, 4th ed.; Wolters Kluwer/Lippincott Williams & Wilkins: Philadelphia, 2013.
(14) Simmons, J. P.; Nelson, L. D.; Simonsohn, U. False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science 2011, 22 (11), 1359–1366.
(15) Betsou, F.; Gunter, E.; Clements, J.; DeSouza, Y.; Goddard, K. A. B.; Guadagni, F.; Yan, W.; Skubitz, A.; Somiari, S.; Yeadon, T.; et al. Identification of evidence-based biospecimen quality-control tools: a report of the International Society for Biological and Environmental Repositories (ISBER) Biospecimen Science Working Group. J Mol Diagn 2013, 15 (1), 3–16.
(16) Hsu, C.-Y.; Ballard, S.; Batlle, D.; Bonventre, J. V.; Böttinger, E. P.; Feldman, H. I.; Klein, J. B.; Coresh, J.; Eckfeldt, J. H.; Inker, L. A.; et al. Cross-Disciplinary Biomarkers Research: Lessons Learned by the CKD Biomarkers Consortium. Clinical Journal of the American Society of Nephrology 2015, 10, 894–902.
(17) Kanagasabapathy, A. S.; Kumari, S. Guidelines on standard operating procedures for clinical chemistry. World Health Organization 2000, Available from: http://apps.searo.who.int/PDS_DOCS/B0218.pdf?ua=1.
(18) Patel, A. A.; Gilbertson, J. R.; Showe, L. C.; London, J. W.; Ross, E.; Ochs, M. F.; Carver, J.; Lazarus, A.; Parwani, A. V.; Dhir, R.; et al. A novel cross-disciplinary multi-institute approach to translational cancer research: lessons learned from Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC). Cancer Inform 2007, 3, 255–274.
(19) Richiardi, L.; Bellocco, R.; Zugna, D. Mediation analysis in epidemiology: methods, interpretation and bias. International Journal of Epidemiology 2013, 42 (5), 1511–1519.
(20) Magnusson, B.; Ornemark, U. Eurachem Guide: The Fitness for Purpose of Analytical Methods. eurachem.org 2014, Available from www.eurachem.org.
(21) Fisher, R. A. The arrangement of field experiments. J. Min. Agric. G. Br. 1926, No. 33, 503–513.
(22) Fisher, R. A. The design of experiments; Oliver and Boyd, 1935.
(23) Wikipedia. Design of experiments. Retrieved from https://en.wikipedia.org/wiki/Design_of_experiments.
(24) Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Statist. Soc. B 1995, 57 (1), 289–300.
(25) Love, M. I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014, 15 (12), 550.
(26) Krzywinski, M.; Altman, N. Points of significance: Power and sample size. Nature Methods 2013, 10 (12), 1139–1140.
(27) Campbell, M. J.; Swinscow, T. D. V. Statistics at Square One, 11th ed.; Wiley-Blackwell: Hoboken, 2009.
(28) DiMaggio, C. Basis of Sample Size Calculations. Retrieved from: http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/styled-4/code12/#fnref2.
(29) Coombes, K. R. Sample Size for Microarray Experiments. Retrieved from http://bioinformatics.mdanderson.org/main/SampleSizes:Overview.
(30) Zhang, Z.; Yuan, K.-H. WebPower: Statistical analysis online. Retrieved from http://webpower.psychstat.org.
(31) Cairns, D. A.; Barrett, J. H.; Billingham, L. J.; Stanley, A. J.; Xinarianos, G.; Field, J. K.; Johnson, P. J.; Selby, P. J.; Banks, R. E. Sample size determination in clinical proteomic profiling experiments using mass spectrometry for class comparison. Proteomics 2009, 9 (1), 74–86.
(32) Elliott, M. H.; Smith, D. S.; Parker, C. E.; Borchers, C. Current trends in quantitative proteomics. J. Mass Spectrom. 2009, 44, 1637–1660.