Multidimensional Normalization to Minimize Plate ... - ACS Publications


Article

Multi-dimensional normalization to minimize plate effects of suspension bead array data Mun-Gwan Hong, Woojoo Lee, Peter Nilsson, Yudi Pawitan, and Jochen M Schwenk J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.5b01131 • Publication Date (Web): 29 Aug 2016 Downloaded from http://pubs.acs.org on September 1, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

Multi-dimensional normalization to minimize plate effects of suspension bead array data

Mun-Gwan Hong,† Woojoo Lee,‡ Peter Nilsson,† Yudi Pawitan,¶ and Jochen M. Schwenk∗,†

†Affinity Proteomics, Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, Solna, Sweden
‡Department of Statistics, Inha University, Incheon, Korea
¶Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden

E-mail: [email protected]
Phone: +46 (0)8 5248 1482


ACS Paragon Plus Environment


Abstract

Enhanced by a growing number of biobanks, biomarker studies can now be performed with reasonable statistical power by using large sets of samples. Antibody-based proteomics by means of suspension bead arrays offers one attractive approach to analyze serum, plasma, or CSF samples for such studies in microtiter plates. To expand measurements beyond single batches, with either 96 or 384 samples per plate, suitable normalization methods are required to minimize the variation between plates. Here we propose two normalization approaches utilizing MA coordinates. The multi-dimensional MA (multi-MA) and MA-loess methods both consider all samples of a microtiter plate per suspension bead array assay and thus do not require any external reference samples. We demonstrate the performance of the two MA normalization methods with data obtained from the analysis of 384 samples, including both serum and plasma. Samples were randomized across 96-well sample plates, then processed and analyzed in the corresponding assay plates. Using principal component analysis (PCA), we could show that plate-wise clusters found in the first two components were eliminated by multi-MA normalization, in contrast to other normalization methods. Furthermore, we studied the correlation profiles between random pairs of antibodies and found that both MA normalization methods substantially reduced the inflated correlation introduced by plate effects. Normalization approaches using multi-MA and MA-loess minimized batch effects arising from the analysis of several assay plates with antibody suspension bead arrays. In a simulated biomarker study, multi-MA restored associations that were lost due to plate effects. Our normalization approaches, which are available as the R package MDimNormn, could also be useful in studies using other types of high-throughput assay data.

Keywords normalization, plate effect, affinity proteomics, multiplexed immunoassays


Introduction

Considerable effort has been put into finding associations between the abundance of biomolecules in human blood and phenotypes such as age and disease status at a genome-wide scale. 1,2 Among the different approaches, proteomics methods can be used to investigate large numbers of proteins by mass spectrometry (MS), while one benefit of assays using affinity reagents is the ability to analyze many samples at a time. One of the leading affinity-based methods for the exploratory analysis of proteins in fluid samples is the antibody suspension bead array (SBA), in which proteins are captured from biological samples, such as serum or plasma, via specific antibodies immobilized on color-coded beads. 3 This method allows up to 500 proteins to be measured per bead array, using microtiter plates with 96 or 384 wells per analysis. With a growing need to measure more and more samples to boost the power to detect biologically relevant associations, 4 the number of statistically required samples may, however, exceed the number of positions available on the one microtiter plate of a single analysis. In such cases, non-biological variation between plates is likely to occur. To minimize it, the samples have to be allocated to random positions across multiple plates while keeping variables such as age, gender, diagnosis, or sample collection date homogeneous across the plates.

The challenge arising from splitting the analysis of samples into several data batches is common to many types of assays, and it has been addressed in particular for high-dimensional data, such as expression analysis using oligonucleotide microarrays. Compared with the dimensions of the microtiter-plate-based SBA method (384 features × 96 or 384 samples), oligonucleotide arrays enable the analysis of thousands of features per slide while analyzing a much smaller number of samples per slide batch. Thus, the majority of normalization methods were previously developed for this type of expression data 5 and per se seem less applicable to bead array data. Nevertheless, the genomics field has described several relevant strategies, of which we have considered methods utilizing MA coordinates. For these coordinates, A symbolizes the average intensity level across all batches (plate, slide, or lot


identity) and the M values refer to the differences between separate batches. This can be followed by loess smoothing 6,7 to account for trends within the intensity distribution. In proteomics, several normalization methods have also been developed and mainly applied to data from MS-based assays. There, the primary concern regarding technical variation is the thousands of ions detected from individual samples rather than the differences between several data batches derived from plates filled with a large number of samples. Even though normalization methods for MS-based proteomics data seem to be sub-optimal for the challenges faced by SBA assays, we compared popular normalization tools with the multi-MA and MA-loess approaches described here.

We developed a method suitable for the dimensions and characteristics of SBA data, with the aim of utilizing larger numbers of samples in affinity proteomic assays. Making use of the MA coordinates, which focus on the deviation, we revised the existing MA coordinate approaches to adjust a representative value of all samples in a plate instead of relying on individual reference samples. The performance of our approach was assessed using PCA and correlation analysis and compared with other methods, such as LOESS and Quantile normalization, offered by the massive normalization tool "Normalyzer". 8 In the presented study, we assessed the performance of the proposed MA normalization methods using data from antibody bead arrays and demonstrated that such data fit the assumptions of the normalization approaches well.

Methods

Data acquisition

We chose unprocessed data from an in-house proteomic study to demonstrate the normalization approaches, in which a total of 372 human serum and plasma samples were analyzed using antibody SBAs. 3 We used 382 antibodies, selected on the basis of availability and without considering any disease relation or whether several antibody targets were biologically related. The workflow is described in Figure S1 and includes two types of plates. Plates denoted as


source plates refer to the containers into which samples are distributed for sample preparation. The sera and plasmas in each source plate were subjected to 1:10 dilution in phosphate-buffered saline (PBS), biotinylation, and quenching of the labelling reaction. Samples were then frozen and stored at -20 °C. Prior to analysis, samples were thawed, diluted 1:50 in a protein-containing assay buffer, and heat-treated for 30 min at 56 °C. The samples from one 96-well source plate were then transferred into one 96-well assay plate and combined with antibody-coupled beads. The assay protocol, in short, included an overnight incubation, washing off unbound proteins, and the detection of the captured biotinylated proteins via fluorescently labelled streptavidin. Subsequently, the assay plate was inserted into the instrument (FlexMap3D, Luminex Corp) to acquire the data. Each data point is built on the median fluorescence intensity (MFI) from at least 32 beads per color code. The MFI values report the interaction of antibodies bound to the beads with proteins found in solution.

Investigation of Homogeneity

In our study, we analyzed 156 serum samples from the TwinGene cohort 9 and 204 plasma and 12 serum samples obtained from participants of the LifeGene study 10 using 384 antibodies. The samples from both study sets were distributed randomly across four 96-well source plates, and these plates were subsequently processed and analyzed in parallel as separate 96-well assay plates (Figure S2a). To examine batch effects derived from the assay plates, and thereby to demonstrate the homogeneity across source plates, 24 samples from three columns of each of the four source plates were combined into one 96-well assay plate (Figure S2b). To confirm the observed outcome, this analysis was repeated on a different day. The quantification method and beads were the same for both experiments.


Normalization approaches

By randomizing samples across plates, the homogeneity of the samples can be maintained over the plates. This plate design allows us to expect that the distribution of the measures of one target protein is invariant across all the plates, which implies an approximately constant mean for every source plate. Thus, the divergence of the mean values represents the technical effect of the assay plates on the measures. The data can therefore be normalized by adjusting the intensity distributions of the different assay plates to have the same representative values.

MA normalization approach

We selected the MA coordinates, which are derived from the intensity values of the antibodies used in the assay, per plate. In detail, the coordinates are defined by a rotation in the n-dimensional Euclidean coordinate system for n plates. Before the rotation, the jth coordinate provides the representative intensity value of an antibody for the jth plate, e.g. the mean or geometric mean. By rotating the original coordinate system, the A coordinate axis is chosen as the axis along the line of identity (like the dotted lines in Figure 1). The value on the A axis represents the average of the mean intensity levels across all assay plates for each antibody. The M coordinate axes ($M_1, M_2, \ldots, M_{n-1}$) are chosen to be perpendicular to each other, and each of them is also perpendicular to the A coordinate axis. The M coordinates refer to the deviation from the A coordinate axis. They contain the collective information on how divergent the representative values of the plates are. An example is given for 3 plates (n = 3): the A coordinate axis passes through (1, 1, 1) and the origin (0, 0, 0) in the coordinate system before the rotation. The $M_1$ and $M_2$ axes pass through $\left(-1, (1+\sqrt{3})/2, (1-\sqrt{3})/2\right)$ and $\left(-1, (1-\sqrt{3})/2, (1+\sqrt{3})/2\right)$, respectively.

The proposed normalization procedure follows the steps described below:

1. Assess the distribution of the data from each assay plate and optionally log-transform all measures (e.g. if the data have a nearly log-normal distribution)


2. Compute a representative value (e.g. average) for each antibody/protein and for each assay plate
3. For each antibody/protein, locate the respective intensity values in the coordinate system where one axis corresponds to one assay plate
4. Rotate the coordinates to the MA coordinate system composed of A and $M_1, M_2, \ldots, M_{n-1}$ (as defined above)
5. Determine the distance between the A axis and the intensity values of each antibody (Multi-MA) or the local average intensity values of neighboring antibodies (MA-loess) along each of $M_1, M_2, \ldots, M_{n-1}$
6. Shift the observed individual values by the distance to the A axis for each antibody/protein

The equations for the normalization are as follows. Let $x_{ijk}$ be the observed intensity value of sample k measured on the jth assay plate for target i. Assuming that the plate effects are multiplicative or additive factors, $x_{ijk}$ is decomposed, with or without log-transformation, respectively, as

$$x_{ijk} = x'_{ijk} + p_{ij} + \varepsilon_{ijk} \tag{1}$$

where $x'_{ijk}$ is the intensity value free from any plate effect, $p_{ij}$ the plate effect, and $\varepsilon_{ijk}$ the error term. 11 Here, we choose $p_{ij}$ such that the representative value of the normalized data,

$$x'_{ijk} + \varepsilon_{ijk} = x_{ijk} - p_{ij} \tag{2}$$

is constant across plates, with or without a margin. The approach without a margin is denoted Multi-MA, while for MA-loess the margin is the difference from the loess smoothing line that will be introduced in the last step. 7,12
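Step 4 relies on the A and M axes being mutually perpendicular. As a quick numerical check of the three-plate example given above, the following sketch scales the three stated directions to unit length and verifies that they form an orthonormal set. The helper names (`dot`, `unit`, `gram`) are illustrative only and do not come from the MDimNormn package.

```python
import math

# Axis directions for the three-plate example (n = 3), as stated in the text.
s3 = math.sqrt(3.0)
a_axis = (1.0, 1.0, 1.0)
m1_axis = (-1.0, (1.0 + s3) / 2.0, (1.0 - s3) / 2.0)
m2_axis = (-1.0, (1.0 - s3) / 2.0, (1.0 + s3) / 2.0)


def dot(u, v):
    """Euclidean inner product of two equally long vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return tuple(vi / norm for vi in v)


# Rows of a candidate rotation matrix R: unit vectors along A, M1 and M2.
R = [unit(a_axis), unit(m1_axis), unit(m2_axis)]

# Gram matrix R * R^T; it equals the identity when the rows are orthonormal.
gram = [[dot(ri, rj) for rj in R] for ri in R]

for row in gram:
    print([round(value, 12) for value in row])
```

Because the Gram matrix is the identity, stacking these unit vectors as rows gives one valid rotation for n = 3, with the first row equal to $(1\ 1\ 1) \cdot 3^{-1/2}$.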


Choosing the mean as the representative value per plate, the means of the samples in the jth plate for all m targets can be expressed as a row vector,

$$\mathbf{x}_j = (\bar{x}_{1j}\ \bar{x}_{2j}\ \ldots\ \bar{x}_{mj}) \tag{3}$$

where $\bar{x}_{ij} = \sum_{k=1}^{l_j} x_{ijk} / l_j$ and $l_j$ is the number of samples on the jth plate. Note that $\mathbf{x}_j$ is not the vector for each antibody in the coordinate system described in step 3 above. Each antibody vector is a column vector of $X$ in equation 5 below. The rotation to the new coordinates is performed, as described below, by the matrix $R$, which is the transposed orthogonal matrix of the left singular vectors of the n × 1 matrix of ones, keeping the first row of $R$ as $(1\ 1\ 1\ \ldots\ 1) \cdot n^{-1/2}$. For a given $R$ and the $\mathbf{x}_j$, we compute the positions in MA coordinates as

$$\begin{pmatrix} \mathbf{a} \\ \mathbf{m}_1 \\ \mathbf{m}_2 \\ \vdots \\ \mathbf{m}_{n-1} \end{pmatrix} = R \cdot \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \\ \vdots \\ \mathbf{x}_n \end{pmatrix} \tag{4}$$

in which $\mathbf{a} = (a_1\ a_2\ \ldots\ a_m)$ and $a_i$ is the value on the A coordinate for the ith target. Likewise, $\mathbf{m}_{j'}$ is for the $M_{j'}$ coordinate. Simplifying the equation with $M = (\mathbf{a}^T\ \mathbf{m}_1^T\ \mathbf{m}_2^T\ \ldots\ \mathbf{m}_{n-1}^T)^T$ and $X = (\mathbf{x}_1^T\ \mathbf{x}_2^T\ \mathbf{x}_3^T\ \ldots\ \mathbf{x}_n^T)^T$,

$$\underset{n \times m}{M} = \underset{n \times n}{R} \cdot \underset{n \times m}{X} \tag{5}$$

The deviation from A is $M_d = (\mathbf{0}^T\ \mathbf{m}_1^T\ \mathbf{m}_2^T\ \ldots\ \mathbf{m}_{n-1}^T)^T$, where $\mathbf{0} = (0\ 0\ \ldots\ 0)$ is of length m. $M_A$, the matrix after the adjustment onto A, is $M - M_d = (\mathbf{a}^T\ \mathbf{0}^T\ \ldots\ \mathbf{0}^T)^T$. For given $X$, $R$, and $M_d$, we get $X_A$, the matrix after normalization in the original coordinates.
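As a worked illustration that is not part of the original derivation, consider the simplest case of two plates (n = 2). One valid choice of $R$ is written out below; the sign of its second row is arbitrary and chosen here only for convenience.

```latex
R = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix},
\qquad
\mathbf{a} = \frac{\mathbf{x}_1 + \mathbf{x}_2}{\sqrt{2}},
\qquad
\mathbf{m}_1 = \frac{\mathbf{x}_2 - \mathbf{x}_1}{\sqrt{2}}
```

Up to the factor $\sqrt{2}$, these are the familiar two-sample MA coordinates: $\mathbf{a}/\sqrt{2}$ is the average of the two plates and $\sqrt{2}\,\mathbf{m}_1$ is their difference. Setting $\mathbf{m}_1$ to zero and rotating back with $R^T$ therefore assigns both plates the common value $(\mathbf{x}_1 + \mathbf{x}_2)/2$.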


$$M_A = M - M_d \tag{6}$$

$$R \cdot X_A = R \cdot X - M_d \tag{7}$$

$$X_A = X - R^{-1} \cdot M_d \tag{8}$$

where $R^{-1}$ is the inverse matrix of $R$. Since $R$ is orthogonal,

$$X_A = R^{-1} \cdot M_A = R^T \cdot M_A \tag{9}$$

and

$$X_A = \begin{pmatrix} \mathbf{x}_{1,A} \\ \mathbf{x}_{2,A} \\ \vdots \\ \mathbf{x}_{n,A} \end{pmatrix} = \begin{pmatrix} 1/\sqrt{n} & r_{21} & \ldots & r_{n1} \\ 1/\sqrt{n} & r_{22} & \ldots & r_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ 1/\sqrt{n} & r_{2n} & \ldots & r_{nn} \end{pmatrix} \cdot \begin{pmatrix} \mathbf{a} \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{pmatrix} = \begin{pmatrix} \mathbf{a}/\sqrt{n} \\ \mathbf{a}/\sqrt{n} \\ \vdots \\ \mathbf{a}/\sqrt{n} \end{pmatrix} \tag{10}$$

in which $r_{ab}$ is the element of $R$ at row a and column b. The vector $\mathbf{x}_{j,A}$, obtained after normalization of $\mathbf{x}_j$, satisfies

$$\mathbf{x}_{j,A} = \mathbf{a}/\sqrt{n} = \text{constant for all } j \text{ (plates)} \tag{11}$$

The adjustment matrix $X_{adj}$ is acquired as

$$X_{adj} = X_A - X = -R^{-1} \cdot M_d \tag{12}$$

or

$$X_{adj} = \begin{pmatrix} \mathbf{x}_{1,adj} \\ \mathbf{x}_{2,adj} \\ \mathbf{x}_{3,adj} \\ \vdots \\ \mathbf{x}_{n,adj} \end{pmatrix} = \begin{pmatrix} \tilde{x}_{11} & \tilde{x}_{21} & \ldots & \tilde{x}_{m1} \\ \tilde{x}_{12} & \tilde{x}_{22} & \ldots & \tilde{x}_{m2} \\ \tilde{x}_{13} & \tilde{x}_{23} & \ldots & \tilde{x}_{m3} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{x}_{1n} & \tilde{x}_{2n} & \ldots & \tilde{x}_{mn} \end{pmatrix} = -R^{-1} \cdot M_d$$

The $\tilde{x}_{ij}$ is chosen as the normalization factor for all samples in the jth plate and the ith target, which are $x_{ij1}, x_{ij2}, \ldots, x_{ijl_j}$. The individual value after normalization, $x_{ijk,\text{normalized}}$, is computed as below.

$$x_{ijk,\text{normalized}} = x_{ijk} + \tilde{x}_{ij} \tag{13}$$
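Equations 3 to 13 can be sketched in a few lines of code. The block below is a minimal illustration in plain Python, not the MDimNormn implementation. It assumes the mean as the representative value and builds $R$ from Helmert contrasts rather than from singular vectors. That substitution does not change the Multi-MA result, because all M coordinates are set to zero and only the first row of $R$ enters equation 10. All function names and the example values are hypothetical.

```python
import math


def helmert_rotation(n):
    """Orthogonal n x n matrix whose first row is (1, ..., 1) / sqrt(n).

    Assumption: the remaining rows are Helmert contrasts, one of many
    valid choices of mutually perpendicular M axes.
    """
    rows = [[1.0 / math.sqrt(n)] * n]
    for k in range(1, n):
        scale = 1.0 / math.sqrt(k * (k + 1))
        rows.append([scale] * k + [-k * scale] + [0.0] * (n - k - 1))
    return rows


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def multi_ma(plates):
    """plates[j][i] holds the (log-)intensities of target i on plate j."""
    n = len(plates)
    # Eq. 3: representative value (mean) per plate and target, X is n x m.
    X = [[sum(vals) / len(vals) for vals in plate] for plate in plates]
    R = helmert_rotation(n)
    M = matmul(R, X)                                   # eq. 5
    m = len(M[0])
    Md = [[0.0] * m] + [row[:] for row in M[1:]]       # deviation from A
    # Eq. 12: X_adj = -R^T * Md (R is orthogonal, so R^-1 = R^T).
    X_adj = [[-v for v in row] for row in matmul(transpose(R), Md)]
    # Eq. 13: shift every individual value by its plate/target factor.
    return [[[x + X_adj[j][i] for x in vals] for i, vals in enumerate(plate)]
            for j, plate in enumerate(plates)]


# Hypothetical data: 3 plates, 2 targets, a few samples each.
plates = [
    [[1.0, 2.0, 3.0], [10.0, 12.0]],
    [[4.0, 5.0, 6.0], [13.0, 15.0]],
    [[7.0, 8.0, 9.0], [16.0, 18.0]],
]
normalized = multi_ma(plates)
print(normalized)
```

In this example the plate means of the first target (2, 5, 8) are all moved to their common average of 5, so each plate ends up with the values 4, 5, 6 for that target, as equation 11 requires.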

The deviation matrix $M_d$ can be replaced with a more general form,

$$\hat{M}_d = (\mathbf{0}^T\ \hat{\mathbf{m}}_1^T\ \hat{\mathbf{m}}_2^T\ \ldots\ \hat{\mathbf{m}}_{n-1}^T)^T \tag{14}$$

where $\hat{\mathbf{m}}_{j'}$ is the row vector of the values fitted to a function. Modeling

$$\mathbf{m}_{j'} = f(\mathbf{a}) + \mathbf{e} \tag{15}$$

where $\mathbf{e}$ is the residual vector, for given $\mathbf{m}_{j'}$ and $\mathbf{a}$,

$$\hat{\mathbf{m}}_{j'} = f(\mathbf{a}) \quad (j' = 1, 2, \ldots, n-1) \tag{16}$$

For $f$, loess smoothing has been employed using the function "loess" in R, as an alternative that may affect the data only mildly. 12
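The generalized form in equations 14 to 16 can be sketched in the same way. Loess itself is not implemented here. As a deliberately simple stand-in, the block below uses an unweighted least-squares line for $f$, which is an assumption made purely for illustration. The Helmert construction of $R$, the function names, and the example values are likewise hypothetical.

```python
import math


def helmert_rotation(n):
    """Orthogonal matrix with first row (1, ..., 1) / sqrt(n) (assumed choice of R)."""
    rows = [[1.0 / math.sqrt(n)] * n]
    for k in range(1, n):
        scale = 1.0 / math.sqrt(k * (k + 1))
        rows.append([scale] * k + [-k * scale] + [0.0] * (n - k - 1))
    return rows


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def linear_fit(a, m):
    """Fitted values of a least-squares line m ~ f(a); stand-in for loess."""
    a_mean = sum(a) / len(a)
    m_mean = sum(m) / len(m)
    sxx = sum((ai - a_mean) ** 2 for ai in a)
    sxy = sum((ai - a_mean) * (mi - m_mean) for ai, mi in zip(a, m))
    slope = 0.0 if sxx == 0 else sxy / sxx
    return [m_mean + slope * (ai - a_mean) for ai in a]


def fitted_adjustment(X, fit=linear_fit):
    """X[j][i] is the representative value of target i on plate j.

    Returns X_adj = -R^T * M_hat_d, where the rows of M_hat_d are the
    values of each M coordinate fitted against the A coordinate (eq. 14-16).
    """
    R = helmert_rotation(len(X))
    M = matmul(R, X)
    a = M[0]
    M_hat_d = [[0.0] * len(a)] + [fit(a, m_row) for m_row in M[1:]]
    return [[-v for v in row] for row in matmul(transpose(R), M_hat_d)]


# Hypothetical plate means for four targets on two plates; the second plate
# deviates from the first in proportion to the intensity.
X = [[1.0, 2.0, 3.0, 4.0], [1.2, 2.4, 3.6, 4.8]]
X_adj = fitted_adjustment(X)
X_fit = [[x + d for x, d in zip(row, adj)] for row, adj in zip(X, X_adj)]
print(X_fit)
```

In this constructed example the deviation is exactly linear in the A coordinate, so the fitted line removes it completely and both plates end up with the same values. With real data, only the fitted trend is removed and the residuals $\mathbf{e}$ remain as the margin.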


Other normalization methods

In addition to the described approach, the massive normalization tool "Normalyzer" (version 1.1.1) was used to compare the performance of 8 popular normalization methods in proteomics. 8 This tool requires replicated analyses of samples to evaluate each method; thus, the data of the single-assay-plate experiment were combined with the data from the four assay plates.

Simulation of biomarker study

A set of random samples (N = 200 individuals) was generated from the standard normal distribution using the function "rnorm" in R. The individuals were randomly divided into two groups of equal size, cases and controls. The values for the controls were increased by 0.5 to introduce a difference. The cases and controls were then split into 4 groups of equal size (N = 50) and distributed over 4 plates (data 1). To introduce a plate effect, four random numbers were drawn from a normal distribution (standard deviation = 2), and one was added to the values of each plate (data 2). The data with the introduced plate effects were then normalized by Multi-MA (data 3). A t-test was used to assess the association between the case-control status and the values. This exercise was repeated 100 times.
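The simulation described above can be sketched as follows. This is a plain Python approximation rather than the original R code. The random seed, the pooled-variance t statistic, and the two-sided 5% significance threshold are assumptions, since the text does not state them, so the resulting counts are not expected to match the reported ones. For a single target with the mean as representative value, Multi-MA reduces to shifting each plate so that its mean equals the average of the plate means.

```python
import math
import random
import statistics

T_CRIT = 1.972  # assumed threshold: two-sided 5% level, t distribution with 198 df


def t_statistic(x, y):
    """Two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))


def multi_ma_single_target(values, plate_of, n_plates):
    """Shift each plate so that its mean equals the average of all plate means."""
    means = [statistics.mean([v for v, p in zip(values, plate_of) if p == j])
             for j in range(n_plates)]
    target = statistics.mean(means)
    return [v + (target - means[p]) for v, p in zip(values, plate_of)]


def simulate_once(rng, n=200, n_plates=4, effect=0.5, plate_sd=2.0):
    """One simulated study; returns detection flags for data 1, 2 and 3."""
    is_control = [i % 2 == 0 for i in range(n)]
    rng.shuffle(is_control)                            # random case/control status
    plate_of = [i * n_plates // n for i in range(n)]   # 50 individuals per plate
    data1 = [rng.gauss(0.0, 1.0) + (effect if c else 0.0) for c in is_control]
    shifts = [rng.gauss(0.0, plate_sd) for _ in range(n_plates)]
    data2 = [v + shifts[p] for v, p in zip(data1, plate_of)]   # with plate effect
    data3 = multi_ma_single_target(data2, plate_of, n_plates)  # after Multi-MA
    flags = []
    for data in (data1, data2, data3):
        controls = [v for v, c in zip(data, is_control) if c]
        cases = [v for v, c in zip(data, is_control) if not c]
        flags.append(abs(t_statistic(controls, cases)) > T_CRIT)
    return flags


rng = random.Random(2016)  # arbitrary seed, for reproducibility only
detected = {"data1": 0, "data2": 0, "data3": 0}
for _ in range(100):
    for key, hit in zip(detected, simulate_once(rng)):
        detected[key] += hit
print(detected)
```

The dictionary `detected` counts, out of 100 repetitions, how often the association was significant in the data without plate effects, with plate effects, and after normalization.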

Software availability

The function for the normalizations in the R environment 13 is available in the package MDimNormn, which can be accessed on the Comprehensive R Archive Network (CRAN). Different representative values, such as the median, can be chosen through a parameter of the R function. The package is also flexible in terms of the fitting function $f$ for $\hat{M}_d$ in equation 16 and allows the user to choose an appropriate $f$ for the experimental set-up. The function $f$ can be "loess", "robust linear regression (rlm)", "weighted linear regression (lm)", and others.


Results

We describe a method for normalizing batch effects that are introduced during affinity proteomics analysis when utilizing multiple assay plates. This normalization method was developed on the assumption of sample homogeneity when samples are spread randomly across multiple plates. This assumes that the design of the plate content is balanced for different features related to the subject (e.g. age, gender, disease status) as well as to the sample specimen (e.g. preparation type, date of collection, storage time, number of freeze-thaw cycles). The data to be normalized were acquired from the multiplexed immunoassay procedure that had been conducted with four separate 96-well source and assay plates containing 372 randomly distributed samples.

Evaluation of the source of batch effects

To check that the homogeneity assumption is reasonable, an experiment demonstrating the plate effect was conducted. For this purpose, an additional data set was generated by combining 24 samples from each of the four source plates into one 96-well assay plate. We expected little or no effect from the source plates, and in order to confirm this with the assembled sample set, the analysis was performed twice (denoted day 1 and day 2). These data were then compared with intensity values obtained for the same 24 samples from the same source plates that were analyzed on separate assay plates. The latter were expected to encounter additional technical variation. The data obtained from these experiments are presented in Figure 1 and indicate that the profiles generated on separate assay plates deviated from those generated on a single assay plate. Substantial plate effects on the measures in separate assay plates were observed, and intensity values in plate 4 were notably lower than those in the other plates. Interestingly, the data influenced by plate variation scattered around an area nearly parallel to the line of identity. In contrast, the data from the single-assay-plate experiment revealed high similarity between the source plates. Most of the aggregated values are located around the identity line. The distances (or deviation)


from the line were negligible and substantially smaller than those for the data of separate plates. Considering the distribution of intensity values, which is roughly log-normal (Figure S3), we used the geometric mean of the intensities as the representative value for a target within a plate in this analysis.

The homogeneity was further examined by principal component analysis (PCA). In the PCA in Figure 2, the primary distinguishing feature was two groups of data separated by the second principal component (PC2). The separation was due to the two different types of samples (plasma and serum; see Figure S4). Dissected further by source plate, as shown in the boxplots representing the variance of the first principal component (PC1), the batch effects originated from analyzing samples on separate assay plates, compared with much smaller effects found for the replicated analysis on a single assay plate. This confirmed that the assumption of homogeneity across plates is plausible in experimental designs that require multiple plates. Since most of the effects disappeared in experiments with single-assay-plate measurements, this suggests that the difference between plates originated not from differences between biological samples but from technical or preparatory variables.

Assessment of normalization methods using PCA

In the previous analysis using 24 serum and plasma samples from each of the four assay plates, we demonstrated the acceptability of the homogeneity assumption. Next, we expanded the analysis to all samples of the four source plates and investigated the plate effects for serum and plasma separately. The same analysis was performed on both sample types, and the outcome of the serum analysis is presented in Figure S5. The primary objective of the normalizations introduced here is limited to minimizing the effect of batches, in our case microtiter plates. Additional tools are thus required to account for other non-biological variation. For the data used here, preprocessing was conducted in 4 steps.

1. The data of failed samples, e.g. with MFI levels at background, were removed.


2. The unprocessed data were divided by assay plate and blood preparation type (plasma and serum).
3. For each subset of data, robust PCA 14 was employed to identify sample outliers.
4. The sample-by-sample variation was adjusted using probabilistic quotient normalization (PQN). 15

In the current data set, two and three samples were removed at steps 1 and 3, respectively. Other normalization approaches for proteomic data were tested using the "Normalyzer" software. "LOESS" and "Quantile" normalizations 5 showed the lowest variation between measurements from the four plates and from the single plate after normalization (Figure S6).

Starting from unprocessed data, the 204 plasma samples (Figure 3A) revealed four distinct clusters related to the assay plates. For plasma, PC1 explained more than 60% of the variance, and the dissimilarity between assay plates was further clarified in the box plot of PC1. These observations agreed with the previous scatter plots in Figure 1. The clusters became even more clearly separated after outlier removal and PQN per plate (Figure S7). Since only 1 outlier was detected in the 204 plasma samples, it is likely that the QC step had little effect on the data, while PQN made the data of the samples in each plate more similar to each other, intensifying the contrast between plates. The separation in the PC plots was eliminated completely by Multi-MA and almost completely by MA-loess normalization (Figure 3B-C). This reflects that the data from the four plates became more homogeneous. Since PQN could also be applied to all assay plates combined, rather than right after the QC step for each assay plate separately, this alternative was tested as well. As shown in Figure 3D, the plate-wise heterogeneity was still detectable. When testing the LOESS and Quantile normalizations, as proposed by the Normalyzer software, neither of the two methods could remove the plate effects (Figure 3E-G).

To summarize the differences in performance of the above-mentioned normalization methods, Kruskal-Wallis tests were applied to compare PC1 and PC2 between the different assay plates. As shown in Table 1 for plasma and Table 2 for serum, multi-MA outperformed


all the other tested methods. MA-loess was the second-best approach when using the two primary PCA components of variation as measures for the batch effect assessment.

Assessment of MA normalization methods by correlation analysis

In addition to PCA, the performance of normalization was assessed by a correlation test. This test was designed to detect hidden variability across samples that may originate from experimental factors such as plate batches. 16 Out of 382 antibodies (Ab), random pairs were assembled. Again, the data for the two blood preparation types were separated and, following log-transformation, Pearson's correlations were computed. This procedure was repeated 500 times for each of the serum and plasma data sets. The correlation distributions were compared as shown in Figure 4. Both the unprocessed data and the data normalized by PQN per plate revealed high correlations between random pairs of Abs and decreasing trends as the product of the standard deviations increased, which suggests a dominant effect of the technical or preparatory variable, plate, on the data. The effect is more obvious in the scatter plots of individual pairs shown in Figure S8. After normalization by Multi-MA or MA-loess, such correlation disappeared, supporting the conclusions drawn from the PCA.

Effect of MA normalization for plate effect in biomarker studies

Data normalization is an intermediate step of biomarker studies. We aimed to investigate the effect of MA normalization on the results of such studies using association tests. During study design, the samples are randomly allocated to multiple plates, and the number of samples in each plate is large enough to simulate a two-group comparison. We artificially introduced a plate effect that increased the variation within each group across all plates. Thus, plate effects can substantially lower the power to detect an association. The simulated data were generated for 100 cases and 100 controls on 4 plates with an effect size of 0.5 (data 1). Plate effects were introduced (data 2) by altering the data for


each plate individually. The data were then normalized by the multi-MA method (data 3) and compared with data sets 1 and 2. Five examples of the simulations are shown in Figure S9. In 100 repeated simulations, the true association was revealed 53 times from the data without plate effects (data 1) and 11 times from the data with plate effects before normalization (data 2), whereas it was revealed 52 times from the data with plate effects that had been normalized by multi-MA (data 3). There was a significant drop in the power to detect an association in unprocessed data with plate effects, but, as shown in our simulations, multi-MA restored the statistical power.

Discussion

Here we present two MA-based normalization methods to address differences between batches of data introduced by analyzing several assay plates using suspension bead arrays. The methods were developed to enable the analysis of larger sets of samples that require multiple batches. Compared to unprocessed raw data, batch effects were substantially reduced by the MA methods. This was also shown to be beneficial for a simulated biomarker study. Compared to other popular normalization tools, the developed Multi-MA method performed better on the investigated data set in terms of homogeneity across plates after normalization, as shown in the PCA analysis. We suggest MA-loess as an alternative normalization method to Multi-MA for data that are less affected by plate effects. Both tools circumvent the need to allocate positions in the exploratory study design to standard or reference samples, which themselves may suffer from variations. By using a larger number of samples instead, the vulnerability to stochastic variation is small and the number of references may also be reduced. The progressive changes of one example data set, obtained with the same analysis tool and for which MA-loess is suitable, are shown in Figure S10.

The usage of these normalization tools is not limited to affinity-based plasma proteomics data. However, the following requirements should be considered. First, the number of samples grouped by a technical variable, such as sample plate or assay plate, should be large. This is because the normalization factor is determined on the basis of representative values (e.g. mean and geometric mean) for every plate. Second, the assumption of homogeneity of samples across plates should be acceptable. The normalizations work when the design of the sample plates is balanced in terms of the distribution of variables describing subjects (e.g. age, gender, disease status) and specimens (e.g. sample type, collection date). Third, the variance within each assay plate should be comparable across assay plates. Heteroscedastic data cannot be normalized by these methods, which utilize a single adjusting factor per target. Fourth, it should be reasonable to assume that normalization can be achieved by an addition (or subtraction) in linear or log scale.

The two suggested methods are related to previous efforts. The approach with MA-loess transformation is similar to the method described by Åstrand (7), and Multi-MA to batch mean-centering (11). Åstrand's MA approach was developed for non-biological variation across samples in oligonucleotide microarray experiments. Thus, normalization employing a representative value such as the mean of all samples in a plate was not accommodated. To adapt it to affinity-based proteomics data, the method was modified, and the results show that the adapted version can indeed be used for such data. Multi-MA produces values in a reasonable range, which can be further processed for quantification, unlike batch mean-centering (11). Furthermore, the plausibility of the approach for the affinity-based proteomic data set is empirically confirmed here. When compared to other popular normalization methods, the Multi-MA approach outperformed such alternatives.
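The third and fourth requirements listed above can be illustrated with a small sketch. It assumes hypothetical log-scale intensities of one target on two plates and uses a plain per-plate shift to the grand mean as the single additive factor; this is a generic illustration and does not reproduce Multi-MA or MA-loess. A shift of the mean in log scale corresponds to rescaling by the geometric mean in linear scale.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)

# Hypothetical log-scale intensities of one target on two assay plates:
# plate B is both shifted and more variable than plate A (heteroscedastic).
plate_a = [rng.gauss(7.0, 0.3) for _ in range(90)]
plate_b = [rng.gauss(8.0, 0.9) for _ in range(90)]

def shift_to(values, target_mean):
    """Apply one additive factor so that the plate mean equals target_mean."""
    offset = target_mean - mean(values)
    return [v + offset for v in values]

grand_mean = mean(plate_a + plate_b)
norm_a = shift_to(plate_a, grand_mean)
norm_b = shift_to(plate_b, grand_mean)

# The plate means now coincide, but the spread of each plate is unchanged.
print("difference of means after shift:", mean(norm_a) - mean(norm_b))
print("SD ratio before:", stdev(plate_b) / stdev(plate_a))
print("SD ratio after: ", stdev(norm_b) / stdev(norm_a))
```

A single additive factor per target aligns the location of the plates but leaves any difference in within-plate variance untouched, which is why comparable variance across plates is listed as a requirement.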
Thus, the proposed MA-based normalization approaches are attractive alternatives to existing solutions and should also be considered for other types of high-throughput data. In addition to the two proposed methods, other regression or smoothing functions that fit better to other data can be incorporated by adjusting the fitting function f in the presented equation 16. In conclusion, we suggest testing for batch effects in data derived from multiplexed immunoassays and addressing the influence of such effects prior to further analysis; normalization methods such as Multi-MA or MA-loess should be considered.
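One way to test for a plate effect is a Kruskal-Wallis test across plates, the test reported in Tables 1 and 2 for the principal component scores. The sketch below is a standard-library implementation written for illustration; the simulated scores, the plate shifts, and the function names are hypothetical, and the p-value uses the usual chi-square approximation.

```python
import math
import random
from statistics import mean

def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    z = x / 2.0
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(z)) if df % 2 else math.exp(-z)
    while a < df / 2.0 - 1e-9:  # recurrence Q(a + 1, z) = Q(a, z) + z^a e^-z / Gamma(a + 1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def kruskal_wallis(groups):
    """Return (H, p) with average ranks for ties and the standard tie correction."""
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h -= 3.0 * (n + 1)
    h = max(h / (1.0 - tie_term / (n ** 3 - n)), 0.0)
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical PC1 scores of 50 samples on each of 4 plates, with plate shifts.
rng = random.Random(7)
plates = [[rng.gauss(shift, 1.0) for _ in range(50)] for shift in (0.0, 1.0, 2.0, 3.0)]
h_before, p_before = kruskal_wallis(plates)

# After removing each plate's mean, the plates should no longer differ in location.
centered = [[v - mean(p) for v in p] for p in plates]
h_after, p_after = kruskal_wallis(centered)
print(p_before, p_after)
```

A very small p-value indicates that the scores differ between plates, whereas a large p-value after normalization is consistent with the plate effect having been removed.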

Conflict of Interest Disclosure

The authors declare no competing financial interest.

Acknowledgement

The authors thank the Swedish Twin Registry, supported by the Swedish Ministry for Higher Education, and LifeGene, supported by AFA Försäkringar and Torsten and Ragnar Söderberg's Foundation, for the valuable samples. We particularly thank Kimi Drobin for generating the data, and everyone in the Biobank Profiling group at SciLifeLab and the entire staff of the Human Protein Atlas for their great efforts. The KTH Center for Applied Proteomics funded by the Erling-Persson Family Foundation, Science for Life Laboratory, the Knut and Alice Wallenberg Foundation, and the ProNova VINN Excellence Centre for Protein Technology (VINNOVA, Swedish Governmental Agency for Innovation Systems) are acknowledged for their financial support.

Supporting Information Available

Figure S1: Experimental work flow of suspension bead array assays.
Figure S2: The experimental design of a single plate analysis to verify the homogeneity assumption when comparing to assays analyzing multiple plates.
Figure S3: Distribution of signals derived from suspension bead array assays.
Figure S4: Contrasting between plasma and serum samples.
Figure S5: Comparison of methods to normalize for batch effects using PCA analysis (serum samples).
Figure S6: Replicate variation plots from the reports of the "Normalyzer" tool.
Figure S7: PCA plots after outlier removal and applying PQN normalization per plate.
Figure S8: Scatter plots showing the relation of 10 random pairs of antibodies.
Figure S9: Simulations to check the impact of normalization on biomarker studies.
Figure S10: PCA plots of example data, for which MA-loess was applicable and that showed the progressive changes of the data through each normalization step.

This material is available free of charge via the Internet at http://pubs.acs.org/.

References

(1) Mitchell, P. S. et al. Circulating microRNAs as stable blood-based markers for cancer detection. Proceedings of the National Academy of Sciences 2008, 105, 10513–10518.
(2) Lausted, C.; Lee, I.; Zhou, Y.; Qin, S.; Sung, J.; Price, N. D.; Hood, L.; Wang, K. Systems Approach to Neurodegenerative Disease Biomarker Discovery. Annual Review of Pharmacology and Toxicology 2014, 54, 457–481.
(3) Schwenk, J. M.; Gry, M.; Rimini, R.; Uhlén, M.; Nilsson, P. Antibody suspension bead arrays within serum proteomics. Journal of proteome research 2008, 7, 3168–3179.
(4) Button, K. S.; Ioannidis, J. P. A.; Mokrysz, C.; Nosek, B. A.; Flint, J.; Robinson, E. S. J.; Munafò, M. R. Power failure: why small sample size undermines the reliability of neuroscience. Nature reviews. Neuroscience 2013, 14, 365–376.
(5) Bolstad, B. M.; Irizarry, R. A.; Astrand, M.; Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (Oxford, England) 2003, 19, 185–193.
(6) Li, C.; Tseng, G. C.; Wong, W. H. In Statistical Analysis of Gene Expression Microarray data; Speed, T., Ed.; Chapman & Hall/CRC: Boca Raton, Florida, USA, 2003; Chapter 1, pp 1–34.


(7) Åstrand, M. Contrast normalization of oligonucleotide arrays. Journal of computational biology : a journal of computational molecular cell biology 2003, 10, 95–102.
(8) Chawade, A.; Alexandersson, E.; Levander, F. Normalyzer: a tool for rapid evaluation of normalization methods for omics data sets. Journal of proteome research 2014, 13, 3114–3120.
(9) Magnusson, P. K. E. et al. The Swedish Twin Registry: establishment of a biobank and other recent developments. Twin research and human genetics : the official journal of the International Society for Twin Studies 2013, 16, 317–329.
(10) Almqvist, C. et al. LifeGene–a large prospective population-based study of global relevance. European journal of epidemiology 2011, 26, 67–77.
(11) Lazar, C.; Meganck, S.; Taminau, J.; Steenhoff, D.; Coletta, A.; Molter, C.; Weiss-Solís, D. Y.; Duque, R.; Bersini, H.; Nowé, A. Batch effect removal methods for microarray gene expression data integration: a survey. Briefings in bioinformatics 2013, 14, 469–490.
(12) Cleveland, W. S.; Grosse, E.; Shyu, W. M. In Statistical Models in S; Chambers, J. M., Hastie, T. J., Eds.; Chapman and Hall: New York, 1993; Chapter 8, pp 309–376.
(13) R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2015.
(14) Hubert, M.; Rousseeuw, P. J.; Branden, K. V. ROBPCA: A New Approach to Robust Principal Component Analysis. Technometrics 2005, 47, 64–79.
(15) Dieterle, F.; Ross, A.; Schlotterbeck, G.; Senn, H. Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics. Analytical chemistry 2006, 78, 4281–4290.


(16) Ploner, A.; Miller, L. D.; Hall, P.; Bergh, J.; Pawitan, Y. Correlation test to assess low-level processing of high-density oligonucleotide microarray data. BMC bioinformatics 2005, 6, 80.


Figure Legends

Figure 1. Pairwise comparison of different sample plates. Shown are the geometric means of intensity values of individual antibodies for 24 samples originating from four different sample plates. Each plot compares two sample plates. The samples were either analyzed in four separate assay plates (black) or combined in one assay plate analyzed on different dates (green and red). The deviation from the line of identity (dashed line) indicates the effects of the assay plates on the obtained data.

Figure 2. PCA analysis of different assay plates. The first two components of the PCA analyses are illustrated for 96 samples (52 plasma and 44 serum) in the upper panels, in which the proportion of variance explained by each PC is noted in parentheses. The distributions of PC1 of the 4 sample plates (black, red, green, blue) are compared in the box plots in the lower panels. The samples are colored based on their sample plate origin. The presented results derive from data of four separately analyzed assay plates (left) or from samples combined in one assay plate and analyzed on different dates (center and right).

Figure 3. Comparison of methods to normalize for batch effects using PCA analysis. The principal components 1 and 2 (upper panel) and box plots derived from PC1 (middle) and PC2 (bottom) for the different assay plates and normalization methods are shown. The differences in PC1 and PC2 for the applied normalization methods are also summarized in Table 1. This analysis is based on 204 plasma samples. The results for serum analysis can be found in Table 2 and in Figure S5.

Figure 4. Effect of normalization methods on correlation of random antibody pairs. Correlations and the product of standard deviations of 500 random pairs of antibodies. The values were obtained from the data of 204 plasma samples from four separate assay plates. To illustrate the trends within each data set, a blue trend line and error bars were added. The latter depict the 95% confidence interval of the mean correlation value in a decile.
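The quantities plotted in Figure 4 (Pearson correlation against the product of standard deviations for random antibody pairs) can be computed along the following lines. The data here are simulated and purely hypothetical: 40 antibodies, 120 samples on 4 plates, a shared additive plate shift in log scale, and per-plate mean centering as a simplified stand-in for the normalization methods discussed in the text.

```python
import math
import random
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def random_pair_summary(data, n_pairs, rng):
    """For random antibody pairs, return (product of SDs, correlation) tuples."""
    names = list(data)
    summary = []
    for _ in range(n_pairs):
        a, b = rng.sample(names, 2)
        summary.append((stdev(data[a]) * stdev(data[b]), pearson(data[a], data[b])))
    return summary

rng = random.Random(42)
plates = [p for p in range(4) for _ in range(30)]   # plate label of each sample
plate_shift = [-0.6, -0.2, 0.2, 0.6]                # hypothetical shared plate effect
raw = {f"Ab{k}": [rng.gauss(0.0, 0.3) + plate_shift[p] for p in plates]
       for k in range(40)}

def center_per_plate(values):
    """Subtract the plate mean from every sample (stand-in normalization)."""
    means = [mean(v for v, q in zip(values, plates) if q == p) for p in range(4)]
    return [v - means[p] for v, p in zip(values, plates)]

centered = {name: center_per_plate(values) for name, values in raw.items()}

mean_r_raw = mean(r for _, r in random_pair_summary(raw, 500, rng))
mean_r_centered = mean(r for _, r in random_pair_summary(centered, 500, rng))
print(mean_r_raw, mean_r_centered)
```

In this toy setting, unrelated antibodies correlate strongly as long as they share the plate shift, and the mean correlation of random pairs falls to about zero once the shift is removed.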


Table Legends

Table 1. Comparison of normalization methods using the % of the variance explained by PCA components 1 and 2. For each component, the differences between the four assay plates were assessed using the p-values obtained from Kruskal-Wallis tests. The data was generated from plasma samples.

Table 2. Comparison of normalization methods using the % of the variance explained by PCA components 1 and 2. For each component, the differences between the four assay plates were assessed using the p-values obtained from Kruskal-Wallis tests. The data was generated from serum samples.


Tables

Table 1: Normalization methods for plate effects (plasma samples)

Method        PC1 [%]   P-value (PC1)   PC2 [%]   P-value (PC2)
Unprocessed   67.2      1.3 × 10^-28    6.4       3.8 × 10^-20
Multi-MA      20.8      0.99            8.9       0.97
MA-loess      19.9      2.8 × 10^-07    9.1       1.8 × 10^-14
PQN           24.2      9.0 × 10^-18    8.6       2.6 × 10^-12
LOESS         16.3      1.6 × 10^-17    10.6      7.4 × 10^-24
Quantile      16.5      9.3 × 10^-17    11.4      3.7 × 10^-27

Table 2: Normalization methods for plate effects (serum samples)

Method        PC1 [%]   P-value (PC1)   PC2 [%]   P-value (PC2)
Unprocessed   69.1      1.4 × 10^-22    5.5       1.4 × 10^-18
Multi-MA      14.6      0.99            7.4       0.99
MA-loess      17.2      1.8 × 10^-17    7.0       1.1 × 10^-04
PQN           21.9      3.1 × 10^-23    6.7       1.5 × 10^-03
LOESS         18.7      7.4 × 10^-23    8.8       6.7 × 10^-06
Quantile      18.2      2.0 × 10^-23    8.6       1.3 × 10^-06

Figures

[Figure 1: image not reproduced. Pairwise scatter plots of Plate 1, Plate 2, Plate 3 and Plate 4 against each other, with axis ticks from 50 to 10000. Legend: 4 separate plates, One plate (day1), One plate (day2).]

[Figure 2: image not reproduced. Panels: 4 Separate plates, One plate (Day 1), One plate (Day 2). Upper panels: principal component scores, colored by plate; lower panels: box plots of a principal component by plate.]

[Figure 3: image not reproduced. Panels: A. Unprocessed (PC1 67.2%, PC2 6.4%), B. Multi-MA (PC1 20.8%, PC2 8.9%), C. MA-loess (PC1 19.9%, PC2 9.1%), D. PQN for all plates (PC1 24.2%, PC2 8.6%), E. LOESS (PC1 16.3%, PC2 10.6%), F. Quantile (PC1 16.5%, PC2 11.4%). Upper row: PC1 versus PC2; middle and bottom rows: box plots of PC1 and PC2 by plate. Legend: Plate 1, Plate 2, Plate 3, Plate 4.]
[Figure 4: image not reproduced. Plasma. Panels: 0. Raw; 1. Outlier removal & PQN per plate; 2a. Multi-MA normalized; 2b. MA-loess normalized. x-axis: Product of standard deviations; y-axis: Correlation.]

For TOC only

[TOC graphic: image not reproduced. Labels: Before Normalization, After Normalization, Plate 1, Plate 2, Assay 1 to Assay 10, M1, A, 45°.]

The workflow of the Multi-Dimensional MA-normalization. The data was transformed to MA-coordinates, in which A represents the identity line, and normalized on the coordinates that focus on the deviation from A.
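For the simplest case of two plates, the idea of MA-coordinates can be sketched as follows. This uses the conventional two-sample definition (M as the log difference, A as the log average) and hypothetical representative intensities; the base-2 logarithm, the function name and the values are assumptions made for illustration, and the multi-plate generalization and fitting function of the paper are not reproduced here.

```python
import math

def ma_shifts(rep1, rep2):
    """Per-antibody M, A and the log2 shifts that move both plates onto A (M = 0)."""
    result = {}
    for ab in rep1:
        x, y = math.log2(rep1[ab]), math.log2(rep2[ab])
        m = x - y            # M: deviation between the two plates
        a = (x + y) / 2.0    # A: position along the identity line
        result[ab] = {"M": m, "A": a, "shift1": -m / 2.0, "shift2": m / 2.0}
    return result

# Hypothetical representative (e.g. geometric mean) intensities per antibody.
rep_plate1 = {"Ab1": 520.0, "Ab2": 1400.0, "Ab3": 95.0}
rep_plate2 = {"Ab1": 410.0, "Ab2": 1900.0, "Ab3": 130.0}

shifts = ma_shifts(rep_plate1, rep_plate2)

# Applying the shift in log2 scale is a multiplication in linear scale.
adjusted1 = {ab: rep_plate1[ab] * 2 ** shifts[ab]["shift1"] for ab in rep_plate1}
adjusted2 = {ab: rep_plate2[ab] * 2 ** shifts[ab]["shift2"] for ab in rep_plate2}
print(adjusted1, adjusted2)
```

In this sketch the same shift would be added to the log2 intensity of every sample on a plate for the given antibody, so that both plates end up with the same representative value, namely the geometric mean of the two original representatives.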