
Automatic identification and quantification of extra-well fluorescence in microarray images

Robert Rivera, Jie Wang, Xiaobo Yu, Gokhan Demirkan, Marika Hopper, Xiaofang Bian, Tasnia Tahsin, D. Mitchell Magee, Ji Qiu, Joshua LaBaer, and Garrick Wallstrom

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00267 • Publication Date (Web): 22 Sep 2017


Journal of Proteome Research is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Robert Rivera*,§, Jie Wang‡, Xiaobo Yu‡,∥, Gokhan Demirkan‡, Marika Hopper‡, Xiaofang Bian‡, Tasnia Tahsin§, D. Mitchell Magee‡, Ji Qiu‡, Joshua LaBaer‡, Garrick Wallstrom§,‡

§Department of Biomedical Informatics, Arizona State University, 13212 East Shea Boulevard, Scottsdale, AZ 85259
‡Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, 1001 S McAllister Ave, Tempe, AZ 85281
∥State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, 102206, P. R. China

ABSTRACT: In recent studies involving NAPPA microarrays, extra-well fluorescence has been used as a key measure for identifying disease biomarkers, because there is evidence that it correlates better with strong antibody responses than does statistical analysis of intra-spot intensity. Because this feature is not well quantified by traditional image analysis software, extra-well fluorescence is identified and quantified manually, which is both time-consuming and highly susceptible to variation between raters. A system that automates this task efficiently and effectively would greatly improve data acquisition in microarray studies and thereby accelerate the discovery of disease biomarkers. In this study, we experimented with different machine learning methods, as well as novel heuristics, for identifying spots exhibiting extra-well fluorescence (rings) in microarray images and assigning each ring a grade of 1-5 based on its intensity and morphology. The sensitivity of our final system for identifying rings was 72% at 99% specificity and 98% at 92% specificity. Our system performs this task significantly faster than a human while maintaining high performance, and therefore represents a valuable tool for microarray image analysis.

KEYWORDS: bioinformatics, image analysis, nucleic acid programmable protein array (NAPPA), biomarker, protein array

INTRODUCTION
Patients diagnosed in the early stages of a disease often have better prognoses than those diagnosed in later stages. Therefore, developing tools and techniques that detect diseases early is critical for improving patient outcomes. Nucleic Acid Programmable Protein Arrays (NAPPA), a protein microarray platform in which fresh proteins are produced in situ on the arrays, are a means of discovering biomarkers that can be used to identify diseases in their early stages1–10. NAPPA technology has been used successfully to discover biomarkers for diseases such as type 1 diabetes7, tuberculosis10, non-small cell lung cancer5, breast cancer4, and ovarian cancer9. Like other protein microarrays, NAPPA is a high-throughput platform that can be used to evaluate the levels of antibodies against specific proteins in an individual's serum. Higher antibody levels are positively correlated with brighter spots on the microarray images. Traditional microarray analysis compares spot intensities between case and control samples. If statistical analysis finds a significant difference in the spot intensity of cases versus controls (usually with the case spot having higher intensity), the protein corresponding to that spot may be a suitable biomarker.


Figure 1. (a) Comparison of the same NAPPA microarray image under different contrast and brightness settings. Rings are not visible in the unadjusted image (top). (b) Variation in the size of the ring effect with brightness and contrast levels. The images show the same spot of a single microarray under different brightness and contrast settings. (c) Variation in the appearance of two different slides under the same contrast setting. The image on the left appears much cleaner than the image on the right, despite identical brightness and contrast settings. Both images show the same segment of the microarray slides.

In addition to the signal detected at the center of the printed spot, some responses extend beyond this area, producing a form of extra-well fluorescence that we refer to as rings. Rings are fluorescent signals that extend beyond the normal boundary of a spot, creating a fading glow around it that usually resembles a ring or halo (Figure 1). We believe these result from a bleed-over effect in our NAPPA arrays, in which protein produced on site is not captured immediately; proteins diffusing outside the spot form the ring. For example, we have observed that some proteins, such as EBNA and p53, produce strong signals that diffuse outside the region where spot intensity is normally measured, resulting in clearly visible rings4. Increasing the serum dilution factor to reduce the appearance of rings is not feasible, because we carefully titrate our serum to obtain optimal response rates; if we add less serum, we will miss many of the weaker responses, which may also be biologically significant.

A fundamental challenge in NAPPA microarray analysis is that, in various cases, strong immune responses have been shown to correlate better with the presence of rings than with intra-well fluorescence intensity; when present, rings can be used either as the preferred or as an additional measure for evaluating biomarkers. In their investigation of the immunogenicity of P. aeruginosa antigens, Montor et al.8 used manual ring analysis of NAPPA protein microarrays alongside traditional methods and found that ring analysis correlated better with positive responses confirmed by ELISA. Furthermore, we have relied on ring analysis in multiple studies to discover biomarkers for type 1 diabetes7, non-small cell lung cancer5, and basal-like breast cancer4. In two of our recent studies5,7, manual ring analysis involved visually examining every spot and assigning a score (e.g., 1-5) to each ring. Unfortunately, traditional image analysis software cannot automate the identification and quantification of this form of extra-well fluorescence. The varying size and shape of rings, together with their signal similarity to background artifacts such as comets and other sources of elevated regional background intensity, make them difficult to classify. We attempted to automate this process independently with commercial software, such as GenePix (Molecular Devices) and Array-Pro Analyzer (Media Cybernetics, Inc.), but were unsuccessful: the packages we tested could not identify the diverse range of ring spots and did not allow batch processing of large numbers of arrays.
We are not aware of any software that can automate the process of distinguishing the various forms of extra-well fluorescence; therefore, we used manual methods in each of our past studies involving rings.


However, manual annotation of rings is time-consuming and prone to inter-rater differences. Most rings cannot be seen without optimizing the image and adjusting its brightness and contrast (Figure 1). After these preprocessing steps, the user must scan through thousands of spots in each microarray image to find anywhere from 0 to 120 spots exhibiting the ring effect. Our users estimate that manually annotating the rings takes approximately 10-20 minutes per image. Beyond the time required to identify rings, ring grading is also subjective, because the apparent size of a ring changes with the amount of brightness and contrast adjustment; Figure 1b shows how the ring size of the same spot varies with brightness and contrast. Although this process can be partly standardized, some images and rings will not appear clearly under fixed brightness and contrast adjustments; Figure 1c shows two different NAPPA microarray images under the same level of contrast and brightness adjustment. While several research studies have focused on developing software for the automated analysis of microarray images11–13, none has been dedicated to analyzing rings. To eliminate the problems associated with manual annotation of the ring effect, we developed software whose primary function is to identify the ring effect on microarray images and whose secondary goal is to quantify ring size without adjusting the image. Although this software was developed and tested using NAPPA microarray images, it can be applied to other microarray systems that exhibit the ring effect, provided that appropriate parameters are supplied. Our software would therefore allow researchers to detect rings and evaluate their value in other protein array platforms as well.
In addition, the ring detection approach described in this paper incorporates novel attributes derived through statistical techniques, together with distinctive methods for handling background noise, which could help improve existing methods for analyzing microarray images.


MATERIALS AND METHODS
For this experiment, we developed a system that generates a variety of attributes for each spot on a microarray image and then uses a trained model to classify each spot as a ring or a non-ring. The following sections describe the image corpus used for training and testing, the attribute set we created, and the details of our classification system.

Description of the Image Sets and Annotation of Images
A previously annotated corpus of 36 clean NAPPA microarray images was used to train and test our system (see Supplementary Figure S1 for details on clean images). These 36 images were divided into two sets according to the disease they corresponded to: a Colon Cancer (CC) image set of 26 microarray images and a Tuberculosis (TB) image set of 10 microarray images, each containing a mixture of microarray slides developed using either case or control human serum. Each microarray in both sets contained 2352 spots arranged in 84 rows and 28 columns and was captured at a wavelength of 635 nm. Two different groups of annotators graded the size of each ring appearing on the images, one group annotating the CC set and the other the TB set. For each image, the annotators assigned a grade from one to five to each spot exhibiting the ring effect, with one representing a small ring effect and five a large ring effect. Spots that did not exhibit the ring effect were given a grade of zero. To view the slides, the annotators used Array-Pro Analyzer (Media Cybernetics, Inc.). For the TB image set, all spots on each microarray image were assigned a grade; for the CC image set, however, only a subset of spots was graded.


Figure 2. Examples of spot ratings. The spots in this image were taken from a microarray slide annotated by our raters and are arranged in increasing order of rating from right to left, starting with a rating of one.

To provide a measure of inter-annotator agreement, both image sets were annotated an additional time. For this annotation, one annotator followed a detailed schema that included instructions for determining the level of brightness and contrast adjustment to apply on a per-image basis, as well as several templates illustrating the appearance of each grade of the ring effect (Figure 2). This was done to reduce disagreements arising from image adjustments and inconsistencies in grading style as the annotation process progressed. To measure the agreement between our annotator and the annotator groups, we calculated a Kappa statistic for both the CC and TB image set annotations. Because the second set of annotations followed well-defined annotation guidelines, it was used as the gold standard for our experiment.

Generation of Initial Data using Array-Pro Analyzer (Media Cybernetics, Inc.)
The initial data for each microarray image were generated using the Array-Pro Analyzer (Media Cybernetics, Inc.) software (Supplementary Figure S2). The first step involved loading the microarray images into Array-Pro Analyzer and using its grid-generating functionality to produce a grid of circles encapsulating every spot on the image (these circles can be seen in Figure 2). Specifically, we applied the analysis wizard function in Array-Pro Analyzer to automatically generate the grids encapsulating each spot. The spot boundaries within the generated grid were all equal in size. After the best-fitting grid was generated over the spots, Array-Pro Analyzer provided the mean background intensity, the x and y pixel positions of the center, and the row and column position of each spot (each attribute is described in Table 1).

Table 1. List and description of utilized attributes generated from Array-Pro Analyzer (Media Cybernetics, Inc.).

Attribute Name | Attribute Description
Position | The location of each spot within the grid in row:column format. For example, for the spot in the 5th row and 16th column, the position would be listed as 5:16.
Spot Position X | The location of the center of each grid circle, measured in pixels to the right of the left edge of the image.
Spot Position Y | The location of the center of each grid circle, measured in pixels down from the top edge of the image.
Trimmed Mean Background Intensity | The mean intensity of the pixels within a two-pixel radius of the edge of a grid circle. See Figure 4.
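The text reports a Kappa statistic for inter-annotator agreement but does not say which variant was used. As a minimal sketch, assuming unweighted Cohen's kappa over the 0-5 grades (a weighted kappa would penalize near-misses such as 2 versus 3 less), the computation could look like this; `cohens_kappa` is a hypothetical helper, not part of the authors' software.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of spots given the same grade by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's marginal grade frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical grade throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical grades (0 = non-ring, 1-5 = ring size) for six spots.
gold = [0, 0, 1, 2, 0, 5]
other = [0, 0, 1, 3, 0, 5]
print(round(cohens_kappa(gold, other), 2))  # 0.76
```

Here the raters agree on five of six spots (observed agreement 5/6), chance agreement is 11/36, and kappa is (5/6 - 11/36) / (1 - 11/36) = 0.76.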

Attribute Generation
The attribute space for our classification system was chosen based on differences in the pixel-intensity transition in the regions just outside the grid circles of spots exhibiting the ring effect versus those not exhibiting it. Figure 3 shows how pixel intensities in the region outside individual spots change as the pixels move further from the center. The spot that does not exhibit the ring effect has consistently low pixel intensities outside the grid circle, with no apparent trend as the distance from the center of the spot increases. In contrast, the spot exhibiting the ring effect shows a larger spread of intensities and a gradual decreasing trend. We expected most non-ring and ring spots to be consistent with these results; therefore, by creating an attribute set that measures the strength and direction of the relationship between these two variables, we would be able to differentiate between spot types. To further distinguish rings from non-rings, we also included attributes that measure the overall intensity of the area around the spots, since ring spots should have greater pixel intensities in the surrounding areas. We also previously tested a model whose attribute set consisted only of the intensity of each pixel in a 65 × 65 pixel box centered on each microarray spot. However, it had poor positive predictive value and a longer processing time (see Supplementary Table S8), so we did not use it for this study.

Figure 3. Plot of pixel intensities along the radial direction for a non-ring spot and a ring-containing spot. The top plot contains intensities for pixels within the spot and the extra-well region, while the bottom plot focuses only on the extra-well region. Note that the strong negative relation between the two variables in the ring is absent in the non-ring.


Figure 4. (a) Rays used to determine the transition of pixel intensity. To measure the pixel-intensity transition outside the spot, pixel intensities were measured along four rays for each spot. Points along each ray were obtained by moving an equal number of pixels horizontally and vertically, and each pixel intensity was stored with its r-distance. (b) Region used to determine the background intensity of a spot. The grey region around the center spot represents the area corresponding to the mean background intensity. (c) Grid used for the quantification attributes. Five attributes were based on a count of the pixels with intensity values greater than the threshold within a 69 × 69 pixel box centered on each spot.

To measure the intensity transition in the region just outside the grid boundary of each spot, we developed an attribute subset consisting of three types of measures: 1) a pixel's distance from the center of the spot (r-distance, Figure 4), 2) pixel intensity, and 3) a statistic computed from these two attributes. The r-distance of a pixel is the number of pixels it is shifted horizontally (and, equally, vertically) from the center of the spot; for example, if a pixel along a ray lies 20 pixels left of and 20 pixels above the center of the spot, its r-distance is 20. Because the pixels used for this method were always shifted equally in the horizontal and vertical directions, they could lie on only four possible rays. These diagonal directions were chosen because we anticipated that they would both reflect the trend in intensity just outside each spot and avoid quickly intersecting neighboring spots. Measuring the overall intensity of the region outside the spots involved creating a simple attribute subset consisting of only two types of attributes: those that measure the spread of the effect and those that measure its strength. These measurements consisted of either the mean background pixel intensity generated by Array-Pro Analyzer (Media Cybernetics, Inc.) or a count of pixels within and near the grid circle whose intensity values exceeded an image-dependent threshold (the determination of this threshold is described later). These measurements were chosen because they evaluate not only the brightness of the area outside the spot but also the extent of the glow (Supplementary Figure S6). Supplementary Table S7 lists all the attributes included in the final model, together with their information gain values. Every attribute in the final model had an information gain greater than zero, which is an acceptable threshold when selecting features with this method. Supplementary Figure S5 shows the distribution of attribute values by ring type for some of the attributes in our data set. Below, we describe each of the attributes used in our experiment.

Correlation- The value of the correlation attribute was obtained by computing Pearson's correlation between a pixel's r-distance and its intensity. Beginning at an r-distance of 21, the intensity was acquired for each pixel with r-distance up to 40, creating 20 ordered pairs (r, intensity of the pixel at r). An r-distance of 21 was chosen as the minimum because the edge of most spots was located at an r-distance of approximately 20, and we wanted to measure the intensity decline in the area outside the spots. An r-distance of 40 was chosen as the maximum to prevent the search from traveling into a neighboring spot. From these ordered pairs, our software computed the correlation between r-distance and pixel intensity.
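The diagonal rays and the per-ray Pearson correlation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is a row-major list of pixel rows (`image[y][x]`, with y increasing downward), the spot center `(cx, cy)` stands in for Array-Pro's Spot Position X/Y, and a ray with a flat intensity profile is assigned a correlation of 0 (the paper does not say how that case was handled). Function names are hypothetical; the attribute keys follow the names given in the text, including the means derived from the four per-ray values.

```python
import math

# Diagonal rays as (dx, dy) steps in image coordinates (y grows downward).
# Moving r steps along a ray shifts r pixels horizontally and r vertically,
# so the step count r is the pixel's r-distance.
RAYS = {"UR": (1, -1), "UL": (-1, -1), "BR": (1, 1), "BL": (-1, 1)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def correlation_attributes(image, cx, cy, r_min=21, r_max=40):
    """Per-ray correlations of r-distance vs. intensity, plus their means."""
    attrs = {}
    radii = list(range(r_min, r_max + 1))  # 20 r-distances for 21..40
    for name, (dx, dy) in RAYS.items():
        intensities = [image[cy + dy * r][cx + dx * r] for r in radii]
        attrs[name + "Corr"] = pearson(radii, intensities)
    attrs["UCorrMean"] = (attrs["URCorr"] + attrs["ULCorr"]) / 2
    attrs["BCorrMean"] = (attrs["BRCorr"] + attrs["BLCorr"]) / 2
    attrs["All-CorrMean"] = sum(attrs[k + "Corr"] for k in RAYS) / 4
    return attrs


# Synthetic 101x101 "ring": intensity falls linearly with distance from (50, 50).
ring = [[200 - 2 * max(abs(x - 50), abs(y - 50)) for x in range(101)] for y in range(101)]
print(round(correlation_attributes(ring, 50, 50)["All-CorrMean"], 3))  # -1.0
```

On the synthetic spot every ray's intensity falls by exactly 2 per unit of r-distance, so each correlation is -1; a non-ring with a noisy, flat surround would give values near 0.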
Since there are four radial directions, four sets of 20 ordered pairs were generated, resulting in four correlation attributes: upper right (URCorr), upper left (ULCorr), bottom right (BRCorr), and bottom left (BLCorr). In addition, the mean of the upper correlations (UCorrMean), the mean of the bottom correlations (BCorrMean), and the mean of all four correlations (All-CorrMean) were computed as three additional attributes, for a total of seven correlation-related attributes. These attributes were chosen because we expected ring spots to show pixel intensities that decrease linearly as pixels move away from the center of the spot, whereas for non-rings r-distance and intensity should be largely uncorrelated.

Cutoff R-distance- Beginning at an r-distance of 0 (the center pixel), pixel intensities were acquired until either the intensity dropped below the threshold value or the r-distance reached 40. If the intensity dropped below the threshold, the r-distance at which this occurred was stored; if the r-distance reached 40 first, 40 was recorded. The cutoff r-distance was recorded for each of the four directions, resulting in four attributes (URRadius, ULRadius, BRRadius, BLRadius). In addition, the mean and median of the four cutoff r-distances were computed, creating two additional attributes (MeanRadius, MedianRadius), for a total of six cutoff r-distance attributes. These attributes were chosen because we expected rings to have higher intensities outside the grid circle and therefore larger cutoff r-distances.

Pixel Intensity- For each of the four directions, the intensities of the pixels at r-distances of 25, 30, 35, and 40 were obtained and used as attributes, each multiplied by the scaling factor described below. The scaled pixel intensities account for 16 attributes (UR25, UL25, BR25, BL25, …, UR40, UL40, BR40, BL40). These attributes were chosen for the same reason as the cutoff r-distance: rings should have higher pixel intensities outside the grid circle.
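The cutoff r-distance and pixel-intensity attributes can be sketched in the same spirit. This hypothetical helper is not the authors' code: it assumes a row-major `image[y][x]` with y increasing downward, four diagonal rays from the spot center, a per-image threshold supplied by the caller, and it interprets "multiplied by a scaling factor" as dividing by that threshold, since the paper does not give the exact factor.

```python
from statistics import mean, median

# Diagonal rays as (dx, dy) steps; y grows downward, so "U" means dy = -1.
RAYS = {"UR": (1, -1), "UL": (-1, -1), "BR": (1, 1), "BL": (-1, 1)}


def cutoff_radius(image, cx, cy, dx, dy, threshold, r_max=40):
    """First r-distance (starting at 0) whose pixel falls below threshold, else r_max."""
    for r in range(r_max + 1):
        if image[cy + dy * r][cx + dx * r] < threshold:
            return r
    return r_max


def radius_and_intensity_attributes(image, cx, cy, threshold, sample_radii=(25, 30, 35, 40)):
    """Six cutoff r-distance attributes plus 16 scaled pixel-intensity attributes."""
    attrs = {}
    radii = []
    for name, (dx, dy) in RAYS.items():
        r = cutoff_radius(image, cx, cy, dx, dy, threshold)
        attrs[name + "Radius"] = r  # cutoff r-distances are not scaled
        radii.append(r)
        for s in sample_radii:
            # Assumption: the scaling factor is 1 / threshold for this image.
            attrs[f"{name}{s}"] = image[cy + dy * s][cx + dx * s] / threshold
    attrs["MeanRadius"] = mean(radii)
    attrs["MedianRadius"] = median(radii)
    return attrs


# Synthetic 101x101 spot whose intensity falls by 2 per pixel of distance from (50, 50).
ring = [[200 - 2 * max(abs(x - 50), abs(y - 50)) for x in range(101)] for y in range(101)]
attrs = radius_and_intensity_attributes(ring, 50, 50, threshold=150)
print(attrs["URRadius"], attrs["UR40"])  # 26 0.8
```

With a threshold of 150, the synthetic intensity 200 - 2r first drops below the threshold at r = 26, and the pixel at r = 40 (intensity 120) scales to 0.8.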
Background Intensity- This attribute is the only value generated by Array-Pro Analyzer (Media Cybernetics, Inc.) that is used for classification. It is computed as the mean pixel intensity of a two-pixel-thick ring around the grid circle of each spot (Figure 4). This value was also multiplied by a scaling factor. Background intensity accounts for two attributes of the attribute set (Background, NormBackground). Again, this attribute was chosen because rings should have higher pixel intensities outside the grid circle.

Ring Area- The intensity of each pixel in a 69 × 69 pixel box centered on the center pixel of each grid circle was determined (Figure 4). For each quadrant of the box, the number of pixels with an intensity greater than or equal to the threshold value was counted. These quadrant counts were combined to give the number of above-threshold pixels in each half of the box, resulting in four attributes (TopArea, BottomArea, LeftArea, RightArea). One additional attribute (TotalArea) was created by summing all four quadrant counts, for a total of five ring area attributes. These attributes were chosen because we expected rings to have a greater count of above-threshold pixels than non-ring spots.

Threshold Determination
To account for overall differences in intensity from image to image, we scaled each pixel intensity used to compute an attribute by a normalizing threshold value (except for the Cutoff R-distance attributes). To determine the threshold value, we first computed, for each spot in an image, the median intensity of the pixels with r-distances of 18-22 along each of the four directions, yielding four median values per spot. The four medians from every spot were pooled into one set for the entire image, and the median of this set served as the threshold value for that image. This process was repeated for each image in our data sets, so that each image had its own threshold value.
We used this approach rather than the overall median intensity of the image because our program focuses on the difference between rings and non-rings in the region just outside the spot. The median intensity of the whole image can include pixel intensities from large but irrelevant portions of the image that may be subject to heavy noise (see Supplementary Figure S3).
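Because the Ring Area counts depend on the per-image threshold, the two can be sketched together. This is a minimal illustration with hypothetical function names, assuming a row-major `image[y][x]` (y increasing downward), spot centers taken from the Array-Pro grid, and a 69 × 69 box (34 pixels on each side of the center). The paper does not say how the center row and column of the odd-sized box are assigned to quadrants; this sketch excludes them, which keeps TotalArea equal to the sum of the four quadrant counts as the text describes.

```python
from statistics import median

# Diagonal rays as (dx, dy) steps; y grows downward.
RAYS = {"UR": (1, -1), "UL": (-1, -1), "BR": (1, 1), "BL": (-1, 1)}


def image_threshold(image, centers, r_lo=18, r_hi=22):
    """Median, over all spots, of the per-ray medians of pixels at r-distances r_lo..r_hi."""
    ray_medians = []
    for cx, cy in centers:
        for dx, dy in RAYS.values():
            values = [image[cy + dy * r][cx + dx * r] for r in range(r_lo, r_hi + 1)]
            ray_medians.append(median(values))  # four medians per spot
    return median(ray_medians)


def ring_area_attributes(image, cx, cy, threshold, half_width=34):
    """Counts of pixels >= threshold in a (2 * half_width + 1)-pixel square box."""
    tl = tr = bl = br = 0  # quadrant counts: top-left, top-right, bottom-left, bottom-right
    for y in range(cy - half_width, cy + half_width + 1):
        for x in range(cx - half_width, cx + half_width + 1):
            if y == cy or x == cx:
                continue  # assumption: the center row and column belong to no quadrant
            if image[y][x] >= threshold:
                if y < cy:
                    if x < cx:
                        tl += 1
                    else:
                        tr += 1
                elif x < cx:
                    bl += 1
                else:
                    br += 1
    return {
        "TopArea": tl + tr,
        "BottomArea": bl + br,
        "LeftArea": tl + bl,
        "RightArea": tr + br,
        "TotalArea": tl + tr + bl + br,
    }


# Synthetic 101x101 spot whose intensity falls by 2 per pixel of distance from (50, 50).
ring = [[200 - 2 * max(abs(x - 50), abs(y - 50)) for x in range(101)] for y in range(101)]
t = image_threshold(ring, [(50, 50)])
print(t, ring_area_attributes(ring, 50, 50, t)["TotalArea"])  # 160.0 1600
```

Pooling medians from the 18-22 band keeps the threshold tied to intensities at the spot edge rather than to distant, possibly noisy regions of the slide, which is the rationale the text gives for not using the whole-image median.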

Figure 5. Flowchart of system training and testing.

Building the Training Set and Details of Developing the Model
To build a model to classify the spots on the microarray images, we created a training set using 17 randomly selected images from the CC image set. We used the CC image set for training and preliminary testing and reserved the TB image set for later rounds of testing (Figure 5) for two reasons: 1) the CC image set did not contain ratings for every spot on the images, so precision statistics would likely be overestimated because of a class balance that did not reflect the actual ratio of rings to non-rings on our microarrays; and 2) if our system performed well on an image set produced under different circumstances from those of the training set, our methods would be more likely to generalize. The annotated data set for the 17 images comprised 6589 non-rings and 63, 68, 50, 40, and 58 rings rated one, two, three, four, and five, respectively. Supplementary Table S1 shows the counts of images and spots for each of the sets used in training and testing.
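As a quick arithmetic check of the class balance, the training-set counts reported above give:

```latex
\begin{aligned}
N_{\text{ring}} &= 63 + 68 + 50 + 40 + 58 = 279,\\
N_{\text{spot}} &= 6589 + 279 = 6868,\\
\frac{N_{\text{non-ring}}}{N_{\text{spot}}} &= \frac{6589}{6868} \approx 0.959 .
\end{aligned}
```

Roughly 96% of the graded training spots are therefore non-rings. The 6868 graded spots are also well below the 17 × 2352 = 39,984 spots on these images, consistent with only a subset of CC spots having been graded.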


After the labels for each spot in the training set were joined with the attributes generated by our software, the label-attribute data were aggregated into a single file. Because non-rings made up over 90% of the spots in this data set, we were concerned that the classifiers would build models with high overall accuracy but low accuracy on the ring classes by classifying all spots as zeros. To account for this, we down-sampled the non-rings from 6589 to 280, so that approximately half of the training set consisted of non-rings and the other half of ring instances.

To build the classification model, we ran our training data set through several iterations of model building with the data mining software Weka (14). We used classifiers including SMO (15), Naïve Bayes (16), and Random Forest (17), with several different parameter settings (where applicable), and evaluated each model using 5-fold cross-validation. Since the primary goal of our experiment was the identification of rings, we selected models based on their ability to classify non-rings as non-rings and rings as rings, regardless of ring grade. The models selected for further testing were a Random Forest model with 600 trees and 6 random features at each node, and an SMO model with a cost parameter of 8 and an RBF kernel with a gamma value of 0.125.

Building the Test Set and Details for Testing

Images not used in the training set were used for one of two types of testing: pre-testing and final testing. The images in the pre-test sets served as a second round of model performance evaluation after 5-fold cross-validation. If the performance was unsatisfactory, the specific spots that the model misclassified were examined to learn possible reasons for misclassification and to improve the model in future rounds.
This was also done to identify rules that might improve the model's balance between positive predictive value and sensitivity. The images in the final test set were used as a final means of evaluating the model; no further attempts to improve the model were made after testing on the final test set was completed.

Preliminary Testing

Two rounds of preliminary testing were conducted to further evaluate and strengthen our system's performance. The goal of the first round of pre-testing was to narrow our selection from the two models chosen in cross-validation to a single model to be used for the remainder of the experiment. To do this, we tested both models on the data from the remaining 9 CC images. Again, both models were evaluated on their ability to distinguish between rings and non-rings. Our performance statistics included overall recall (a.k.a. sensitivity), overall specificity, overall precision (a.k.a. positive predictive value), F-score, and recall and precision by class (indicated as R# and P# in Table 4). The first run of both models showed low positive predictive value for spots classified as one or two; i.e., among the spots classified as one or two, the fraction that actually exhibited rings was low. Based on this observation, we applied a correction (denoted CCorr henceforth) to the ones and twos using the value of their AllCorrMean attribute: if a spot classified as a one or a two had an AllCorrMean value greater than the cutoff, it was reclassified as a zero. For example, if the AllCorrMean cutoff is set to -0.4 and a spot classified as a two before the correction has an AllCorrMean value of -0.25, the system relabels it as a non-ring spot (class zero). This procedure was tested with a variety of cutoff values. After collecting a range of post-correction performance results, we decided that the Random Forest model would be sufficient for further testing, and the SMO model was no longer used.
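The CCorr rule reduces to a single pass over the predicted grades. A minimal sketch, assuming predictions and AllCorrMean values are held in parallel lists (the function name and data layout are hypothetical). The `grades` parameter reflects that CCorr initially covered ones and twos and, in the final configuration described later, was extended to grades one through four with a cutoff of -0.32.

```python
def apply_ccorr(predicted, all_corr_mean, cutoff, grades=(1, 2)):
    """Relabel spots predicted in `grades` as 0 when AllCorrMean is strictly above `cutoff`."""
    return [
        0 if (grade in grades and corr > cutoff) else grade
        for grade, corr in zip(predicted, all_corr_mean)
    ]


# Worked example from the text: cutoff -0.4, a spot predicted as 2 with AllCorrMean -0.25.
print(apply_ccorr([2], [-0.25], cutoff=-0.4))  # [0]
# A spot whose AllCorrMean (-0.5) is below the cutoff keeps its grade.
print(apply_ccorr([2], [-0.5], cutoff=-0.4))  # [2]
```

For the final setting, the call would be `apply_ccorr(pred, corr, cutoff=-0.32, grades=(1, 2, 3, 4))`; fives are never touched by this rule.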


The goal of the second round of preliminary testing was to test the performance of the remaining model on a full data set and to calibrate the corrections performed after classification. This data set consisted of the spot data from 7 TB images comprising 16,464 spots, 97.9% of which were rated as non-rings. As in the first round of preliminary testing, this round began by evaluating the model's performance without any post-classification corrections. As expected, precision was again an issue; however, this time the problem extended to the grade-three class as well. Post-classification corrections were performed again, this time using information about the spatial arrangement of rings and knowledge of other microarray anomalies to make two additional corrections. The first additional correction addressed a streaking artifact, sometimes seen on our microarray slides, that we refer to as comets (11) (Supplementary Figure S4). Spots contained within a comet streak commonly had attribute values similar to those of ring spots and thus were frequently classified as rings despite not showing the ring effect. To alleviate this problem, any spot classified as a ring of grade one, two, or three that was located in a consecutive vertical stretch of spots classified as rings was changed to a zero. We denote this correction CComet. The second correction addressed the surge in background intensity that often emanated from rings of grade five. Grade-five rings are so large that they extend into, and even past, neighboring spots, frequently causing those spots to be misclassified as rings as well. To account for this, any spot surrounding a five was reclassified as a zero if its background intensity was not greater than a defined percentage of the background intensity of the five. This correction is denoted CNeighbor. We tested multiple correction-parameter combinations before settling on the final parameters: four or more consecutive ring spots for CComet, 60% of the five's background intensity for CNeighbor, and a cutoff value of -0.32 for CCorr, which was expanded to include ring grades three and four in addition to one and two. Note that the corrections were performed in the order CComet,
CNeighbor, CCorr. After determining the final parameter combination, we tested our system on the final test set.

Final Testing

To minimize performance bias in our experiment, we set aside 30% of the TB image set for a final evaluation of the model. No further modifications were made to the model or correction parameters after this evaluation. This final test set contained 3 images comprising 7056 spots, 98.3% of which were non-rings. We determined the performance of our model with and without corrections on this set. See Figure 5 for a flowchart illustrating each phase of training and testing in our experiment.

RESULTS

Annotator Agreement

Cohen's kappa coefficients for the agreement between our annotations and the annotations conducted prior to the beginning of this experiment were 0.81 and 0.67 for the CC and TB annotations, respectively (see Supplementary Table S2).

Evaluation of Model Performance

Training Model Performance

Table 2 shows the results of 5-fold cross-validation for the ‘best’ performing models produced using the Random Forest, SMO, and Naïve Bayes classifiers in Weka. The Naïve Bayes model performed worst in terms of misclassifying non-rings as rings and as a result was not selected for
further testing. Supplementary Tables S3–S5 show the cross-validation results broken down by ring grade.

Table 2. 5-fold cross-validation results on the CC training set

| Classifier    | False Positives* | False Negatives* |
|---------------|------------------|------------------|
| Random Forest | 11               | 10               |
| SMO           | 10               | 10               |
| Naïve Bayes   | 30               | 8                |

*False positives are non-rings that were classified as rings. False negatives are rings classified as non-rings.

Table 3. Performance statistics of models for the Colon Cancer (CC) test set

| Model         | Correction            | Recall | Specificity | Precision | F-score |
|---------------|-----------------------|--------|-------------|-----------|---------|
| Random Forest | None                  | 0.94   | 0.96        | 0.48      | 0.64    |
| Random Forest | CCorr, cutoff = -0.35 | 0.72   | 0.99        | 0.71      | 0.71    |
| Random Forest | CCorr, cutoff = -0.30 | 0.75   | 0.98        | 0.65      | 0.70    |
| Random Forest | CCorr, cutoff = -0.25 | 0.81   | 0.98        | 0.63      | 0.71    |
| SMO           | None                  | 0.94   | 0.96        | 0.47      | 0.63    |
| SMO           | CCorr, cutoff = -0.35 | 0.72   | 0.99        | 0.68      | 0.70    |
| SMO           | CCorr, cutoff = -0.30 | 0.76   | 0.98        | 0.62      | 0.68    |
| SMO           | CCorr, cutoff = -0.25 | 0.82   | 0.98        | 0.58      | 0.68    |
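The summary statistics in Tables 3 and 4 follow the usual confusion-matrix definitions, with rings of any grade counted as the positive class. As a quick consistency check, the sketch below recomputes them from the final-test-set counts quoted in the Discussion (119 rings and 6937 non-rings; without corrections, 117 rings detected and about 570 non-rings called rings; with all corrections, 86 rings detected and 48 non-rings called rings). The function name is illustrative, and 570 is the approximate figure given in the text.

```python
def summary_stats(tp, fp, positives, negatives):
    """Recall, specificity, precision and F-score for the ring (positive) class."""
    fn = positives - tp  # rings called non-rings
    tn = negatives - fp  # non-rings correctly left as non-rings
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)
    return recall, specificity, precision, f_score


# Final test set, no corrections (compare the corresponding "None" row of Table 4).
print([round(v, 2) for v in summary_stats(tp=117, fp=570, positives=119, negatives=6937)])
# [0.98, 0.92, 0.17, 0.29]
# Final test set, all corrections applied.
print([round(v, 2) for v in summary_stats(tp=86, fp=48, positives=119, negatives=6937)])
# [0.72, 0.99, 0.64, 0.68]
```

Because non-rings outnumber rings by roughly 58:1, specificity stays above 0.9 even when several hundred non-rings are misclassified, which is why precision and F-score are the more informative columns here.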

Model Performance on CC Pre-Test Set

Table 3 shows the results of each model on the CC pre-test set. Each model was evaluated on its ability to correctly classify a ring as a ring (true positive) rather than on obtaining a precise match in
ring rating. Since the Random Forest model achieved a slightly better F-score in each correction category, the SMO model was not selected for further testing.

Model Performance on TB Pre-Test Set

Table 4 shows the precision (a.k.a. positive predictive value) and recall (a.k.a. sensitivity) for each ring grade, based on the performance of the Random Forest model on the TB pre-test set, with and without corrections. Although several combinations of corrections were tested (Supplementary Table S6), only the final selected combination is included in the table. Since the positive predictive value of spots classified as ones was unreliably low in every case, we added one further correction in which every classification of one was changed to zero; i.e., spots classified as one were treated as non-rings, in the same way as those classified as zero (this correction is denoted COne). The combination of all corrections together provided the best balance between recall and precision.

Table 4. Model performance on the TB pre-test set and TB final testing set incorporating various correction combinations

| Test set | Corrections | Recall | Specificity | Precision | F-score | P5 | P4 | P3 | P2 | P1 | R5 | R4 | R3 | R2 | R1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TB pre-test (initial) | None | 0.95 | 0.88 | 0.15 | 0.26 | 0.91 | 0.62 | 0.38 | 0.15 | 0.07 | 1 | 1 | 1 | 0.99 | 0.88 |
| TB pre-test (initial) | {CComet, CNeighbor, CCorr, COne} | 0.56 | 0.99 | 0.63 | 0.59 | 0.91 | 0.74 | 0.71 | 0.46 | N/A | 0.96 | 0.95 | 0.91 | 0.59 | 0.17 |
| TB final (set aside) | None | 0.98 | 0.92 | 0.17 | 0.29 | 0.65 | 0.53 | 0.47 | 0.19 | 0.04 | 1 | 1 | 0.95 | 1 | 0.97 |
| TB final (set aside) | {CComet, CNeighbor, CCorr, COne} | 0.72 | 0.99 | 0.64 | 0.68 | 0.81 | 0.74 | 0.7 | 0.44 | N/A | 1 | 0.91 | 0.86 | 0.76 | 0.44 |

P# and R# denote the precision and recall for ring grade #.
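CComet, CNeighbor, and COne can be sketched on a grid of predicted grades. This is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes predictions are stored as `grid[row][col]`, with each column running vertically on the slide; CComet zeroes grades one to three inside any vertical run of at least four consecutive ring calls; CNeighbor treats the eight adjacent grid positions as the spots "surrounding" a five, applies only to spots predicted as grades one to four, and judges every spot against the grades as they stood before the pass; and COne is applied last. The neighborhood definition, the scope of CNeighbor, and the placement of COne are assumptions, and the function names are hypothetical. CCorr, which runs after CNeighbor in the final configuration, is omitted here.

```python
def c_comet(grid, min_run=4):
    """Zero grades 1-3 inside vertical runs of >= min_run consecutive ring calls."""
    out = [row[:] for row in grid]
    n_rows, n_cols = len(grid), len(grid[0])
    for c in range(n_cols):
        r = 0
        while r < n_rows:
            if grid[r][c] == 0:
                r += 1
                continue
            start = r
            while r < n_rows and grid[r][c] > 0:
                r += 1
            if r - start >= min_run:  # a comet-like vertical streak
                for rr in range(start, r):
                    if grid[rr][c] in (1, 2, 3):
                        out[rr][c] = 0
    return out


def c_neighbor(grid, background, fraction=0.6):
    """Zero ring calls next to a five unless their background exceeds fraction * the five's."""
    out = [row[:] for row in grid]
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] != 5:
                continue
            limit = fraction * background[r][c]
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        if grid[rr][cc] in (1, 2, 3, 4) and background[rr][cc] <= limit:
                            out[rr][cc] = 0
    return out


def c_one(grid):
    """Treat every spot classified as one as a non-ring."""
    return [[0 if g == 1 else g for g in row] for row in grid]


grid = [[2, 0, 0], [3, 5, 1], [1, 0, 0], [4, 0, 0]]
background = [[10, 10, 10], [10, 100, 50], [10, 10, 10], [10, 10, 10]]
corrected = c_one(c_neighbor(c_comet(grid), background))
print(corrected)  # [[0, 0, 0], [0, 5, 0], [0, 0, 0], [4, 0, 0]]
```

In the example, the first column forms a four-spot vertical run, so its grades one to three are cleared while the four survives; the one beside the five has a background of 50, which is not above 60% of the five's 100, so CNeighbor clears it.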


Model Performance on TB Final Testing Set

Table 4 shows the performance of our model with and without the final selected corrections on the TB final testing set. This data set was not used for any kind of training or calibration.

System Performance Time

Our system generated the attributes and performed the classifications involved in this experiment at a rate of approximately 5 seconds per image, roughly 180 times faster than the 15-minute average reported by our annotators.

DISCUSSION

The task of identifying rings within microarray images is difficult, whether done manually or automatically. However, using our system, we were able to classify rings of grades two, three, four, and five with high recall and precision in about 0.55% of the time a human requires. Although our system performed well in these categories, it could not achieve an acceptable balance of recall and precision when classifying rings of grade one. We believe this is likely due to the faint nature of the ones and the marginal difference between them and non-ring spots. Since spots classified as one were more commonly non-rings than rings, we decided to treat a classification of one as equivalent to a classification of zero; that is, spots classified as ones were also considered non-rings. As expected, this caused a significant drop in the recall of actual ones, but it removed the classification type that was the largest source of false positives (non-rings classified as rings). This allowed our system to obtain an overall precision of 0.64 on a data set in which non-rings outnumbered the positive class (rings) by approximately 58:1. While removing these classifications lowered recall, the overall recall still reached 0.72 on the final testing set, with individual recalls of 1, 1, 0.86, 0.76, and 0.44 for ring grades five, four, three, two, and one, respectively. Although we obtained a recall of only 0.44 for the ones, we accepted this sacrifice because most ones were so faint that they merely hinted at a possible ring; many of the annotators of the original data disagreed on whether spots of this kind should be classified as rings at all. We chose to label these signals as ones in the data set annotated by our expert annotator, leaving it to users to decide whether to include this grade in their own analyses.

Overall, we are confident that our system can perform well across a range of user needs. Although the core model cannot change without retraining, the adjustable correction features integrated into our system provide considerable flexibility. Our evaluation aimed to find a combination of settings that balances precision and recall; however, users more concerned with maintaining high recall (and hence few false negatives), including ones, could turn off the corrections or adjust the correction settings so that the system makes fewer of these changes. Although this may produce more false positives, the likely increase in true positives may be worth the risk to some users. Even if the experimentalist has to inspect many "questionable" spots manually, the system can still yield significant time savings if it greatly reduces the number of spots that need to be examined. In the case of our final test set, if a user ran our system without any corrections, the system would classify 117 of 119 rings as rings (true positives) and approximately 570 non-rings as rings (false positives). Despite the low F-score, over 6300 of the 6937 non-ring spots could then be ignored.
On the other hand, with the corrections, the system would classify 86 of 119 rings as rings and 48 non-rings as rings. While precision and F-score improve significantly, the loss of 31 true positives may not be considered worth the extra time saved, given that these positives may be used in the discovery of a new biomarker.

As for the secondary task of quantifying rings, we believe that our TotalArea attribute, together with the NormBackground attribute, will sufficiently describe the size and strength of the rings. The NormBackground and TotalArea distributions in Supplementary Figure S5 show that all ring classes are clearly distinguishable by size except fours and fives, and that moderate- to large-sized rings differ in intensity.

For future work, we plan to further test the generalizability of our program on microarray images acquired at wavelengths other than 635 nm and on images from noisy data sets. We will also attempt to add features that improve performance on images with drastic non-uniformity of intensity and that enhance aspects of the system's user-readiness. Our current normalization method accounts for differences in overall intensity between images; however, it does not account for unequal distribution of intensity within a single image. We will incorporate a localized approach that finds different scaling factors for separate portions of the image. As for user-readiness, some of our attributes and the threshold determination depend on knowledge of spot size, because the methods we used to compute these values depended on the spot radius. Incorporating a means of automatically determining the spot radius would make our system somewhat easier to use.

ASSOCIATED CONTENT

Supporting Information. The Supporting Information is available free of charge and is included in the following document:

Supplementary Material.pdf (PDF file). Figures and tables in the file include the following:
Figure S1. Example of clean and dirty microarray images.
Figure S2. Partial image of text file generated by Array-Pro Analyzer (Media Cybernetics, Inc.).
Figure S3. Irrelevant portion of microarray image.
Table S1. Image and Spot Counts per Training and Test Set.
Figure S4. Example of "comet" streaking.
Table S2. Agreement Statistics.
Table S3. 5-fold Cross-Validation Results for Random Forest Model on CC Training Set.
Table S4. 5-fold Cross-Validation Results for SMO Model on CC Training Set.
Table S5. 5-fold Cross-Validation Results for Naïve Bayes Model on CC Training Set.
Table S6. Model performance on TB Pre-Test Set and TB Final Testing Set incorporating various correction combinations.
Figure S5. Distribution of various attribute values by spot type.
Figure S6. Example of high background intensity, small span spot and low background intensity, large span spot.
Table S7. Information gain values for all attributes used in model.
Table S8. Performance metrics of a pixel intensity model and statistical model based on preliminary analysis.

AUTHOR INFORMATION

Corresponding Author
*Robert Rivera. E-mail: [email protected]. Phone: 480-727-8322

Present Addresses
†Medical College of Wisconsin, 8701 W Watertown Plank Rd, Milwaukee, WI 53226

Author Contributions

RR developed and evaluated the software presented here, drafted the manuscript, and made significant contributions to the design of the study and to the acquisition, analysis, and interpretation of data. JW and XY annotated data for this study and revised the drafted manuscript. GD, MH, XB, DMM and JQ annotated data for the study. TT annotated data for this study, contributed to the analysis and interpretation of data, and revised the drafted manuscript. JL and GW made significant contributions to the conception and design of the study, the analysis and interpretation of presented data, and the revision of the drafted manuscript. All authors reviewed and approved the final manuscript.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENT
This research was supported by the Early Detection Research Network (NIH/NCI 7U01CA117374) and the Virginia G. Piper Foundation (AGR 12/19/07).

ABBREVIATIONS
NAPPA, Nucleic Acid Programmable Protein Arrays; CC, Colon Cancer; TB, Tuberculosis; UR, Upper Right; UL, Upper Left; BR, Bottom Right; BL, Bottom Left; URCorr, Correlation value for the relation between r-distance and pixel intensity in the upper right radial direction; ULCorr, Correlation value for the relation between r-distance and pixel intensity in the upper left radial direction; BRCorr, Correlation value for the relation between r-distance and pixel intensity in the bottom right radial direction; BLCorr, Correlation value for the relation between r-distance and pixel intensity in the bottom left radial direction; UCorrMean, Mean of the correlation values for the relation between r-distance and pixel intensity in the upper left and upper right radial directions; BCorrMean, Mean of the correlation values for the relation between r-distance and pixel intensity in the bottom left and bottom right radial directions; AllCorrMean, Mean of the correlation values for the relation between r-distance and pixel intensity in all four radial directions; CCorr, Correction performed on spots classified as ones and twos (later extended to threes and fours) using the value of their AllCorrMean attribute; CComet, Correction performed to handle the comet-tail artifact; CNeighbor, Correction performed to handle the signal bleeding artifact; COne, Correction performed on spots classified as one.

REFERENCES

(1)

Anderson, K. S.; Sibani, S.; Wallstrom, G.; Qiu, J.; Mendoza, E. A.; Raphael, J.; Hainsworth, E.; Montor, W. R.; Wong, J.; Park, J. G.; et al. Protein Microarray Signature of Autoantibody Biomarkers for the Early Detection of Breast Cancer. J. Proteome Res. 2011, 10 (1), 85–96.

(2)

Ramachandran, N.; Raphael, J. V.; Hainsworth, E.; Demirkan, G.; Fuentes, M. G.; Rolfs, A.; Hu, Y.; LaBaer, J. Next-generation high-density self-assembling functional protein arrays. Nat. Methods 2008, 5 (6), 535–538.

(3)

Sibani, S.; LaBaer, J. Immunoprofiling Using NAPPA Protein Microarrays; 2011; pp 149–161.

(4)

Wang, J.; Figueroa, J. D.; Wallstrom, G.; Barker, K.; Park, J. G.; Demirkan, G.; Lissowska, J.; Anderson, K. S.; Qiu, J.; LaBaer, J. Plasma Autoantibodies Associated with Basal-like Breast Cancers. Cancer Epidemiol. Biomarkers Prev. 2015, 24 (9), 1332–1340.

(5)

Wang, J.; Shivakumar, S.; Barker, K.; Tang, Y.; Wallstrom, G.; Park, J. G.; Tsay, J.-C. J.; Pass, H. I.; Rom, W. N.; LaBaer, J.; et al. Comparative Study of Autoantibody Responses between Lung Adenocarcinoma and Benign Pulmonary Nodules. J. Thorac. Oncol. 2016, 11 (3), 334–345.

(6)

Díez, P.; González-González, M.; Lourido, L.; Dégano, R. M.; Ibarrola, N.; Casado-Vela, J.; LaBaer, J.; Fuentes, M. NAPPA as a Real New Method for Protein Microarray Generation. Microarrays (Basel, Switzerland) 2015, 4 (2), 214–227.

(7)

Bian, X.; Wasserfall, C.; Wallstrom, G.; Wang, J.; Wang, H.; Barker, K.; Schatz, D.; Atkinson, M.; Qiu, J.; LaBaer, J. Tracking the Antibody Immunome in Type 1 Diabetes Using Protein Arrays. J. Proteome Res. 2017, 16 (1), 195–203.

(8)

Montor, W. R.; Huang, J.; Hu, Y.; Hainsworth, E.; Lynch, S.; Kronish, J.-W.; Ordonez, C. L.; Logvinenko, T.; Lory, S.; LaBaer, J. Genome-wide study of Pseudomonas aeruginosa outer membrane protein immunogenicity using self-assembling protein microarrays. Infect. Immun. 2009, 77 (11), 4877–4886.

(9)

Anderson, K. S.; Cramer, D. W.; Sibani, S.; Wallstrom, G.; Wong, J.; Park, J.; Qiu, J.; Vitonis, A.; LaBaer, J. Autoantibody Signature for the Serologic Detection of Ovarian Cancer. J. Proteome Res. 2015, 14 (1), 578–586.

(10)

Song, L.; Wallstrom, G.; Yu, X.; Hopper, M.; Van Duine, J.; Steel, J.; Park, J.; Wiktor, P.; Kahn, P.; Brunner, A.; et al. Identification of Antibody Targets for Tuberculosis Serology using High-Density Nucleic Acid Programmable Protein Arrays. Mol. Cell. Proteomics 2017, 16 (4 suppl 1), S277–S289.

(11)

Gierahn, T. M.; Loginov, D.; Love, J. C. Crossword: A Fully Automated Algorithm for the Segmentation and Quality Control of Protein Microarray Images. J. Proteome Res. 2014, 13 (2), 362–371.


(12)

Moffitt, R. A.; Yin-Goen, Q.; Stokes, T. H.; Parry, R.; Torrance, J. H.; Phan, J. H.; Young, A. N.; Wang, M. D. caCORRECT2: Improving the accuracy and reliability of microarray data in the presence of artifacts. BMC Bioinformatics 2011, 12 (1), 383.

(13)

Zhu, X.; Gerstein, M.; Snyder, M. ProCAT: a data analysis approach for protein microarrays. Genome Biol. 2006, 7 (11), R110.

(14)

Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. H. The WEKA data mining software. ACM SIGKDD Explor. Newsl. 2009, 11 (1), 10.

(15)

Platt, J. C. Fast training of support vector machines using sequential minimal optimization. In Advances in kernel methods; MIT Press: Cambridge, MA, USA, 1999; pp 185–208.

(16)

Lewis, D. D. Naive (Bayes) at forty: The independence assumption in information retrieval; Springer Berlin Heidelberg, 1998; pp 4–15.

(17)

Breiman, L. Random Forests. Mach. Learn. 2001, 45 (1), 5–32.
