Subscriber access provided by READING UNIV
Article
Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data Fadhl M. Alkawaa, Kumardeep Chaudhary, and Lana X. Garmire J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00595 • Publication Date (Web): 07 Nov 2017 Downloaded from http://pubs.acs.org on November 8, 2017
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 29
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Proteome Research
1
Deep learning accurately predicts estrogen receptor status in breast
2
cancer metabolomics data
3
Fadhl M Alakwaa1, Kumardeep Chaudhary1, Lana X Garmire1,2*
4
1
Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI 96813, USA.
5
2
Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa,
6
Honolulu, HI 96822, USA.
7
*To whom the correspondence should be addressed:
8
Lana X Garmire, PhD
9
Associate Professor
10
Email address:
[email protected] 11
Phone: +1 (808) 441-8193
12
13
14
15
ACS Paragon Plus Environment
Journal of Proteome Research
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 2 of 29
16
ABSTRACT
17
Metabolomics holds the promise as a new technology to diagnose highly heterogeneous diseases.
18
Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine
19
learning based classification methods. However, it remains unknown if deep neural network, a class of
20
increasingly popular machine learning methods, is suitable to classify metabolomics data. Here we use a
21
cohort of 271 breast cancer tissues, 204 positive estrogen receptor (ER+) and 67 negative estrogen
22
receptor (ER-), to test the accuracies of autoencoder, a deep learning (DL) framework, as well as six
23
widely used machine learning models, namely Random Forest (RF), Support Vector Machines (SVM),
24
Recursive Partitioning and Regression Trees (RPART), Linear Discriminant Analysis (LDA), Prediction
25
Analysis for Microarrays (PAM), and Generalized Boosted Models (GBM). DL framework has the
26
highest area under the curve (AUC) of 0.93 in classifying ER+/ER- patients, compared to the other six
27
machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals
28
eight commonly enriched significant metabolomics pathways (adjusted P-value