Deep Learning Accurately Predicts Estrogen ... - ACS Publications

Nov 7, 2017 - pathways are also confirmed in integrated analysis between metabolomics and gene expression data in these samples. In summary, deep ... ...
3 downloads 13 Views 27MB Size
Subscriber access provided by READING UNIV

Article

Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data Fadhl M. Alkawaa, Kumardeep Chaudhary, and Lana X. Garmire J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00595 • Publication Date (Web): 07 Nov 2017 Downloaded from http://pubs.acs.org on November 8, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 29

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

1

Deep learning accurately predicts estrogen receptor status in breast

2

cancer metabolomics data

3

Fadhl M Alakwaa1, Kumardeep Chaudhary1, Lana X Garmire1,2*

4

1

Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI 96813, USA.

5

2

Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa,

6

Honolulu, HI 96822, USA.

7

*To whom the correspondence should be addressed:

8

Lana X Garmire, PhD

9

Associate Professor

10

Email address: [email protected]

11

Phone: +1 (808) 441-8193

12

13

14

15

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 29

16

ABSTRACT

17

Metabolomics holds the promise as a new technology to diagnose highly heterogeneous diseases.

18

Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine

19

learning based classification methods. However, it remains unknown if deep neural network, a class of

20

increasingly popular machine learning methods, is suitable to classify metabolomics data. Here we use a

21

cohort of 271 breast cancer tissues, 204 positive estrogen receptor (ER+) and 67 negative estrogen

22

receptor (ER-), to test the accuracies of autoencoder, a deep learning (DL) framework, as well as six

23

widely used machine learning models, namely Random Forest (RF), Support Vector Machines (SVM),

24

Recursive Partitioning and Regression Trees (RPART), Linear Discriminant Analysis (LDA), Prediction

25

Analysis for Microarrays (PAM), and Generalized Boosted Models (GBM). DL framework has the

26

highest area under the curve (AUC) of 0.93 in classifying ER+/ER- patients, compared to the other six

27

machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals

28

eight commonly enriched significant metabolomics pathways (adjusted P-value