Discovering putative peptides encoded from non-coding RNAs in ribosome profiling data of Arabidopsis thaliana. Qilin Li1, Md. Asif Ahsan1, Hongjun Chen1, Jitong Xue1,2 and Ming Chen1,2,*. 1 Department of Bi...

Article

Discovering putative peptides encoded from non-coding RNAs in ribosome profiling data of Arabidopsis thaliana Qilin Li, Md. Asif Ahsan, Hongjun Chen, Jitong Xue, and Ming Chen ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.7b00386 • Publication Date (Web): 27 Jan 2018 Downloaded from http://pubs.acs.org on January 28, 2018


ACS Synthetic Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


ACS Synthetic Biology


Discovering putative peptides encoded from non-coding RNAs in ribosome profiling data of Arabidopsis thaliana


Qilin Li1, Md. Asif Ahsan1, Hongjun Chen1, Jitong Xue1,2 and Ming Chen1,2,*

1 Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
2 James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310058, China

* To whom correspondence should be addressed. Tel: +86-571-88206612; Fax: +86-571-88206612; Email: [email protected]

Abstract

Most non-coding RNAs (ncRNAs) are considered to be expressed at low levels and to have a limited phylogenetic distribution in the cytoplasm, suggesting that they may be involved only in specific biological processes. However, recent studies have revealed the protein-coding potential of ncRNAs, indicating that they might be a source of some special proteins. Although a growing number of ncRNAs have been identified as able to code for proteins, it remains challenging to distinguish coding RNAs from previously annotated ncRNAs and to detect the proteins produced by their translation. In this article, we designed a pipeline to identify ncRNAs with coding potential in Arabidopsis thaliana from three NCBI GEO datasets and to predict their translation products. A total of 31,311 ncRNAs were predicted to be translated into peptides, and these peptides showed a lower conservation rate than common proteins. In addition, we used BIPS to build an interaction network between these peptides and annotated Arabidopsis proteins, which included 69 ncRNA-derived peptides. Peptides in the interaction network showed characteristics different from those of other ncRNA-derived peptides, and they participated in several crucial biological processes, such as photorespiration and stress responses. All information on the putative ncRNA-encoded peptides (ncPEPs) and their predicted interactions with proteins is integrated into a database, PncPEPDB (http://bis.zju.edu.cn/PncPEPDB). These results suggest that peptides derived from ncRNAs may play important roles in ncRNA regulation, supporting the hypothesis that ncRNAs may regulate metabolism via their translation products.

Keywords: ribosome profiling, ncRNA-encoded peptides, peptide-protein interaction network, database and visualization

Introduction

As transcriptomic research develops rapidly, nucleic acid sequences previously considered to be only weakly translated into proteins have become a research hotspot1-4. Over decades of research, non-coding RNAs (ncRNAs) such as microRNA (miRNA), long non-coding RNA (lncRNA), circular RNA (circRNA) and competing endogenous RNA (ceRNA) have been found to play crucial roles in gene regulatory networks, prompting many studies of these molecules and their interactions5-11. Most of them are expressed at low levels and have a limited phylogenetic distribution in the cytoplasm12, suggesting that they may be involved only in specific biological processes.

Whether these weakly expressed transcripts are translated remains controversial. However, recent studies have shown the protein-coding potential of ncRNAs. lncRNAs have been found to associate with ribosomes, indicating that they might be a source of some special proteins13. MicroRNAs also contain open reading frames that encode small peptides, and the regulatory roles of some miRNA-encoded peptides (miPEPs) have been investigated14. Although a growing number of ncRNAs have been identified as able to code for proteins, it remains challenging to distinguish coding RNAs from previously annotated ncRNAs and to detect the proteins translated from ncRNAs.
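To make the notion of an open reading frame (ORF) in an ncRNA concrete, the following is a minimal sketch that scans the forward strand of a transcript for ORFs starting at AUG and ending at the first in-frame stop codon. It is only an illustration of the basic idea, not the algorithm used by TransDecoder or CIPHER; the function name `find_orfs` and the `min_codons` length threshold are hypothetical choices.

```python
# Minimal forward-strand ORF scanner (illustrative sketch, not TransDecoder/CIPHER).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna: str, min_codons: int = 10) -> list[tuple[int, int, str]]:
    """Return (start, end, orf_sequence) for AUG...stop ORFs on the forward strand.

    `start` is the 0-based index of the A in AUG; `end` is exclusive and
    includes the stop codon. `min_codons` (hypothetical threshold) counts
    codons from the start codon up to, but not including, the stop codon.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i  # first AUG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, rna[start:i + 3]))
                start = None  # look for the next AUG downstream
    return sorted(orfs)
```

For example, `find_orfs("GGAUGAAAUAGCC", min_codons=1)` returns `[(2, 11, "AUGAAAUAG")]`, a two-codon ORF in the third reading frame. Real pipelines add reverse-strand scanning, alternative start codons and coding-potential scoring on top of this simple scan.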

In 2009, N. Ingolia et al. developed ribosome profiling sequencing (Ribo-Seq), a new technique that made it possible to detect small proteins of low abundance (Fig 1A) 15. To date, this technology has been adopted in various investigations; for instance, Ribo-Seq was used to globally profile the adaptive response of Arabidopsis thaliana to prolonged heat stress16. In addition, the three-nucleotide periodicity of the reads, which results from the movement of the ribosome along the coding sequence, distinguishes translated sequences from other possible RNA-protein complexes. A growing number of studies based on this technique have reported that a significant proportion of ncRNAs are translated13, 14, 17-19. However, the functions and regulatory mechanisms of the detected small proteins or peptides are not yet clear. Some of them may be involved in the expression of their corresponding ncRNAs, or may form an interaction network with other proteins. On the other hand, a substantial number of the small proteins detected by Ribo-Seq may be encoded by misannotated protein-coding genes that bioinformatics algorithms failed to predict correctly because of their short length. The present study takes advantage of existing Ribo-Seq and RNA-Seq data for Arabidopsis thaliana to investigate putative ncRNAs and their translation products, providing evidence that ncRNAs may have additional functions through the peptides they encode.

Material and Methods

1 Detection of translated ORFs from Ribo-Seq data and related analysis

Ribo-Seq and RNA-Seq data of leaf, root, shoot and flower bud in Arabidopsis thaliana were obtained from NCBI GEO Datasets (GSE40209 20, GSE69802 18, GSE81332 16). After adapter removal, reads were processed with TopHat21 and Cufflinks22 against the TAIR10 genome, and all assembled transcripts were classified into coding and putative non-coding RNAs using Cuffcompare. Transcripts with Cuffcompare class codes i (novel intronic), u (novel intergenic) and x (novel antisense) were regarded as putative non-coding RNAs, and they were subsequently aligned to TAIR10 to filter out coding sequences (Fig 2).
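The class-code selection described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the tab-separated `.tracking` file written by Cuffcompare, in which the first column is the transfrag ID and the fourth column is the class code; the function name `putative_ncrna_ids` and the example rows are hypothetical, and the subsequent alignment to TAIR10 is not shown.

```python
import csv
import io

# Class codes treated as putative ncRNAs in the text:
# i = novel intronic, u = novel intergenic, x = novel antisense.
NCRNA_CLASS_CODES = {"i", "u", "x"}


def putative_ncrna_ids(tracking_lines, codes=NCRNA_CLASS_CODES):
    """Return transfrag IDs whose class code (assumed column 4) is in `codes`."""
    ids = []
    for row in csv.reader(tracking_lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        transfrag_id, class_code = row[0], row[3]
        if class_code in codes:
            ids.append(transfrag_id)
    return ids


# Hypothetical rows in the assumed .tracking layout (per-sample column shown as "-").
example = io.StringIO(
    "TCONS_00000001\tXLOC_000001\tAT1G01010|AT1G01010.1\t=\t-\n"
    "TCONS_00000002\tXLOC_000002\t-\tu\t-\n"
    "TCONS_00000003\tXLOC_000003\tAT1G01020|AT1G01020.1\tx\t-\n"
    "TCONS_00000004\tXLOC_000004\tAT1G01030|AT1G01030.1\tj\t-\n"
)
print(putative_ncrna_ids(example))
```

On these rows the sketch prints `['TCONS_00000002', 'TCONS_00000003']`: the complete match (`=`) and the novel isoform (`j`) are discarded, while the intergenic and antisense transfrags are kept as ncRNA candidates.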

These transcripts were processed with TransDecoder 23 and CIPHER 13 to obtain their RNA sequences, peptide sequences and coding scores. Sequences of the non-coding RNAs were compared by BLASTN with reference sequences of annotated ncRNAs in Ensembl24 (tRNA, rRNA, snRNA, snoRNA and miRNA) and of annotated and predicted ncRNAs in GreeNC … (ceRNA). For peptide sequence conservation analysis, homologues of the Arabidopsis ORFs in transcript assemblies of Oryza sativa (e-value