Article
Discovering putative peptides encoded from non-coding RNAs in ribosome profiling data of Arabidopsis thaliana
Qilin Li, Md. Asif Ahsan, Hongjun Chen, Jitong Xue, and Ming Chen
ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.7b00386 • Publication Date (Web): 27 Jan 2018
ACS Synthetic Biology
Discovering putative peptides encoded from non-coding RNAs in ribosome profiling data of Arabidopsis thaliana
Qilin Li¹, Md. Asif Ahsan¹, Hongjun Chen¹, Jitong Xue¹,² and Ming Chen¹,²,*
¹ Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
² James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310058, China
* To whom correspondence should be addressed. Tel: +86-571-88206612; Fax: +86-571-88206612; Email: [email protected]
Abstract
Most non-coding RNAs (ncRNAs) are considered to be expressed at low levels and to have a limited phylogenetic distribution in the cytoplasm, suggesting that they may be involved only in specific biological processes. However, recent studies have shown the protein-coding potential of ncRNAs, indicating that they might be a source of some special proteins. Although an increasing number of non-coding RNAs have been found to encode proteins, it remains challenging to distinguish coding RNAs from previously annotated ncRNAs and to detect the proteins produced by their translation. In this article, we designed a pipeline to identify non-coding RNAs with coding potential in Arabidopsis thaliana from three NCBI GEO datasets and to predict their translation products. In total, 31,311 non-coding RNAs were predicted to be translated into peptides, and they showed a lower conservation rate than common proteins. In addition, we built an interaction network between these peptides and annotated Arabidopsis proteins using BIPS, which included 69 peptides from non-coding RNAs. Peptides in the interaction network showed different characteristics from other non-coding RNA-derived peptides, and they participated in several crucial biological processes, such as photorespiration and stress responses. All information on the putative ncPEPs and their predicted interactions with proteins is integrated in a database, PncPEPDB (http://bis.zju.edu.cn/PncPEPDB). These results suggest that peptides derived from non-coding RNAs may play important roles in non-coding RNA regulation, supporting the hypothesis that non-coding RNAs may regulate metabolism via their translation products.
Keywords: ribosome profiling, ncRNA-encoded peptides, peptide-protein interaction network, database and visualization
Introduction
With the rapid development of transcriptomic research, nucleic acid sequences that were considered to be weakly expressed into proteins have become a research hotspot [1-4]. Over decades of research, non-coding RNAs (ncRNAs) such as microRNA (miRNA), long non-coding RNA (lncRNA), circular RNA (circRNA) and competing endogenous RNA (ceRNA) were found to play crucial roles in gene regulatory networks, leading to a trend of studying them and their interactions [5-11]. Most of them are expressed at low levels and have a limited phylogenetic distribution in the cytoplasm [12], meaning that they may be involved only in specific biological processes.
Whether these weakly expressed transcripts are translated remains controversial. However, recent studies have shown the protein-coding potential of ncRNAs. lncRNAs were found to associate with ribosomes, indicating that they might be a source of some special proteins [13]. microRNAs also contain open reading frames that encode small peptides. The regulatory roles of some miRNA-encoded peptides (miPEPs) have been investigated [14]. Although an increasing number of ncRNAs have been identified as able to encode proteins, it is challenging to distinguish coding RNAs from previously annotated ncRNAs and to detect the proteins produced by ncRNA translation.
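To make the notion of a short open reading frame in a transcript concrete, the following minimal Python sketch scans the three forward frames of a sequence for ATG-to-stop ORFs within a length window and translates them with the standard genetic code. The function name `find_small_orfs` and the thresholds `min_codons` and `max_codons` are hypothetical choices for illustration; this is not the method used in this study.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_small_orfs(seq, min_codons=10, max_codons=100):
    """Return (start, end, peptide) tuples for forward-strand ATG..stop ORFs.

    start is 0-based, end is exclusive and includes the stop codon.
    The length window (hypothetical defaults) counts codons before the stop.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    peptide = "".join(
                        CODON_TABLE.get(seq[j:j + 3], "X")  # X = ambiguous codon
                        for j in range(start, i, 3)
                    )
                    orfs.append((start, i + 3, peptide))
                start = None  # look for the next ORF in this frame
    return orfs
```

Only the forward strand is scanned and only the first in-frame ATG is used as the start, which keeps the sketch short; dedicated ORF-prediction tools apply considerably more elaborate criteria.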
In 2009, ribosome profiling sequencing (Ribo-Seq), a technique developed by N. Ingolia et al., made the detection of low-abundance small proteins possible (Fig 1A) [15]. To date, this technology has been used in a variety of investigations; for instance, the adaptive response of Arabidopsis thaliana to prolonged heat stress was profiled globally by Ribo-Seq [16]. In addition, the three-nucleotide periodicity of the reads, which results from the movement of the ribosome along the coding sequence, differentiates translated sequences from other possible RNA-protein complexes. A growing number of studies based on this technique have reported that a significant proportion of ncRNAs are translated [13, 14, 17-19]. However, the functions and regulatory mechanisms of the detected small proteins or peptides are not yet clear. Some of them may be involved in the expression of their corresponding ncRNAs, or may form an interaction network with other proteins. In contrast, a substantial number of the small proteins detected by Ribo-Seq may be encoded by misannotated protein-coding genes that were not correctly predicted by bioinformatics algorithms because of their short size. The present study takes advantage of existing Ribo-Seq and RNA-Seq data for Arabidopsis thaliana to investigate putative ncRNAs and their expression products, providing evidence that ncRNAs may have additional functions mediated by their peptides.
Material and Methods
1 Detection of translated ORFs from Ribo-Seq data and related analysis
Ribo-Seq and RNA-Seq data from leaf, root, shoot and flower bud of Arabidopsis thaliana were obtained from NCBI GEO DataSets (GSE40209 [20], GSE69802 [18], GSE81332 [16]). After adapter removal and processing with TopHat [21] and Cufflinks [22] against the TAIR10 genome, all assembled transcripts were divided into coding and putative non-coding RNAs using Cuffcompare. Transcripts with a Cuffcompare class code of i (novel intronic), u (novel intergenic) or x (novel antisense) were recognized as putative non-coding RNAs, and they were later aligned to TAIR10 in order to filter out coding sequences (Fig 2).
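The class-code selection described above can be sketched as follows. This is a minimal illustration, assuming that the Cuffcompare output is available as a GTF file whose ninth column carries `transcript_id` and `class_code` attributes; the function name `putative_noncoding_ids` and the example records are hypothetical, and the subsequent alignment to TAIR10 is not shown.

```python
import re

# Class codes treated as putative non-coding in the text:
# i (novel intronic), u (novel intergenic), x (novel antisense).
PUTATIVE_NONCODING_CODES = {"i", "u", "x"}

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def putative_noncoding_ids(gtf_lines, codes=PUTATIVE_NONCODING_CODES):
    """Collect transcript IDs whose class_code attribute is in `codes`.

    `gtf_lines` is any iterable of GTF text lines (assumed layout:
    nine tab-separated columns, attributes in the last one).
    """
    ids = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get("class_code") in codes and "transcript_id" in attrs:
            ids.add(attrs["transcript_id"])
    return ids


# Hypothetical records for illustration only.
example = [
    'Chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "u";\n',
    'Chr1\tCufflinks\texon\t500\t900\t.\t+\t.\tgene_id "XLOC_2"; transcript_id "TCONS_2"; class_code "=";\n',
    'Chr2\tCufflinks\texon\t50\t400\t.\t-\t.\tgene_id "XLOC_3"; transcript_id "TCONS_3"; class_code "x";\n',
]
selected = putative_noncoding_ids(example)
```

Returning a set of transcript IDs rather than full records keeps the selection step separate from sequence extraction, so the IDs can be passed to whatever downstream tool retrieves the transcript sequences.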
These transcripts were processed with TransDecoder [23] and CIPHER [13] to obtain their RNA sequences, peptide sequences and coding scores. Sequences of non-coding RNAs were processed
by BLASTN against reference sequences of annotated ncRNAs in Ensembl [24] (tRNA, rRNA, snRNA,
snoRNA and miRNA), annotated and predicted ncRNAs in GreeNC
(ceRNA). For peptide sequence conservation analysis, homologues of the Arabidopsis ORFs in
transcript assemblies of Oryza sativa (e-value