Journal: Nucleic Acids Research

A common set of distinct features that characterize noncoding RNAs across multiple species

Abstract

To find signature features shared by various ncRNA sub-types and to characterize novel ncRNAs, we have developed a method, RNAfeature, to investigate > 600 sets of genomic and epigenomic data with various evolutionary and biophysical scores. RNAfeature utilizes a fine-tuned intra-species wrapper algorithm that is followed by a novel feature selection strategy across species. It considers the long-distance effects of certain features (e.g. histone modification at the promoter region). We finally narrow these down to 10 informative features (including sequences, structures, expression profiles and epigenetic signals). These features are complementary to each other and, as a whole, can accurately distinguish canonical ncRNAs from CDSs and UTRs (accuracies: > 92% in human, mouse, worm and fly). Moreover, the feature pattern is conserved across multiple species. For instance, the supervised 10-feature model derived from animal species can predict ncRNAs in Arabidopsis (accuracy: 82%). Subsequently, we integrate the 10 features to define a set of noncoding potential scores, which can identify, evaluate and characterize novel noncoding RNAs. The scores cover all transcribed regions (including unconserved ncRNAs), without requiring assembly of the full-length transcripts. Importantly, the noncoding potential allows us to identify and characterize potential functional domains with feature patterns similar to canonical ncRNAs (e.g. tRNA, snRNA, miRNA, etc.) on ~70% of human long ncRNAs (lncRNAs).
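The two-stage idea in the abstract (a wrapper search within each species, then a selection step across species) can be illustrated with a small sketch. This is not the RNAfeature implementation: the abstract does not specify the classifier, the search strategy, or the cross-species rule, so the code below assumes greedy forward selection around a nearest-centroid classifier and a simple vote across species. The function names, the synthetic data, and the `min_species` threshold are all hypothetical.

```python
import random
from collections import Counter


def centroid_accuracy(train, valid, feats):
    """Accuracy of a nearest-centroid classifier restricted to `feats`."""
    sums, counts = {}, Counter()
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += x[f]
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x):
        # Pick the class whose centroid is closest in squared distance.
        return min(centroids, key=lambda y: sum(
            (x[f] - c) ** 2 for f, c in zip(feats, centroids[y])))

    return sum(predict(x) == y for x, y in valid) / len(valid)


def forward_wrapper(train, valid, n_features, max_keep):
    """Greedy forward wrapper: repeatedly add the feature that most
    improves validation accuracy; stop when nothing improves."""
    selected, best = [], 0.0
    while len(selected) < max_keep:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(centroid_accuracy(train, valid, selected + [f]), f)
                  for f in candidates]
        acc, f = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best:
            break
        selected.append(f)
        best = acc
    return selected, best


def shared_features(per_species, min_species):
    """Assumed cross-species rule: keep features chosen in at least
    `min_species` of the per-species selections."""
    votes = Counter(f for feats in per_species for f in feats)
    return sorted(f for f, v in votes.items() if v >= min_species)


def make_species(seed, n=400, n_features=6, informative=(0, 1)):
    """Synthetic stand-in data: class 1 (ncRNA) is shifted on the
    informative features; all other features are pure noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0, 1) for _ in range(n_features)]
        for f in informative:
            x[f] += 2.0 * y
        data.append((x, y))
    half = n // 2
    return data[:half], data[half:]


if __name__ == "__main__":
    per_species = []
    for seed, name in enumerate(["human", "mouse", "worm", "fly"], start=1):
        train, valid = make_species(seed)
        feats, acc = forward_wrapper(train, valid, n_features=6, max_keep=3)
        per_species.append(feats)
        print(name, feats, round(acc, 3))
    print("shared:", shared_features(per_species, min_species=3))
```

A wrapper method scores feature subsets by the accuracy of the model trained on them, which is why the selection loop calls the classifier directly rather than ranking features by a standalone statistic.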