Genome Research

Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data.


Abstract

We present an integrative machine learning method, incRNA, for whole-genome identification of noncoding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid levels. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with high accuracy (97% AUC on an independent validation set), and find more than 7000 novel ncRNA candidates, of which more than 1000 are located in the intergenic regions of the C. elegans genome. Based on the validation set, we estimate that 91% of the approximately 7000 novel ncRNA candidates are true positives. We then analyze 15 novel ncRNA candidates by RT-PCR, detecting expression for 14. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (approximately 59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.
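To make the two headline figures concrete, here is a minimal sketch of how an AUC is computed and what a 91% true-positive estimate implies for roughly 7000 candidates. It is not the incRNA implementation: the function name `auc`, the variable names, and the toy scores are hypothetical and chosen only for illustration. AUC is treated here as the probability that a randomly chosen known ncRNA receives a higher classifier score than a randomly chosen non-ncRNA element, with ties counted as half.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores, NOT data from the study.
known_ncrna_scores = [0.9, 0.8, 0.75, 0.4]
other_element_scores = [0.7, 0.3, 0.2, 0.1]

# 15 of the 16 pairs are ranked correctly, so this prints 0.9375.
print(auc(known_ncrna_scores, other_element_scores))

# Arithmetic behind the abstract's estimate: 91% of about 7000 candidates.
expected_true_positives = round(0.91 * 7000)
print(expected_true_positives)  # 6370
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based formulation gives the same value more efficiently on genome-scale sets.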
