International Conference on Fuzzy Systems and Knowledge Discovery

Long intergenic non-coding RNA detection benefited from integrative modeling of (Epi)genomic data

Abstract

Prediction of long intergenic non-coding RNAs (lincRNAs) is a prerequisite for analyzing the sequence features of non-coding RNAs and exploring their regulatory functions. Genomic sequence features provide a fundamental basis for lincRNA prediction, because sequence information at least partially aids such predictions. However, genomic sequence alone appears to have reached a limit in sensitivity for lincRNA prediction in eukaryotes. Chromatin factors leave marks that can be captured by high-throughput approaches such as ChIP-seq, and previous studies have shown these marks to be important features as well. We demonstrate that lincRNA prediction performance improves when high-throughput chromatin modification features and genomic sequence features are combined in a logistic regression model with LASSO regularization. The discriminating features include H3K4me1, H3K27ac, H3K9me3, open reading frames, and several repeat elements. Importantly, chromatin information appears to be complementary to genomic sequence information, highlighting the importance of an integrated model. We also show that lincRNA expression specificity can be modeled efficiently using chromatin data from the same developmental stage. The study not only supports the biological hypothesis that chromatin factors can regulate developmental-stage-specific expression of lincRNAs, but also reveals the features that discriminate lincRNAs from coding genes.
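
The abstract names logistic regression with LASSO regularization as the modeling approach but gives no implementation details. The following is only a minimal sketch of that general technique, not the authors' pipeline: it fits an L1-penalized logistic regression by proximal gradient descent on synthetic data. The feature columns (a chromatin-mark signal, a standardized ORF length, and a pure-noise column), the penalty strength, the step size, and all function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam=0.05, step=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.

    lam, step and n_iter are illustrative defaults, not values from the paper.
    The intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= step * grad_b / n
        # Gradient step on the logistic loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(w[j] - step * grad_w[j] / n, step * lam)
             for j in range(p)]
    return w, b


def make_synthetic(n=400, seed=0):
    """Synthetic stand-in data; NOT real ChIP-seq or sequence measurements."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = lincRNA, 0 = coding gene (synthetic)
        chromatin = rng.gauss(1.0 if label else -1.0, 1.0)  # hypothetical mark signal
        orf = rng.gauss(-1.0 if label else 1.0, 1.0)        # hypothetical ORF length
        noise = rng.gauss(0.0, 1.0)                         # uninformative feature
        X.append([chromatin, orf, noise])
        y.append(label)
    return X, y


X, y = make_synthetic()
weights, intercept = fit_lasso_logistic(X, y)
predictions = [
    1 if sigmoid(sum(w * x for w, x in zip(weights, xi)) + intercept) >= 0.5 else 0
    for xi in X
]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("weights:", weights, "intercept:", intercept, "accuracy:", accuracy)
```

The L1 penalty shrinks coefficients toward zero and can set uninformative ones exactly to zero, which is why this kind of model is often used to identify a small set of discriminating features. In practice an established library implementation and cross-validated choice of the penalty strength would likely be preferable to this hand-written loop.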
