Conference proceedings: Research in computational molecular biology.

Simultaneously Learning DNA Motif along with Its Position and Sequence Rank Preferences through EM Algorithm

Abstract

Although de novo motifs can be discovered by mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider additional binding features (i.e., position preference and sequence rank preference), but this information usually has to be supplied by the user. This paper presents a de novo motif discovery algorithm called SEME, which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif and its position and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets and 164 ChIP-Seq libraries, we demonstrate the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to the more difficult problem of finding co-regulated TF (co-TF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct co-TF motifs and, at the same time, predicted co-TF motifs that better match the known motifs. Finally, we show that the learned position and sequence rank preferences of each co-TF reveal potential interaction mechanisms between the primary TF and the co-TF within these sites. Some of these findings were further validated by ChIP-Seq experiments on the co-TFs.
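To make the idea of learning a motif and its position preference in one EM loop more concrete, here is a minimal toy sketch. It is not SEME: it assumes exactly one motif occurrence per sequence, equal-length sequences, a fixed motif width, and a uniform background, and it leaves out sequence rank preference, variable motif length extension, and importance sampling entirely. The function name `em_motif_with_position` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random

ALPHABET = "ACGT"


def em_motif_with_position(seqs, width, n_iter=50, pseudo=0.1, seed=0):
    """Toy EM for a single motif (assumption: one occurrence per sequence).

    Jointly estimates a position weight matrix (PWM) and a prior over
    motif start positions. All sequences must have the same length.
    """
    rng = random.Random(seed)
    n_pos = len(seqs[0]) - width + 1
    bg = 0.25  # assumed uniform background base probability

    # Random PWM initialisation: pwm[j][base] = P(base at motif column j).
    pwm = []
    for _ in range(width):
        raw = [rng.random() + 0.5 for _ in ALPHABET]
        total = sum(raw)
        pwm.append({b: r / total for b, r in zip(ALPHABET, raw)})
    pos_prior = [1.0 / n_pos] * n_pos  # start with no position preference

    for _ in range(n_iter):
        counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
        pos_counts = [pseudo] * n_pos
        for s in seqs:
            # E-step: posterior over the motif start position in this
            # sequence, using the motif/background likelihood ratio.
            weights = []
            for p in range(n_pos):
                w = pos_prior[p]
                for j in range(width):
                    w *= pwm[j][s[p + j]] / bg
                weights.append(w)
            z = sum(weights)
            for p, w in enumerate(weights):
                resp = w / z
                pos_counts[p] += resp
                for j in range(width):
                    counts[j][s[p + j]] += resp
        # M-step: re-estimate the PWM and the position prior from the
        # expected counts (pseudocounts keep every probability positive).
        for j in range(width):
            total = sum(counts[j].values())
            pwm[j] = {b: counts[j][b] / total for b in ALPHABET}
        total = sum(pos_counts)
        pos_prior = [c / total for c in pos_counts]
    return pwm, pos_prior


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "TTGACGTTTA", "GGGACGTCCC", "CACGTAAAAA"]
    pwm, pos_prior = em_motif_with_position(toy, width=4)
    print("consensus:", "".join(max(col, key=col.get) for col in pwm))
    print("position prior:", [round(p, 3) for p in pos_prior])
```

The only point of the sketch is that the position prior is updated in the same M-step as the PWM, so no position preference has to be supplied up front. EM of this kind can converge to a shifted or otherwise suboptimal motif depending on initialisation, so the output of the toy run should not be read as a benchmark.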
