Journal of Bioinformatics and Computational Biology

A new algorithm for DNA motif discovery using multiple sample sequence sets

Abstract

DNA motif discovery plays an important role in understanding the mechanisms of gene regulation. Most existing motif discovery algorithms identify motifs efficiently and effectively on small datasets. However, the large datasets generated by high-throughput sequencing technologies pose a major challenge: processing the entire dataset is too time-consuming, but if only a small sample sequence set is processed, infrequent motifs are difficult to identify. In this paper, we propose a new DNA motif discovery algorithm that first divides the input dataset into multiple sample sequence sets, then refines the initial motifs of each sample sequence set with the expectation maximization method, and finally combines the results from all sample sequence sets. In addition, we design a new initial motif generation method that uses the entire dataset, which helps to identify infrequent motifs. Experimental results on simulated data show that the proposed algorithm has better time performance on large datasets, and better accuracy in identifying infrequent motifs, than the compared algorithms. We have also verified the validity of the proposed algorithm on real data.
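The three-stage pipeline described in the abstract (split, refine with expectation maximization, combine) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not specify the motif model, the seeding procedure, or the combination rule, so everything concrete here is an assumption: a one-occurrence-per-sequence model with a uniform background, seeds taken as the most frequent k-mers of the whole dataset, and combination by counting consensus strings. All function names are hypothetical, and sequences are assumed to contain only uppercase A, C, G, T.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def split_into_sample_sets(sequences, num_sets, seed=0):
    """Shuffle, then deal the sequences round-robin into num_sets sample sets."""
    shuffled = list(sequences)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_sets] for i in range(num_sets)]

def seeds_from_whole_dataset(sequences, w, top=3):
    """Assumed seeding rule: the most frequent w-mers over the entire dataset."""
    kmers = Counter(s[i:i + w] for s in sequences for i in range(len(s) - w + 1))
    return [kmer for kmer, _ in kmers.most_common(top)]

def pwm_from_seed(kmer, pseudo=0.5):
    """Initial position weight matrix that favours the seed base in each column."""
    pwm = []
    for base in kmer:
        col = {b: pseudo for b in ALPHABET}
        col[base] += 1.0
        total = sum(col.values())
        pwm.append({b: v / total for b, v in col.items()})
    return pwm

def em_refine(sequences, pwm, iterations=20, pseudo=0.1):
    """Refine a PWM by EM, assuming one motif occurrence per sequence."""
    w = len(pwm)
    for _ in range(iterations):
        counts = [{b: pseudo for b in ALPHABET} for _ in range(w)]
        for seq in sequences:
            # E-step: likelihood ratio (motif vs. uniform background) per start.
            likes = []
            for start in range(len(seq) - w + 1):
                p = 1.0
                for j in range(w):
                    p *= pwm[j][seq[start + j]] / 0.25
                likes.append(p)
            if not likes:  # sequence shorter than the motif width
                continue
            total = sum(likes)
            # Accumulate expected base counts weighted by the start posterior.
            for start, like in enumerate(likes):
                z = like / total
                for j in range(w):
                    counts[j][seq[start + j]] += z
        # M-step: renormalise the expected counts into a new PWM.
        pwm = [{b: c[b] / sum(c.values()) for b in ALPHABET} for c in counts]
    return pwm

def consensus(pwm):
    """Most probable base in each column."""
    return "".join(max(col, key=col.get) for col in pwm)

def discover(sequences, w, num_sets=3, top=3):
    """Split, refine every seed on every sample set, then tally the consensuses."""
    seeds = seeds_from_whole_dataset(sequences, w, top)
    votes = Counter()
    for sample in split_into_sample_sets(sequences, num_sets):
        for kmer in seeds:
            votes[consensus(em_refine(sample, pwm_from_seed(kmer)))] += 1
    return votes.most_common()
```

Calling `discover(sequences, w=8)` on a list of DNA strings returns consensus strings ranked by how many (sample set, seed) refinements converged to them. Drawing the seeds from the whole dataset while running EM only on the smaller sample sets mirrors the trade-off stated in the abstract: the expensive refinement step sees less data, while the cheap seeding step still sees every sequence.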