Conference: IEEE International Conference on Bioinformatics and Biomedicine
Reference sequence selection for motif searches

Abstract

The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k of the t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k input sequences as reference sequences without any deliberate selection, so their running time can fluctuate sharply, especially for large alphabets. In this paper, we formulate the reference sequence selection problem and propose a method named RefSelect that solves it quickly by evaluating the number of candidate motifs for the reference sequences. RefSelect yields a practical running-time improvement for state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) lets the tested algorithms solve the PMS problem efficiently and with stable running time, (2) in particular, gives them a speedup of up to about 100× on protein data, and (3) is also suitable for large data sets that contain hundreds of sequences or more.
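
To make the setting concrete, here is a minimal sketch of a pattern-driven search with a single reference sequence (k = 1), assuming the standard definition that an (l, d) motif is a length-l string occurring in every input sequence with at most d mismatches. It is not RefSelect and not any of the tested algorithms; the function names (`neighbors`, `candidates`, `pms`) and the `ref_index` parameter are hypothetical and chosen only for illustration. The sketch shows why the choice of reference sequence matters: the set of motifs found is the same for any reference, but the number of candidates to verify can differ.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(lmer, d, alphabet):
    """All strings within Hamming distance d of lmer (brute force)."""
    result = set()
    # Rewriting any min(d, l) positions with arbitrary letters covers
    # every string that differs from lmer in at most d positions.
    for positions in combinations(range(len(lmer)), min(d, len(lmer))):
        for letters in product(alphabet, repeat=len(positions)):
            chars = list(lmer)
            for pos, letter in zip(positions, letters):
                chars[pos] = letter
            result.add("".join(chars))
    return result


def candidates(ref, l, d, alphabet):
    """Candidate motifs generated from one reference sequence."""
    out = set()
    for i in range(len(ref) - l + 1):
        out |= neighbors(ref[i:i + l], d, alphabet)
    return out


def occurs(motif, seq, d):
    """True if some l-mer of seq is within distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def pms(seqs, l, d, alphabet, ref_index=0):
    """All (l, d) motifs of seqs, using seqs[ref_index] as the reference."""
    cands = candidates(seqs[ref_index], l, d, alphabet)
    return {m for m in cands if all(occurs(m, s, d) for s in seqs)}


if __name__ == "__main__":
    dna = "ACGT"
    seqs = ["ACGTACGT", "TTACGATT", "GGACTTGG"]  # toy data, invented here
    for idx in range(len(seqs)):
        n_cands = len(candidates(seqs[idx], 4, 1, dna))
        motifs = pms(seqs, 4, 1, dna, ref_index=idx)
        print(idx, n_cands, sorted(motifs))
```

Any choice of `ref_index` returns the same motif set, because a motif must occur within distance d in every sequence, including the reference. Only the amount of verification work changes, which is the quantity a selection method would try to keep small.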