Conference: IEEE International Conference on Bioinformatics and Biomedicine

Reference sequence selection for motif searches


Abstract

The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of the t input sequences as reference sequences to generate candidate motifs, and they can find all (l, d) motifs in the input sequences. However, most of them simply take the first k input sequences as reference sequences without any deliberate selection, so their running time may fluctuate sharply, especially for large alphabets. In this paper, we formulate the reference sequence selection problem and propose a method named RefSelect that solves it quickly by evaluating the number of candidate motifs for the reference sequences. RefSelect yields a practical running-time improvement for state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) lets the tested algorithms solve the PMS problem efficiently and with stable running times, (2) in particular, gives them a speedup of up to about 100× on protein data, and (3) is also suitable for large data sets that contain hundreds of sequences or more.
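To make the terms in the abstract concrete, the following is a minimal brute-force sketch of a pattern-driven search with a single reference sequence (k = 1). It is not RefSelect and not any of the algorithms the paper tests; the abstract does not describe how they work internally. It assumes the usual definition of an (l, d) motif: a string of length l that occurs in every input sequence with at most d mismatches. All function names (`hamming`, `occurs`, `neighbors`, `candidates`, `pms`) and the `ref_index` parameter are hypothetical and introduced only for illustration.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some l-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def neighbors(lmer, d, alphabet):
    """All strings within Hamming distance d of lmer (including lmer itself)."""
    result = set()
    for r in range(d + 1):
        for positions in combinations(range(len(lmer)), r):
            for letters in product(alphabet, repeat=r):
                chars = list(lmer)
                for pos, ch in zip(positions, letters):
                    chars[pos] = ch
                result.add("".join(chars))
    return result


def candidates(ref, l, d, alphabet):
    """Candidate motifs generated from one reference sequence:
    the union of the d-neighborhoods of all its l-mers."""
    out = set()
    for i in range(len(ref) - l + 1):
        out |= neighbors(ref[i:i + l], d, alphabet)
    return out


def pms(seqs, l, d, alphabet, ref_index=0):
    """Brute-force pattern-driven search with a single reference sequence.

    Returns the set of (l, d) motifs and the number of candidates checked.
    Assumption: k = 1 reference sequence, chosen by ref_index.
    """
    cands = candidates(seqs[ref_index], l, d, alphabet)
    motifs = {m for m in cands if all(occurs(m, s, d) for s in seqs)}
    return motifs, len(cands)
```

Calling `pms(seqs, l, d, alphabet, ref_index=i)` for different values of `i` returns the same motif set, because every motif must lie within distance d of some l-mer of each input sequence. What changes is the number of candidates that have to be verified, which is the quantity the abstract says RefSelect evaluates when choosing reference sequences.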
