
Designing multiple simultaneous seeds for DNA similarity search

Abstract

The challenge of similarity search in massive DNA sequence databases has inspired major changes in BLAST-style alignment tools, which accelerate search by inspecting only pairs of sequences that share a common short "seed," or pattern of matching residues. Some of these changes raise the possibility of improving search performance by probing sequence pairs with several distinct seeds, any one of which is sufficient for a seed match. However, designing a set of seeds to maximize their combined sensitivity to biologically meaningful sequence alignments is computationally difficult, even given recent advances [16, 6] in designing single seeds.

This work describes algorithmic improvements to seed design that address the problem of designing a set of n seeds to be used simultaneously. We give a new local search method to optimize the sensitivity of seed sets. The method relies on efficient incremental computation of the probability that an alignment contains a match to a seed π, given that it has already failed to match any of the seeds in a set Π. We demonstrate experimentally that multi-seed designs, even with relatively few seeds, can be significantly more sensitive than even optimized single-seed designs.