International Conference on Bioinformatics and Computational Biology

Quality-Based Similarity Search for Biological Sequence Databases

Abstract

Low-complexity regions (LCRs) of biological sequences are the main source of false positives in similarity searches over sequence databases. Identifying LCRs in a sequence is difficult, and existing LCR-identification tools produce large numbers of both false positives and false negatives. We consider the problem of finding similar sequences when LCRs are not located precisely. We develop an LCR-based formulation that measures the quality of each letter in a sequence, and we show that these quality values can be incorporated into two fundamental approaches to sequence search to significantly reduce the number of false positives they produce. The first approach finds optimal alignments; the second computes a suboptimal alignment. For the second approach, we also develop a randomized, memory-resident hash table that indexes k-grams probabilistically, which greatly reduces memory usage and CPU cost. We further show that this hash table can be used to reconstruct query sequences with negligible information loss, eliminating the need to store these sequences. Experiments on real data show that our quality-based similarity search algorithms drastically reduce the number of false positives while running faster than existing strategies.
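The abstract does not give the LCR-based quality formula, so the Python below is only a minimal sketch under stated assumptions. It takes a letter's quality to be the normalized Shannon entropy of a window centred on it (a common low-complexity proxy), then scales each Smith-Waterman match reward by the lower quality of the two aligned letters. The functions `letter_quality` and `weighted_sw_score`, the window half-width, and the scoring constants are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def letter_quality(seq: str, half: int = 6, alphabet_size: int = 20) -> list[float]:
    """Hypothetical per-letter quality in [0, 1].

    Uses the Shannon entropy of the window seq[i-half : i+half+1],
    normalized by the largest entropy a full window can reach. Letters
    inside low-complexity regions (e.g. poly-A runs) score near 0.
    """
    if half < 1:
        raise ValueError("half must be at least 1")
    max_entropy = math.log2(min(2 * half + 1, alphabet_size))
    quality = []
    for i in range(len(seq)):
        window = seq[max(0, i - half): i + half + 1]
        total = len(window)
        entropy = sum((c / total) * math.log2(total / c)
                      for c in Counter(window).values())
        quality.append(min(1.0, entropy / max_entropy))
    return quality


def weighted_sw_score(a: str, b: str, qa: list[float], qb: list[float],
                      match: float = 2.0, mismatch: float = -1.0,
                      gap: float = -2.0) -> float:
    """Smith-Waterman local alignment score with linear gap penalties.

    Each match reward is scaled by the lower quality of the two aligned
    letters (an assumed weighting, not necessarily the paper's).
    """
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                s = match * min(qa[i - 1], qb[j - 1])
            else:
                s = mismatch
            # Local alignment: never let a cell drop below zero.
            cur[j] = max(0.0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Under these assumptions, aligning a 20-letter poly-A run against itself scores 0.0, whereas an unweighted Smith-Waterman score with the same constants would be 40, so the low-complexity match no longer looks significant. A diverse sequence aligned against itself keeps most of its score because its letters have quality close to 1.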
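The abstract also mentions a randomized, memory-resident hash table that indexes k-grams probabilistically, but does not describe its design. The sketch below assumes one simple interpretation: each distinct k-gram is kept with probability `p`, decided by a salted BLAKE2b hash, so memory shrinks roughly in proportion to `p`. The class `SampledKGramIndex` and its parameters `k`, `p`, and `seed` are hypothetical names chosen for illustration.

```python
import hashlib
import random
from collections import defaultdict


class SampledKGramIndex:
    """Hypothetical in-memory k-gram index that keeps each distinct k-gram
    with probability p. The keep/drop decision comes from a salted hash, so
    a given k-gram is treated the same way in database and query sequences."""

    def __init__(self, k: int = 4, p: float = 0.5, seed: int = 0) -> None:
        self.k = k
        self._threshold = p * 2**64           # compared against a 64-bit hash
        self._salt = random.Random(seed).randbytes(16)
        self.table: dict[str, list[tuple[int, int]]] = defaultdict(list)

    def _keep(self, kgram: str) -> bool:
        digest = hashlib.blake2b(kgram.encode(), key=self._salt,
                                 digest_size=8).digest()
        return int.from_bytes(digest, "big") < self._threshold

    def _kgrams(self, seq: str):
        for pos in range(len(seq) - self.k + 1):
            yield pos, seq[pos:pos + self.k]

    def add(self, seq_id: int, seq: str) -> None:
        """Record (seq_id, position) for every sampled k-gram of seq."""
        for pos, kgram in self._kgrams(seq):
            if self._keep(kgram):
                self.table[kgram].append((seq_id, pos))

    def candidates(self, query: str) -> dict[int, int]:
        """Count sampled k-gram hits per database sequence; the top
        candidates would then be passed to an extension/alignment step."""
        hits: dict[int, int] = defaultdict(int)
        for _, kgram in self._kgrams(query):
            if self._keep(kgram):
                for seq_id, _ in self.table.get(kgram, ()):
                    hits[seq_id] += 1
        return dict(hits)
```

Deciding with a hash rather than a fresh random draw per occurrence is the key design choice here: if queries were sampled independently of the database, a shared k-gram would survive on both sides only with probability p², losing hits.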
