
Reference-Based Indexing of Sequence Databases



Abstract

We consider the problem of similarity search in a very large sequence database, using edit distance as the similarity measure. Given limited main memory, our goal is to develop a reference-based index that reduces the number of costly edit distance computations needed to answer a query. The idea behind reference-based indexing is to select a small set of reference sequences that serve as surrogates for the other sequences in the database. We propose two novel strategies for selecting references, as well as a new strategy for assigning references to database sequences. Our experimental results show that our selection and assignment methods far outperform competing methods. For example, our methods prune up to 20 times as many sequences as the Omni method, and up to 30 times as many sequences as frequency vectors. Our methods also scale well to databases containing many sequences, very long sequences, or both.
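The general pruning idea behind reference-based indexing can be illustrated with a minimal Python sketch. It is not the authors' method: the class `ReferenceIndex` and its parameters are hypothetical, references are assumed to be drawn uniformly at random, and each database sequence is assumed to be assigned its `refs_per_seq` closest references. Distances from each database sequence to its assigned references are precomputed. At query time, the triangle inequality d(q, s) ≥ |d(q, r) − d(s, r)| gives a lower bound that lets a sequence be discarded without computing its edit distance to the query.

```python
import random
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


class ReferenceIndex:
    """Hypothetical reference-based index storing d(s, r) for each database
    sequence s and each reference r assigned to it."""

    def __init__(self, database: Sequence[str], num_refs: int,
                 refs_per_seq: int, seed: int = 0):
        rng = random.Random(seed)
        self.database = list(database)
        # Assumption: references are sampled uniformly at random from the database.
        self.refs = rng.sample(self.database, num_refs)
        # Assumption: each sequence keeps its refs_per_seq closest references,
        # stored as (reference index, precomputed distance) pairs.
        self.assigned: List[List[Tuple[int, int]]] = []
        for s in self.database:
            dists = sorted((edit_distance(s, r), k) for k, r in enumerate(self.refs))
            self.assigned.append([(k, d) for d, k in dists[:refs_per_seq]])

    def range_query(self, query: str, radius: int) -> Tuple[List[str], int]:
        """Return all sequences within `radius` of `query`, plus the number of
        database sequences whose edit distance to the query was computed."""
        # Query-to-reference distances are part of the query cost.
        q_to_ref = [edit_distance(query, r) for r in self.refs]
        results: List[str] = []
        verified = 0
        for s, refs in zip(self.database, self.assigned):
            # Triangle inequality: d(q, s) >= |d(q, r) - d(s, r)| for each reference r.
            lower = max(abs(q_to_ref[k] - d) for k, d in refs)
            if lower > radius:
                continue  # pruned without computing d(q, s)
            verified += 1
            if edit_distance(query, s) <= radius:
                results.append(s)
        return results, verified
```

Because edit distance is a metric, pruning never discards a true answer; the quality of reference selection and assignment only affects how many sequences survive to the exact `edit_distance` check (the `verified` count). Improving that count is what the paper's selection and assignment strategies target.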
