Journal: Applied Intelligence

A hash trie filter method for approximate string matching in genomic databases

Abstract

In genomic databases, approximate string matching with k errors is often used to search genomic sequences, where the k errors can be caused by substitution, insertion, or deletion operations. In this paper, we propose a new method, the hash trie filter, to efficiently support approximate string matching in genomic databases. First, we build a hash trie in advance to index the genomic sequence stored in the database. Then, we use an efficient technique to find the ordered subpatterns in the sequence, which can reduce the number of candidates by pruning unreasonable matching positions. Moreover, our method dynamically decides the number of ordered matching grams, which increases precision. Simulation results show that the hash trie filter outperforms the well-known (k+s) q-samples filter in terms of response time, number of verified candidates, and precision, across different query pattern lengths and error levels.
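The abstract describes a filter-then-verify pipeline. The sequence is indexed in advance, exact matches of pattern pieces generate candidate positions, and only those candidates are checked against the k-error bound. The sketch below illustrates that general pipeline under stated assumptions. It uses the classic pigeonhole partition filter: split the pattern into k+1 disjoint pieces, because any occurrence with at most k substitutions, insertions, or deletions must contain at least one piece exactly. This is not the paper's hash trie filter or its ordered-gram rule, neither of which the abstract specifies. A plain dictionary stands in for the hash trie. The function names `build_gram_index`, `match_ends`, and `approximate_search` are hypothetical. Verification uses Sellers' dynamic programming, and reported positions are exclusive end offsets in the text.

```python
from collections import defaultdict


def build_gram_index(text: str, q: int) -> dict[str, list[int]]:
    """Map every length-q substring (gram) of `text` to its start positions.

    Assumption: a plain hash map stands in for the paper's hash trie.
    """
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def match_ends(pattern: str, window: str, k: int) -> list[int]:
    """Exclusive end offsets j in `window` such that some substring of
    `window` ending at j is within edit distance k of `pattern`
    (Sellers' DP: a match may start anywhere in the window for free)."""
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty window prefix
    ends: list[int] = []
    for j, ch in enumerate(window, start=1):
        cur = [0]  # free start: an empty pattern prefix costs nothing
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra text character
                           cur[i - 1] + 1))     # pattern character deleted
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


def approximate_search(text: str, pattern: str, k: int) -> list[int]:
    """Filter candidates with k+1 exact pieces, then verify each window."""
    m = len(pattern)
    if k < 0 or m <= k:
        raise ValueError("need 0 <= k < len(pattern)")
    piece_len = m // (k + 1)  # k+1 disjoint pieces; any leftover tail is unused
    # In a database the index would be built once; rebuilt here for brevity.
    index = build_gram_index(text, piece_len)
    ends: set[int] = set()
    for j in range(k + 1):
        off = j * piece_len
        piece = pattern[off:off + piece_len]
        for p in index.get(piece, []):
            # If piece j aligns exactly at text[p:], the whole occurrence
            # starts no earlier than p-off-k and ends no later than p-off+m+k.
            start = max(0, p - off - k)
            stop = p - off + m + k
            for e in match_ends(pattern, text[start:stop], k):
                ends.add(start + e)
    return sorted(ends)


if __name__ == "__main__":
    print(approximate_search("TTTACGTTT", "ACGT", 0))  # → [7]
```

Because every verified window is a real substring of the text, and every occurrence with at most k errors falls inside some candidate window, the filtered result equals a full Sellers scan of the text. The filter only reduces how much of the text must be verified.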