A hash trie filter method for approximate string matching in genomic databases

Ye-In Chang; Jiun-Rung Chen; Min-Tze Hsu

Abstract

In genomic databases, genomic sequences are often searched using approximate string matching with k errors, where each error is a substitution, insertion, or deletion. In this paper, we propose a new method, the hash trie filter, to efficiently support approximate string matching in genomic databases. First, we build a hash trie in advance to index the genomic sequence stored in the database. Then, we use an efficient technique to find the ordered subpatterns in the sequence, which reduces the number of candidates by pruning implausible matching positions. Moreover, our method dynamically decides the number of ordered matching grams, which increases precision. The simulation results show that the hash trie filter outperforms the well-known (k+s) q-samples filter in terms of response time, number of verified candidates, and precision, for different query pattern lengths and different error levels.
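
The filter-then-verify idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' hash trie filter: a plain dictionary of q-grams stands in for the hash trie, and the classical partition (pigeonhole) filter stands in for the ordered-subpattern technique, whose details the abstract does not give. The partition filter relies on the fact that k errors can damage at most k of k+1 disjoint pattern pieces, so at least one piece must occur exactly. All function names and the sample sequence are hypothetical.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Map every length-q substring of text to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def min_errors_in_window(pattern, window):
    """Sellers' dynamic program: the smallest edit distance between pattern
    and any substring of window."""
    prev = [0] * (len(window) + 1)  # a match may start anywhere in the window
    for i, pc in enumerate(pattern, start=1):
        curr = [i]  # matching i pattern characters against nothing costs i
        for j, wc in enumerate(window, start=1):
            curr.append(min(prev[j - 1] + (pc != wc),  # match or substitution
                            prev[j] + 1,               # pattern character left unmatched
                            curr[j - 1] + 1))          # extra text character
        prev = curr
    return min(prev)  # a match may end anywhere in the window


def candidate_windows(text, pattern, k, q, index):
    """Filter step: text windows that contain an exact copy of one of the
    first k + 1 disjoint q-grams of the pattern, widened by k on each side."""
    m = len(pattern)
    if m < (k + 1) * q:
        raise ValueError("pattern too short for k + 1 disjoint q-grams")
    windows = set()
    for p in range(0, (k + 1) * q, q):
        for pos in index.get(pattern[p:p + q], ()):
            start = max(0, pos - p - k)
            end = min(len(text), pos - p + m + k)
            windows.add((start, end))
    return sorted(windows)


def search(text, pattern, k, q):
    """Return the candidate windows that really contain the pattern with <= k errors."""
    index = build_qgram_index(text, q)
    return [(start, end)
            for start, end in candidate_windows(text, pattern, k, q, index)
            if min_errors_in_window(pattern, text[start:end]) <= k]


if __name__ == "__main__":
    text = "GATTACAGATTTCAGGCATG"
    print(search(text, "GATTTCAG", k=1, q=4))  # [(0, 9), (6, 16)]
```

Only the windows produced by the filter step reach the comparatively expensive verification step, which is why the abstract reports the number of verified candidates and the precision alongside response time.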

Bibliographic record

Source
Applied Intelligence | 2010, Issue 1 | pp. 21-38 | 18 pages
Authors
Ye-In Chang; Jiun-Rung Chen; Min-Tze Hsu
Original format
PDF
Language
English
Keywords
Approximate string matching; Filter; Genomic database; Global order; Local order
