Hamming distance based approximate similarity text search algorithm


Abstract

We propose a Hamming distance based approximate similarity text search (HASTS) algorithm to improve the quality of queries over massive text data. The HASTS algorithm first constructs an index table from substrings extracted randomly from the feature fingerprints generated by the SimHash algorithm. It then assigns weights to text terms to reduce the size of the candidate set. The final query result is obtained by comparing the Hamming distance between the query term and the text terms in the candidate set. Finally, extensive simulations are conducted to analyze the influence of different parameters on the query performance of the HASTS algorithm and to compare its performance with the existing search algorithm. The results show that the HASTS algorithm can satisfy the query requirements in massive text data with relatively low overhead.
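The pipeline described in the abstract (SimHash fingerprints, an index keyed on fingerprint substrings, a candidate set, and a final Hamming distance check) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `simhash`, `hamming`, and `SubstringIndex`, the 64-bit fingerprint width, the MD5 token hash, the use of randomly sampled bit positions as the "substrings", and the default parameters are all assumptions made for illustration. The paper's term-weighting step for shrinking the candidate set is not reproduced; term frequency is used only as the usual SimHash vote weight.

```python
import hashlib
import random
from collections import Counter, defaultdict

BITS = 64  # assumed fingerprint width; the abstract does not state one


def simhash(text, bits=BITS):
    """SimHash fingerprint: every token votes on each bit, weighted by frequency."""
    votes = [0] * bits
    for token, weight in Counter(text.lower().split()).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit token hash
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


class SubstringIndex:
    """Hypothetical index: each table is keyed on a random subset of fingerprint bits."""

    def __init__(self, num_tables=8, sub_len=12, seed=0):
        rng = random.Random(seed)
        self.positions = [sorted(rng.sample(range(BITS), sub_len))
                          for _ in range(num_tables)]
        self.tables = [defaultdict(set) for _ in range(num_tables)]
        self.fingerprints = {}

    @staticmethod
    def _key(fp, positions):
        # The "substring" of the fingerprint: its bits at the sampled positions.
        return tuple((fp >> p) & 1 for p in positions)

    def add(self, doc_id, text):
        fp = simhash(text)
        self.fingerprints[doc_id] = fp
        for positions, table in zip(self.positions, self.tables):
            table[self._key(fp, positions)].add(doc_id)

    def query(self, text, max_distance=3):
        fp = simhash(text)
        # Candidate set: documents sharing at least one substring with the query.
        candidates = set()
        for positions, table in zip(self.positions, self.tables):
            candidates |= table.get(self._key(fp, positions), set())
        # Final filter: exact Hamming distance on the full fingerprints.
        hits = [(hamming(fp, self.fingerprints[d]), d) for d in candidates]
        return sorted(hit for hit in hits if hit[0] <= max_distance)


index = SubstringIndex()
index.add("a", "hamming distance based approximate similarity text search")
index.add("b", "fast index based search of rna sequence structure patterns")
results = index.query("hamming distance based approximate similarity text search")
```

In this sketch, more tables or shorter substrings raise recall at the cost of a larger candidate set, which is the kind of parameter trade-off the abstract says the simulations examine.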


Bibliographic details

Source: International Conference on Advanced Computational Intelligence, 2015, pp. 1-6 (6 pages)
Authors: Haifeng Hu; Liang Zhang; Jianshen Wu
Original format: PDF

