Hamming distance based approximate similarity text search algorithm



Abstract

We propose a Hamming distance based approximate similarity text search (HASTS) algorithm to improve the quality of queries over massive text data. The HASTS algorithm first constructs an index table from substrings extracted randomly from the feature fingerprints generated by the SimHash algorithm. It then assigns weights to text terms to reduce the size of the candidate set. The final query result is obtained by comparing the Hamming distance between the query term and the text terms in the candidate set. Finally, extensive simulations are conducted to analyze the influence of different parameters on the query performance of the HASTS algorithm and to compare its performance with that of the existing search algorithm. The results show that the HASTS algorithm can satisfy the query requirements of massive text data with relatively low overhead.
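The pipeline described in the abstract (SimHash fingerprints, an index table keyed by randomly extracted fingerprint substrings, and a final Hamming distance check) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no parameters, so the 64-bit fingerprint width, the MD5-based token hash, the whitespace tokenizer, and names such as `SubstringIndex`, `num_tables`, `bits_per_key`, and `max_distance` are all assumptions made for illustration. The term-weighting step is omitted because the abstract does not say how weights are assigned.

```python
import hashlib
import random
from collections import defaultdict

FINGERPRINT_BITS = 64  # assumed width; the abstract does not state one


def simhash(text: str) -> int:
    """Unweighted SimHash over whitespace tokens (illustrative only)."""
    counts = [0] * FINGERPRINT_BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(FINGERPRINT_BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


class SubstringIndex:
    """Hypothetical index: each table is keyed by a random subset of bits."""

    def __init__(self, num_tables: int = 8, bits_per_key: int = 12, seed: int = 0):
        rng = random.Random(seed)
        # One random set of bit positions per table, fixed at construction.
        self.position_sets = [
            sorted(rng.sample(range(FINGERPRINT_BITS), bits_per_key))
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.fingerprints = []
        self.texts = []

    @staticmethod
    def _key(fingerprint: int, positions) -> int:
        key = 0
        for p in positions:
            key = (key << 1) | ((fingerprint >> p) & 1)
        return key

    def add(self, text: str) -> int:
        doc_id = len(self.texts)
        fp = simhash(text)
        self.texts.append(text)
        self.fingerprints.append(fp)
        for table, positions in zip(self.tables, self.position_sets):
            table[self._key(fp, positions)].append(doc_id)
        return doc_id

    def query(self, text: str, max_distance: int = 3):
        """Return sorted (distance, doc_id) pairs within max_distance."""
        fp = simhash(text)
        candidates = set()
        for table, positions in zip(self.tables, self.position_sets):
            candidates.update(table.get(self._key(fp, positions), ()))
        scored = ((hamming_distance(fp, self.fingerprints[i]), i) for i in candidates)
        return sorted((d, i) for d, i in scored if d <= max_distance)
```

In this sketch, a document becomes a candidate only if it shares at least one sampled-bit key with the query, so the full Hamming distance is computed for a small subset of the collection rather than for every stored fingerprint. Using more tables raises recall at the cost of memory, and longer keys shrink the candidate set.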


Bibliographic details

Source: International Conference on Advanced Computational Intelligence, 2015, 6 pages
Authors: Haifeng Hu; Liang Zhang; Jianshen Wu
Original format: PDF
Chinese Library Classification: other computer topics

Similar literature


1. A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance [J]. Tsvi Kopelowitz, Ely Porat. OASIcs: OpenAccess Series in Informatics. 2018, No. 4
2. Algorithms for all-pairs Hamming distance based similarity [J]. Grabowski Szymon, Kowalski Tomasz M. Software, Practice & Experience. 2021, No. 7
3. New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [J]. Ho ThienLuan, Oh Seung-Rohk, Kim HyunJin. Journal of Supercomputing. 2018, No. 5
4. Hamming distance based approximate similarity text search algorithm [C]. Haifeng Hu, Liang Zhang, Jianshen Wu. International Conference on Advanced Computational Intelligence. 2015
5. Adaptive measures of similarity---Fuzzy Hamming distance---and its applications to pattern recognition problems [D]. Ionescu, Mircea M. 2006
6. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns [O]. Fernando Meyer, Stefan Kurtz, Michael Beckstette. 2013
7. Correction to: New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [O]. ThienLuan Ho, Seung-Rohk Oh, HyunJin Kim. 2018
