Conference: International Conference on Advanced Computational Intelligence

Hamming distance based approximate similarity text search algorithm

Abstract

We propose a Hamming distance based approximate similarity text search (HASTS) algorithm to improve query quality over massive text data. HASTS first constructs an index table from substrings extracted at random from the feature fingerprints generated by the SimHash algorithm. It then assigns weights to text terms to reduce the size of the candidate set. The final query result is obtained by comparing the Hamming distance between the query term and the text terms in the candidate set. Finally, extensive simulations are conducted to analyze how different parameters influence the query performance of HASTS and to compare its performance with that of an existing search algorithm. The results show that HASTS can satisfy the query requirements of massive text data with relatively low overhead.
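The pipeline described in the abstract (SimHash fingerprints, an index keyed by fingerprint substrings, then exact Hamming-distance verification of the candidates) can be illustrated with the minimal sketch below. It is not the authors' implementation, and several details are assumptions not stated in the abstract: a 64-bit fingerprint, whitespace tokenization, term frequency as the SimHash term weight, and a BLAKE2b-based term hash. The "random substrings" are modeled as a seeded random partition of the 64 bit positions into `max_distance + 1` disjoint groups, so that by the pigeonhole principle any fingerprint within `max_distance` bits of the query matches it exactly on at least one group. The paper's own weighting step for pruning the candidate set is not reproduced. The names `simhash`, `hamming`, and `SubstringIndex` are hypothetical.

```python
import hashlib
import random
from collections import Counter, defaultdict

BITS = 64  # fingerprint width (assumption; not stated in the abstract)


def term_hash(term: str) -> int:
    """Stable 64-bit hash of a term (BLAKE2b chosen for illustration)."""
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(text: str) -> int:
    """SimHash fingerprint; each term is weighted by its frequency (assumption)."""
    weights = Counter(text.lower().split())
    acc = [0] * BITS
    for term, w in weights.items():
        h = term_hash(term)
        for i in range(BITS):
            # Add the weight where the term's hash bit is 1, subtract where it is 0.
            acc[i] += w if (h >> i) & 1 else -w
    fp = 0
    for i in range(BITS):
        if acc[i] > 0:
            fp |= 1 << i
    return fp


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class SubstringIndex:
    """Hypothetical index keyed by substrings taken from a random bit partition."""

    def __init__(self, max_distance: int, seed: int = 0):
        positions = list(range(BITS))
        random.Random(seed).shuffle(positions)
        n_groups = max_distance + 1
        # Disjoint groups covering all bits: if two fingerprints differ in at most
        # max_distance bits, at least one group is left untouched (pigeonhole).
        self.groups = [positions[g::n_groups] for g in range(n_groups)]
        self.k = max_distance
        self.tables = [defaultdict(list) for _ in self.groups]
        self.docs = {}

    def _key(self, fp: int, group: list) -> tuple:
        """The 'substring' of fp formed by the bits at the group's positions."""
        return tuple((fp >> p) & 1 for p in group)

    def add(self, doc_id, text: str) -> None:
        fp = simhash(text)
        self.docs[doc_id] = fp
        for table, group in zip(self.tables, self.groups):
            table[self._key(fp, group)].append(doc_id)

    def query(self, text: str) -> list:
        fp = simhash(text)
        candidates = set()
        for table, group in zip(self.tables, self.groups):
            candidates.update(table.get(self._key(fp, group), []))
        # Final step: keep only candidates within the Hamming-distance threshold.
        return sorted(d for d in candidates if hamming(self.docs[d], fp) <= self.k)
```

For example, after `idx = SubstringIndex(max_distance=3)` and `idx.add("d1", "the quick brown fox jumps")`, calling `idx.query("the quick brown fox jumps")` returns a list containing `"d1"`. Using disjoint groups trades some index size for a guarantee of no false negatives within the threshold; purely random, overlapping substrings would be cheaper but could miss near-duplicates.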