Conference: International Conference on Advanced Computational Intelligence

Hamming distance based approximate similarity text search algorithm


Abstract

We propose a Hamming distance based approximate similarity text search (HASTS) algorithm to improve the quality of queries over massive text data. The HASTS algorithm first constructs an index table from substrings extracted at random from the feature fingerprints generated by the SimHash algorithm. It then assigns weights to text terms to reduce the size of the candidate set. The final query result is obtained by comparing the Hamming distance between the query term and the text terms in the candidate set. Extensive simulations are conducted to analyze the influence of different parameters on the query performance of the HASTS algorithm and to compare its performance with that of an existing search algorithm. The results show that the HASTS algorithm can satisfy the query requirements of massive text data with relatively low overhead.
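
The pipeline described in the abstract (SimHash fingerprints, an index keyed by randomly extracted fingerprint substrings, a weighting step that shrinks the candidate set, and a final Hamming distance check) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no parameters or details, so the 64-bit fingerprint, the MD5 token hash, the number and length of substrings, the vote-count weighting, and every function name below are assumptions made purely for illustration.

```python
import hashlib
import random
from collections import defaultdict

BITS = 64  # assumed fingerprint length; the abstract does not state one


def simhash(text, bits=BITS):
    """SimHash fingerprint over whitespace tokens, each with unit weight."""
    counts = [0] * bits
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def make_masks(num_tables=8, sub_len=12, bits=BITS, seed=0):
    """Randomly chosen bit positions, one set per index table (assumed scheme)."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(bits), sub_len)))
            for _ in range(num_tables)]


def substring(fp, positions):
    """Extract the fingerprint bits at the given positions as a hashable key."""
    return tuple((fp >> p) & 1 for p in positions)


def build_index(docs, masks):
    """Fingerprint every document and file it under each substring key."""
    fps = [simhash(d) for d in docs]
    tables = [defaultdict(list) for _ in masks]
    for doc_id, fp in enumerate(fps):
        for table, mask in zip(tables, masks):
            table[substring(fp, mask)].append(doc_id)
    return fps, tables


def search(query, fps, tables, masks, max_dist=3, min_votes=1):
    """Return ids of documents within max_dist Hamming distance of the query."""
    qfp = simhash(query)
    # Hypothetical weighting: count how many tables each document matches in.
    votes = defaultdict(int)
    for table, mask in zip(tables, masks):
        for doc_id in table.get(substring(qfp, mask), ()):
            votes[doc_id] += 1
    candidates = [d for d, v in votes.items() if v >= min_votes]
    # Final filter: exact Hamming distance on the full fingerprints.
    return sorted(d for d in candidates if hamming(qfp, fps[d]) <= max_dist)
```

In this sketch, raising `min_votes` plays the role of the weighting step: it trades recall for a smaller candidate set before the exact Hamming comparison. Calling `search(docs[0], *build_index(docs, masks), masks)` returns at least the id of the document that was used as the query, since identical text yields an identical fingerprint.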
