Conference: International Computer Science and Engineering Conference

An index structure for similarity join based on high-frequency queries


Abstract

String databases are widely used in many applications today, and searching for texts that are similar to a query text is a common requirement. Similarity join finds pairs of texts whose similarity exceeds a given threshold. Much research has been done to reduce the running time of similarity join. The filter-and-verify framework is one approach: it first filters out dissimilar pairs of texts and then verifies the remaining pairs. Prefix filtering is a filter-and-verify method that eliminates dissimilar pairs by comparing only prefixes of the texts. However, existing similarity join algorithms disregard the frequencies of queries. Data collected from Google Trends explorer show that some queries appear with higher frequency than others. This paper aims to reduce the running time of similarity join by focusing on these high-frequency queries. Indices are built on the high-frequency queries to speed up those queries and any queries similar to them. The proposed indices and similarity join algorithm are implemented and their performance is evaluated. Experiments show that the proposed method outperforms a leading similarity join algorithm, AdaptSearch, when queries are similar to a high-frequency query.
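The filter-and-verify idea behind prefix filtering can be illustrated with a minimal sketch. This is not the paper's index or the AdaptSearch algorithm. It assumes whitespace tokenization, Jaccard similarity, and a threshold test of "greater than or equal to"; the function names `jaccard` and `prefix_filter_join` are hypothetical. Tokens are ordered from rare to frequent, only a short prefix of each record is indexed, and pairs that share no prefix token are never verified. Records with no tokens are skipped.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prefix_filter_join(texts, threshold):
    """Return (i, j, similarity) triples with Jaccard >= threshold.

    Assumes 0 < threshold <= 1 and whitespace-tokenized, lowercased texts.
    """
    token_sets = [set(t.lower().split()) for t in texts]

    # Global token order: rare tokens first, ties broken alphabetically.
    freq = Counter(tok for s in token_sets for tok in s)
    ordered = [sorted(s, key=lambda tok: (freq[tok], tok)) for s in token_sets]

    # Filter step: index only the prefix of each record. Two records whose
    # Jaccard similarity reaches the threshold must share a prefix token.
    index = defaultdict(list)  # token -> ids of earlier records with it in prefix
    candidates = set()
    for rid, tokens in enumerate(ordered):
        if not tokens:
            continue
        # Small epsilon guards against floating-point overshoot in ceil().
        min_overlap = math.ceil(threshold * len(tokens) - 1e-9)
        prefix_len = len(tokens) - min_overlap + 1
        for tok in tokens[:prefix_len]:
            for other in index[tok]:
                candidates.add((other, rid))
            index[tok].append(rid)

    # Verify step: compute the exact similarity for surviving pairs only.
    results = []
    for i, j in sorted(candidates):
        sim = jaccard(token_sets[i], token_sets[j])
        if sim >= threshold:
            results.append((i, j, sim))
    return results


if __name__ == "__main__":
    sample = [
        "the quick brown fox",
        "the quick brown dog",
        "a completely different sentence",
    ]
    for i, j, sim in prefix_filter_join(sample, 0.5):
        print(i, j, round(sim, 2))
```

The prefix length `len(tokens) - min_overlap + 1` is the standard bound for Jaccard thresholds. A higher threshold gives shorter prefixes and therefore fewer candidate pairs to verify.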
