Conference: International Symposium on Computer and Information Sciences

Effective Early Termination Techniques for Text Similarity Join Operator

Abstract

The text similarity join operator joins two relations if their join attributes are textually similar to each other. It has a variety of application domains, including integration and querying of data from heterogeneous resources, data cleansing, and data mining. Although the text similarity join operator is widely used, its processing is expensive because of the huge number of similarity computations it performs. In this paper, we incorporate several shortcut evaluation techniques from the Information Retrieval domain, namely the Harman, quit, continue, and maximal similarity filter heuristics, into the previously proposed text similarity join algorithms to reduce the number of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic-based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with the continue and maximal similarity filter heuristics.
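The abstract does not spell out the join algorithms or the heuristics, so the following Python sketch is only one plausible reading under stated assumptions. It assumes an accumulator-based join: each document of relation R is used as a query against an inverted index built over relation S, with cosine similarity of tf-idf vectors as the similarity measure. The `quit` and `continue` modes follow the usual descriptions of these accumulator-limiting heuristics in the IR literature. Query terms are processed in decreasing order of weight, and once `max_accumulators` candidates exist, `quit` stops processing further terms while `continue` keeps updating existing candidates but creates no new ones. The function names, the tf-idf weighting, and the `max_accumulators` parameter are hypothetical illustrations, not the paper's implementation. The Harman and maximal similarity filter heuristics are not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch of an accumulator-based text similarity join with
# IR-style "quit" / "continue" early termination; not the paper's code.


def tfidf_vectors(docs):
    """Return L2-normalised tf-idf vectors (dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (1 + math.log(c)) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def build_index(vectors):
    """Inverted index: term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(vectors):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def probe(query, index, mode="full", max_accumulators=100):
    """Accumulate cosine scores of one query vector against the index.

    mode="full":     process every term (baseline).
    mode="quit":     stop processing terms once max_accumulators exist.
    mode="continue": once max_accumulators exist, keep updating existing
                     accumulators but create no new ones.
    """
    if mode not in ("full", "quit", "continue"):
        raise ValueError(f"unknown mode: {mode}")
    acc = {}
    # Highest-weight (most discriminating) query terms first, so the
    # cut-off mainly affects the low-weight terms processed last.
    for term, q_weight in sorted(query.items(), key=lambda kv: -kv[1]):
        if mode == "quit" and len(acc) >= max_accumulators:
            break
        for doc_id, d_weight in index.get(term, ()):
            if doc_id in acc:
                acc[doc_id] += q_weight * d_weight
            elif mode != "continue" or len(acc) < max_accumulators:
                acc[doc_id] = q_weight * d_weight
    return acc


def similarity_join(r_docs, s_docs, k=1, mode="full", max_accumulators=100):
    """For each document in R, return its k most similar documents in S."""
    vectors = tfidf_vectors(r_docs + s_docs)  # shared idf statistics over R and S
    r_vecs, s_vecs = vectors[:len(r_docs)], vectors[len(r_docs):]
    index = build_index(s_vecs)
    result = {}
    for r_id, query in enumerate(r_vecs):
        acc = probe(query, index, mode, max_accumulators)
        result[r_id] = sorted(acc.items(), key=lambda kv: -kv[1])[:k]
    return result


if __name__ == "__main__":
    r = [["similarity", "join", "operator"], ["inverted", "index", "search"]]
    s = [["text", "similarity", "join"], ["inverted", "file", "index"], ["data", "cleansing"]]
    for mode in ("full", "quit", "continue"):
        print(mode, similarity_join(r, s, k=1, mode=mode, max_accumulators=1))
```

Both early-termination modes trade some accuracy for fewer similarity computations. With a small `max_accumulators`, a document in S that matches only the low-weight terms of a query never receives a score.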