Journal: Knowledge and Data Engineering, IEEE Transactions on

A Prefix-Filter based Method for Spatio-Textual Similarity Join


Abstract

Location-based services have attracted significant attention now that modern mobile phones are equipped with GPS devices. These services generate large amounts of spatio-textual data, which contain both spatial locations and textual descriptions. Because the same spatio-textual object may have different representations, for example due to GPS deviations or differing user descriptions, efficient methods are needed to integrate spatio-textual data from different sources. In this paper we study a new research problem called spatio-textual similarity join: given two sets of spatio-textual objects, find the similar object pairs. We make the following contributions: (1) We develop a filter-and-refine framework and devise several efficient algorithms. We extend the prefix filter technique to generate spatial and textual signatures for the objects and build an inverted index on top of these signatures. We then generate candidate pairs using the inverted lists of signatures, and finally refine the candidates to produce the final result. (2) We study how to generate high-quality signatures for spatial information, and develop an MBR-prefix based signature that prunes large numbers of dissimilar object pairs. (3) We propose a hybrid signature scheme that supports textual pruning and spatial pruning simultaneously. (4) Experimental results on real and synthetic datasets show that our algorithms achieve high performance and scale well.
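The filter-and-refine idea in contribution (1) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not state the similarity functions, so the sketch assumes Jaccard similarity on token sets and an intersection-over-union ratio on MBRs, and it applies the prefix filter to the textual side only, checking the spatial condition during refinement. The names `st_join`, `prefix`, `tau_text`, and `tau_space` are hypothetical.

```python
import math
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity of two token collections (assumed textual measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def mbr_overlap(r, s):
    """Intersection area over union area for rectangles (x1, y1, x2, y2).
    This spatial measure is an assumption, not taken from the abstract."""
    w = min(r[2], s[2]) - max(r[0], s[0])
    h = min(r[3], s[3]) - max(r[1], s[1])
    inter = max(w, 0.0) * max(h, 0.0)
    area_r = (r[2] - r[0]) * (r[3] - r[1])
    area_s = (s[2] - s[0]) * (s[3] - s[1])
    union = area_r + area_s - inter
    return inter / union if union > 0 else 0.0

def prefix(tokens, order, tau):
    """Textual signature: the first n - ceil(tau * n) + 1 tokens under a
    global order. Two sets with Jaccard >= tau must share a prefix token."""
    toks = sorted(set(tokens), key=lambda t: order[t])
    n = len(toks)
    # The small epsilon guards against float error such as 0.7 * 10 > 7.
    return toks[: n - math.ceil(tau * n - 1e-9) + 1]

def st_join(R, S, tau_text, tau_space):
    """Join two lists of (id, mbr, tokens) objects."""
    # Global token order: rare tokens first, so prefixes are selective.
    freq = defaultdict(int)
    for _, _, toks in R + S:
        for t in set(toks):
            freq[t] += 1
    ranked = sorted(freq, key=lambda t: (freq[t], t))
    order = {t: i for i, t in enumerate(ranked)}

    # Filter step 1: inverted index over the prefix tokens of S.
    index = defaultdict(list)
    for i, (_, _, toks) in enumerate(S):
        for t in prefix(toks, order, tau_text):
            index[t].append(i)

    results = []
    for rid, rmbr, rtoks in R:
        # Filter step 2: candidates share at least one prefix token.
        cands = set()
        for t in prefix(rtoks, order, tau_text):
            cands.update(index.get(t, ()))
        # Refine step: verify both similarity conditions exactly.
        for i in sorted(cands):
            sid, smbr, stoks = S[i]
            if (jaccard(rtoks, stoks) >= tau_text
                    and mbr_overlap(rmbr, smbr) >= tau_space):
                results.append((rid, sid))
    return results

# Toy data, invented for illustration only.
R = [("r1", (0, 0, 2, 2), ["coffee", "shop", "wifi"]),
     ("r2", (10, 10, 12, 12), ["pizza", "bar"])]
S = [("s1", (0, 0, 2, 3), ["coffee", "shop", "free", "wifi"]),
     ("s2", (50, 50, 52, 52), ["coffee", "shop", "wifi"]),
     ("s3", (10, 10, 12, 12), ["sushi", "bar"])]

pairs = st_join(R, S, tau_text=0.7, tau_space=0.5)
print(pairs)
```

In this toy run, `r1` and `s2` pass the textual filter but fail the spatial check during refinement. That is the kind of wasted verification the MBR-prefix and hybrid signatures described in the abstract aim to avoid by pruning on spatial information before candidates are formed.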
