Home > Foreign-language journals > Theoretical and Experimental Plant Physiology > Metric Similarity Joins Using MapReduce
Metric Similarity Joins Using MapReduce

Abstract

Given two object sets Q and O, a metric similarity join finds similar object pairs according to a certain criterion. This operation has a wide variety of applications, including data cleaning and data mining. However, today's rapidly growing data volumes challenge traditional metric similarity join methods, so a distributed method is required. In this paper, we adopt a popular distributed framework, MapReduce, to support scalable metric similarity joins. To ensure load balancing, we present two sampling-based partition methods. One uses pivot and space-filling curve mappings to cluster the data in one-dimensional space, and then selects high-quality centroids to enable equal-sized partitions. The other uses the KD-tree partitioning technique to divide the data equally after the pivot mapping. To avoid unnecessary object pair evaluation, we propose a framework that maps the two involved object sets in order, where range-object filtering, double-pivot filtering, pivot filtering, and plane sweeping techniques are used for pruning. Extensive experiments with both real and synthetic data sets demonstrate that our solutions significantly outperform existing state-of-the-art competitors.
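
To make the pivot mapping and pivot filtering ideas concrete, the following single-machine sketch maps each object to its distances from a set of pivots and discards candidate pairs that the triangle inequality rules out. It is an illustration under stated assumptions, not the paper's MapReduce algorithm: the distance-threshold criterion `eps`, the Euclidean metric, and the names `pivot_map`, `can_prune`, and `similarity_join` are all hypothetical, and the partitioning, range-object filtering, double-pivot filtering, and plane sweeping steps are omitted.

```python
import math
from itertools import product


def dist(a, b):
    """Euclidean distance (assumed here; any metric obeying the triangle inequality works)."""
    return math.dist(a, b)


def pivot_map(obj, pivots):
    """Map an object to the vector of its distances to each pivot."""
    return tuple(dist(obj, p) for p in pivots)


def can_prune(mapped_q, mapped_o, eps):
    """Pivot filtering: for every pivot p, |d(q,p) - d(o,p)| <= d(q,o).

    If the difference exceeds eps for any pivot, then d(q,o) > eps,
    so the pair cannot be a join result and needs no distance computation.
    """
    return any(abs(dq - do) > eps for dq, do in zip(mapped_q, mapped_o))


def similarity_join(Q, O, pivots, eps):
    """Return all pairs (q, o) with d(q, o) <= eps, plus the number of pairs verified."""
    mapped_Q = [(q, pivot_map(q, pivots)) for q in Q]
    mapped_O = [(o, pivot_map(o, pivots)) for o in O]
    result, verified = [], 0
    for (q, mq), (o, mo) in product(mapped_Q, mapped_O):
        if can_prune(mq, mo, eps):
            continue  # ruled out using only the precomputed pivot distances
        verified += 1
        if dist(q, o) <= eps:
            result.append((q, o))
    return result, verified
```

Calling `similarity_join(Q, O, pivots, eps)` returns the same pairs as a brute-force nested loop, since the filter only removes pairs whose distance provably exceeds `eps`; the second return value shows how many exact distance computations survived the filter.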