IEEE Transactions on Parallel and Distributed Systems

Heads-Join: Efficient Earth Mover's Distance Similarity Joins on Hadoop

Abstract

The Earth Mover's Distance (EMD) similarity join has a number of important applications, such as near-duplicate image retrieval and distribution-based pattern analysis. However, the computational cost of EMD is super-cubic, so the EMD similarity join is prohibitively expensive even for medium-sized datasets. We propose to employ the Hadoop platform to speed up the operation. Simply porting state-of-the-art metric-distance similarity join algorithms to Hadoop is inefficient, because they involve excessive distance computations and are vulnerable to skewed data distributions. We propose a novel framework, named Heads-Join, which transforms data into the space of EMD lower bounds and performs pruning and partitioning at low cost, because computing these EMD lower bounds has constant or linear complexity. We investigate both range and top-k joins, and design efficient algorithms on three popular Hadoop computation paradigms: MapReduce, Bulk Synchronous Parallel, and Spark. We conduct extensive experiments on both real and synthetic datasets. The results show that Heads-Join outperforms the state-of-the-art metric similarity join technique, Quickjoin, by up to an order of magnitude and scales out well.
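
The filter-and-refine idea described above (map objects into a cheap lower-bound space, prune and partition there, compute the expensive EMD only for survivors) can be illustrated with a small single-machine sketch. It is not the Heads-Join algorithm and does not run on Hadoop. Several simplifying assumptions apply: histograms are 1-D and normalized, the ground distance between bins i and j is |i - j|, and the lower bound is the centroid-distance bound |centroid(x) - centroid(y)| <= EMD(x, y). The abstract does not say whether Heads-Join uses this particular bound. In 1-D the exact EMD is just the L1 distance between cumulative distributions, so it stands in here for the super-cubic general EMD. The function names (`normalize`, `emd_1d`, `centroid`, `range_join`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def normalize(hist):
    """Scale a non-negative histogram so its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def emd_1d(x, y):
    """Exact EMD between two normalized 1-D histograms, ground distance |i - j|.

    Assumption for this sketch: in 1-D the EMD equals the L1 distance between
    the two cumulative distributions, so it costs O(n) instead of super-cubic.
    """
    return sum(abs(cx - cy) for cx, cy in zip(accumulate(x), accumulate(y)))


def centroid(x):
    """Mean bin position; |centroid(x) - centroid(y)| lower-bounds emd_1d(x, y)."""
    return sum(i * w for i, w in enumerate(x))


def range_join(R, S, eps):
    """Return (i, j, d) for every pair with emd_1d(R[i], S[j]) <= eps.

    Filter: map S into the 1-D lower-bound space (centroids) and sort it, so
    for each r only the s whose centroid lies within eps of r's centroid are
    visited; every other pair has a lower bound > eps and is pruned.
    Refine: compute the exact EMD only for the surviving candidates.
    Also returns how many exact EMD computations were performed.
    """
    s_keys = sorted((centroid(s), j) for j, s in enumerate(S))
    keys = [k for k, _ in s_keys]
    results = []
    exact_calls = 0
    for i, r in enumerate(R):
        c = centroid(r)
        lo = bisect_left(keys, c - eps)
        hi = bisect_right(keys, c + eps)
        for _, j in s_keys[lo:hi]:
            exact_calls += 1
            d = emd_1d(r, S[j])
            if d <= eps:
                results.append((i, j, d))
    return results, exact_calls


if __name__ == "__main__":
    R = [normalize([1, 0, 0, 0]), normalize([0, 1, 1, 0])]
    S = [normalize([0, 1, 0, 0]), normalize([0, 0, 0, 1]), normalize([1, 1, 0, 0])]
    pairs, calls = range_join(R, S, eps=0.75)
    print(pairs)  # [(0, 2, 0.5), (1, 0, 0.5)]
    print(calls)  # 2 exact EMD computations instead of 6
```

In this toy run the lower-bound filter cuts the exact computations from 6 (all pairs) to 2 while returning the same result as a nested-loop join. Because the lower-bound space is ordered, its key ranges could also serve as partition boundaries when distributing the work, which is the kind of cheap pruning and partitioning the abstract refers to; how Heads-Join actually partitions across MapReduce, BSP, or Spark is not shown here.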