首页> 外文期刊>Distributed and Parallel Databases >Algorithms for processing the group K nearest-neighbor query on distributed frameworks
【24h】

Algorithms for processing the group K nearest-neighbor query on distributed frameworks

机译:用于处理分布式框架上的K最近邻查询的算法

获取原文
获取外文期刊封面目录资料

摘要

Given two datasets of points (called Query and Training), the Group (K) Nearest-Neighbor (GKNN) query retrieves (K) points of the Training with the smallest sum of distances to every point of the Query. This spatial query has been studied during the recent years and several performance improving techniques and pruning heuristics have been proposed. In previous work, we presented the first MapReduce algorithm, consisting of alternating local and parallel phases, which can be used to effectively process the GKNN query when the Query fits in memory, while the Training one belongs to the Big Data category. In this paper, we present a significantly improved algorithm that incorporates a new high-performance refining method, a fast way to calculate distance sums for pruning purposes and several other minor coding and algorithmic improvements. Moreover, we transform this algorithm (which has been implemented in the Hadoop framework) to SpatialHadoop (a popular distributed framework that is dedicated to spatial processing), using a novel two-level partitioning method. Using real world and synthetic datasets, we also present a thorough experimental study of the Hadoop and SpatialHadoop versions of the algorithm, including a backstage analysis of the algorithm's performance, using metrics that highlight its internal functioning. Finally, we present an experimental comparison of the Hadoop, the SpatialHadoop versions and the version of our previous work, showing that the improved versions are the big winners, with the SpatialHadoop one being faster than its Hadoop counterpart.
机译:给定两个点数据集(称为查询和训练),群组(k)最近邻(Gknn)查询用最小的距离和查询的每个点的最小距离检索(k)点。该空间查询已经在近年来研究过,并提出了几种性能提高技术和修剪启发式。在以前的工作中,我们介绍了第一个MapReduce算法,包括交替的本地和并行阶段,该阶段可以用于当查询适合内存时有效地处理GKNN查询,而训练属于大数据类别。在本文中,我们提出了一种显着改进的算法,该算法包含一种新的高性能精炼方法,一种快速方式来计算修剪目的的距离和其他次要编码和算法改进。此外,我们使用新颖的双级分区方法将该算法(已在Hadoop框架中实现的流行分布式框架)转换为SpatialHadoop(一种专用于空间处理的流行分布式框架)。使用现实世界和合成数据集,我们还展示了算法的Hadoop和Spatialhadoop版本的彻底实验研究,包括使用突出显示其内部功能的度量的算法性能的后台分析。最后,我们展示了Hadoop,Spatialhadoop版本和我们以前的工作版本的实验比较,表明改进版本是大型赢家,SpatialHadoop一个比Hadoop对手更快。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号