Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework

Moutafis Panagiotis; Mavrommatis George; Vassilakopoulos Michael; Sioutas Spyros

Abstract

Numerous modern applications, from social networking to astronomy, need to answer queries on spatial data efficiently. One such query is the All k Nearest-Neighbor Query, or k Nearest-Neighbor Join, which takes two datasets as input and, for each object of the first one, returns its k nearest neighbors from the second one. It combines the k nearest-neighbor query with the join query and is computationally demanding. In particular, when the datasets involved fall into the category of Big Data, a single machine cannot process the query efficiently. Papers proposing solutions for distributed computing environments have appeared in the literature only in the last few years. In this paper, we focus on parallel and distributed algorithms using the Apache Hadoop framework. More specifically, we focus on an algorithm that was recently presented in the literature and propose improvements to tackle three major challenges that distributed processing faces: improvement of load balancing (we implement an adaptive partitioning scheme based on Quadtrees), acceleration of local processing (we prune points during calculations by utilizing plane-sweep processing), and reduction of network traffic (we restructure and reduce the output size of the most demanding phase of computation). Moreover, using real 2D and 3D datasets, we experimentally study the effect of each improvement, and of their combinations, on the performance of this algorithm from the literature. Experiments show that by carefully addressing the three aforementioned issues, one can achieve significantly better performance. We thus arrive at a new scalable algorithm that adapts to the data distribution and significantly outperforms its predecessor. Finally, we present an experimental comparison of our algorithm against other well-known MapReduce algorithms for the same query and show that it significantly outperforms them as well.
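To make the query definition and the plane-sweep pruning idea concrete, the following is a minimal single-machine sketch in Python. It is not the authors' MapReduce implementation: the function name `all_knn`, the choice of the first coordinate as the sweep axis, and the sample points are assumptions made for illustration only. For each query point, reference points are visited in order of increasing distance along the sweep axis, and the scan stops once that axis distance alone is at least the current k-th best distance.

```python
import heapq
from bisect import bisect_left
from math import dist


def all_knn(queries, references, k):
    """For each query point, return its k nearest reference points,
    ordered by increasing Euclidean distance (illustrative sketch)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    refs = sorted(references)            # sort by the first coordinate (sweep axis)
    xs = [p[0] for p in refs]
    result = {}
    for q in queries:
        best = []                        # max-heap via negated distances: (-d, point)
        start = bisect_left(xs, q[0])
        left, right = start - 1, start   # next unvisited point on each side of q
        while left >= 0 or right < len(refs):
            gap_left = q[0] - xs[left] if left >= 0 else float("inf")
            gap_right = xs[right] - q[0] if right < len(refs) else float("inf")
            # Pruning: every remaining point is at least min(gap) away, so it
            # cannot beat the current k-th best distance.
            if len(best) == k and min(gap_left, gap_right) >= -best[0][0]:
                break
            if gap_left <= gap_right:
                p = refs[left]
                left -= 1
            else:
                p = refs[right]
                right += 1
            d = dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
        result[q] = [p for _, p in sorted((-nd, p) for nd, p in best)]
    return result


# Hypothetical sample data (2D points as tuples).
queries = [(0.0, 0.0), (5.0, 5.0)]
references = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (6.0, 6.0), (10.0, 10.0)]
neighbors = all_knn(queries, references, k=2)
```

Because `math.dist` accepts points of any dimension, the same sketch works for 3D tuples. In a distributed setting, a routine of this kind would plausibly run inside each partition, with the partitioning scheme and the merging of partial results handled separately.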


Bibliographic details

Source
Data & Knowledge Engineering, 2019, Issue 5, pp. 42-70 (29 pages)
Authors
Moutafis Panagiotis; Mavrommatis George; Vassilakopoulos Michael; Sioutas Spyros
Affiliations

Univ Thessaly Dept Elect & Comp Engn Data Structuring & Engn Lab Volos Greece;

Univ Thessaly Dept Elect & Comp Engn Data Structuring & Engn Lab Volos Greece;

Univ Thessaly Dept Elect & Comp Engn Data Structuring & Engn Lab Volos Greece;

Univ Patras Dept Comp Engn & Informat Patras Greece;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Spatial query processing; Nearest neighbor query; Plane sweep; Quadtrees; MapReduce; Apache Hadoop


