Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework

Moutafis Panagiotis; Mavrommatis George; Vassilakopoulos Michael; Sioutas Spyros

首页> 外文期刊>Data & Knowledge Engineering >Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework

【24h】

Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework

机译：在MapReduce编程框架中高效处理所有k个最近邻查询

获取原文

获取原文并翻译 | 示例

开具论文收录证明 >>

页面导航

摘要
著录项
引文网络
相似文献
相关主题

摘要

Numerous modern applications, from social networking to astronomy, need efficient answering of queries on spatial data. One such query is the All k Nearest-Neighbor Query, or k Nearest-Neighbor Join, that takes as input two datasets and, for each object of the first one, returns the k nearest-neighbors from the second one. It is a combination of the k nearest-neighbor and join queries and is computationally demanding. Especially, when the datasets involved fall in the category of Big Data, a single machine cannot efficiently process it. Only in the last few years, papers proposing solutions for distributed computing environments have appeared in the literature. In this paper, we focus on parallel and distributed algorithms using the Apache Hadoop framework. More specifically, we focus on an algorithm that was recently presented in the literature and propose improvements to tackle three major challenges that distributed processing faces: improvement of load balancing (we implement an adaptive partitioning scheme based on Quadtrees), acceleration of local processing (we prune points during calculations by utilizing plane-sweep processing), and reduction of network traffic (we restructure and reduce the output size of the most demanding phase of computation). Moreover, by using real 2D and 3D datasets, we experimentally study the effect of each improvement and their combinations on performance of this literature algorithm. Experiments show that by carefully addressing the three aforementioned issues, one can achieve significantly better performance. Thereby, we conclude to a new scalable algorithm that adapts to the data distribution and significantly outperforms its predecessor. Moreover, we present an experimental comparison of our algorithm against other well-known MapReduce algorithms for the same query and show that these algorithms are also significantly outperformed.

机译：从社交网络到天文学，许多现代应用程序都需要对空间数据查询的有效回答。这样的查询就是“全部k个最近邻居查询”或“ k个最近邻居联接”，它以两个数据集作为输入，对于第一个对象的每个对象，返回第二个数据集的k个最近邻居。它是k个最邻近查询和联接查询的组合，并且对计算的要求很高。特别是，当涉及的数据集属于大数据类别时，一台机器无法有效地对其进行处理。仅在最近几年中，有关分布式计算环境解决方案的论文才出现在文献中。在本文中，我们重点介绍使用Apache Hadoop框架的并行和分布式算法。更具体地说，我们专注于最近在文献中提出的算法，并提出改进措施以解决分布式处理面临的三个主要挑战：负载平衡的改进（我们实现了基于四叉树的自适应分区方案），本地处理的加速（我们通过利用平面扫描处理在计算过程中修剪点，并减少网络流量（我们调整结构并减少最苛刻计算阶段的输出大小）。此外，通过使用真实的2D和3D数据集，我们实验研究了每种改进及其组合对该文献算法性能的影响。实验表明，通过仔细解决上述三个问题，可以显着提高性能。因此，我们得出了一种新的可伸缩算法，该算法可适应数据分布并明显优于其前身。此外，对于相同的查询，我们将我们的算法与其他著名的MapReduce算法进行实验比较，并表明这些算法的性能也明显优于其他算法。

著录项

来源
《Data & Knowledge Engineering》 |2019年第5期|42-70|共29页
作者
Moutafis Panagiotis; Mavrommatis George; Vassilakopoulos Michael; Sioutas Spyros;
展开▼
作者单位

Univ Thessaly, Dept Elect & Comp Engn, Data Structuring & Engn Lab, Volos, Greece;

Univ Thessaly, Dept Elect & Comp Engn, Data Structuring & Engn Lab, Volos, Greece;

Univ Thessaly, Dept Elect & Comp Engn, Data Structuring & Engn Lab, Volos, Greece;

Univ Patras, Dept Comp Engn & Informat, Patras, Greece;

展开▼
收录信息美国《科学引文索引》(SCI);美国《工程索引》(EI);
原文格式 PDF
正文语种 eng
中图分类
关键词
Spatial query processing; Nearest neighbor query; Plane sweep; Quadtrees; MapReduce; Apache hadoop;

机译：空间查询处理;最近邻查询;平面扫描;四叉树;MapReduce;Apache hadoop;

相似文献

外文文献
中文文献
专利

1. Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework [J] . Moutafis Panagiotis, Mavrommatis George, Vassilakopoulos Michael, Data & Knowledge Engineering . 2019,第May期

机译：MapReduce编程框架中的All-K-interBible邻查询的高效处理
2. An efficient parallel processing method for skyline queries in MapReduce [J] . Kim Junsu, Kim Myoung Ho Journal of supercomputing . 2018,第2期

机译：MapReduce中用于天际线查询的高效并行处理方法
3. Efficient Processing of Skyline Queries Using MapReduce [J] . Yoonjae Park, Jun-Ki Min, Kyuseok Shim IEEE Transactions on Knowledge and Data Engineering . 2017,第5期

机译：使用MapReduce高效处理天际线查询
4. Efficient Processing of Area Skyline Query in MapReduce Framework [C] . Zakia Zinat Choudhury, Asif Zaman, Md. Ekramul Hamid IEEE International WIE Conference on Electrical and Computer Engineering . 2018

机译：MapReduce框架中区域天际线查询的高效处理
5. Investigating MapReduce framework extensions for efficient processing of geographically scattered datasets. [D] . Gadre, Hrishikesh. 2011

机译：研究MapReduce框架扩展，以有效处理地理上分散的数据集。
6. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends [O] . Emad A Mohammed, Behrouz H Far, Christopher Naugler 2014

机译：MapReduce编程框架在临床大数据分析中的应用：当前形势和未来趋势
7. Grex: An Efficient MapReduce Framework for Graphics Processing Units [O] . Can Basaran, Kyoung-don Kang 2013

机译：Grex：图形处理单元的高效MapReduce框架
8. Interactive Query Processing in Big Data Systems: A Cross Industry Study of MapReduce Workloads. [R] . R. H. Katz S. Alspaugh Y. Chen 2012

机译：大数据系统中的交互式查询处理：mapReduce工作负载的跨行业研究。

Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework

摘要

著录项

引文网络

相似文献

相关主题

期刊订阅