Conference: Advances in Databases and Information Systems

A Comparison of Distributed Spatial Data Management Systems for Processing Distance Join Queries


Abstract

Due to the ubiquitous use of spatial data applications and the large amounts of spatial data that these applications generate, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Two of the most studied distance join queries are the K Closest Pair Query (KCPQ) and the ε Distance Join Query (εDJQ). The KCPQ finds the K closest pairs of points from two datasets, and the εDJQ finds all possible pairs of points from two datasets that are within a distance threshold ε of each other. Distributed cluster-based computing systems can be classified into Hadoop-based and Spark-based systems. Based on this classification, in this paper we compare two of the most current and leading distributed spatial data management systems, namely SpatialHadoop and LocationSpark, by evaluating the performance of existing and newly proposed parallel and distributed distance join query algorithms in different situations with big real-world datasets. As a general conclusion, while SpatialHadoop is a more mature and robust system, LocationSpark is the winner with respect to total execution time.
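
To make the two query definitions concrete, here is a minimal single-machine sketch in Python. It is a brute-force illustration of what KCPQ and εDJQ return, not the parallel algorithms evaluated in the paper and not the SpatialHadoop or LocationSpark APIs. The function names `kcpq` and `edjq`, the use of Euclidean distance, and the inclusive threshold (distance <= ε) are assumptions made for the example.

```python
import heapq
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def kcpq(P, Q, k):
    """Brute-force K Closest Pair Query (hypothetical helper).

    Returns the k pairs (distance, p, q) with the smallest distance,
    in ascending order of distance, drawn from the cross product P x Q.
    """
    candidates = ((dist(p, q), p, q) for p, q in product(P, Q))
    return heapq.nsmallest(k, candidates, key=lambda t: t[0])


def edjq(P, Q, eps):
    """Brute-force epsilon Distance Join Query (hypothetical helper).

    Returns every pair (p, q) from P x Q whose distance is at most eps.
    Treating the threshold as inclusive is an assumption of this sketch.
    """
    return [(p, q) for p, q in product(P, Q) if dist(p, q) <= eps]


# Two tiny example datasets of 2D points (made up for illustration).
P = [(0.0, 0.0), (5.0, 5.0)]
Q = [(1.0, 0.0), (5.0, 6.0), (9.0, 9.0)]

closest_two = kcpq(P, Q, 2)
within_six = edjq(P, Q, 6.0)
```

Both helpers examine all |P| x |Q| pairs, which is the cost that distributed systems try to avoid through partitioning and indexing; the sketch only pins down the expected output of each query.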
