Conference: European Conference on Advances in Databases and Information Systems

A Comparison of Distributed Spatial Data Management Systems for Processing Distance Join Queries


Abstract

Due to the ubiquitous use of spatial data applications and the large amounts of spatial data that these applications generate, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Two of the most studied distance join queries are the K Closest Pair Query (KCPQ) and the ε Distance Join Query (εDJQ). The KCPQ finds the K closest pairs of points from two datasets, and the εDJQ finds all pairs of points from two datasets that are within a distance threshold ε of each other. Distributed cluster-based computing systems can be classified into Hadoop-based and Spark-based systems. Based on this classification, in this paper we compare two of the most current and leading distributed spatial data management systems, namely SpatialHadoop and LocationSpark, by evaluating the performance of existing and newly proposed parallel and distributed distance join query algorithms in different situations with big real-world datasets. As a general conclusion, while SpatialHadoop is a more mature and robust system, LocationSpark is the winner with respect to total execution time.
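To make the two query definitions concrete, the following is a minimal single-machine sketch in Python. It is a brute-force illustration only, not the distributed algorithms evaluated in the paper and not the SpatialHadoop or LocationSpark API. The function names `kcpq` and `edjq`, the use of Euclidean distance, and the inclusive threshold (`<=`) are assumptions made for this example.

```python
import heapq
import math
from itertools import product


def kcpq(P, Q, k):
    """K Closest Pair Query (brute force): return the k pairs (p, q),
    with p from P and q from Q, that have the smallest distance.
    Each result is a (distance, p, q) tuple, nearest first.
    Euclidean distance is an assumption of this sketch."""
    candidates = ((math.dist(p, q), p, q) for p, q in product(P, Q))
    return heapq.nsmallest(k, candidates)


def edjq(P, Q, eps):
    """Epsilon Distance Join Query (brute force): return every pair
    (p, q) whose distance is at most eps. Treating the threshold as
    inclusive is an assumption of this sketch."""
    return [(p, q) for p, q in product(P, Q) if math.dist(p, q) <= eps]


# Hypothetical toy datasets, not taken from the paper.
P = [(0.0, 0.0), (5.0, 5.0)]
Q = [(1.0, 0.0), (5.0, 6.0), (10.0, 10.0)]

closest = kcpq(P, Q, k=2)
within = edjq(P, Q, eps=1.5)
print(closest)
print(within)
```

Both functions examine all |P| × |Q| candidate pairs, which is the cost that makes partitioned, distributed processing attractive for large datasets.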
