Journal: Distributed and Parallel Databases

A framework for parallel map-matching at scale using Spark

Abstract

Map-matching is the problem of matching recorded GPS trajectories to a digital representation of the road network. GPS data may be inaccurate and heterogeneous because of limitations or errors in electronic sensors, as well as legal restrictions. Accurately matching trajectories to the road map is an important preprocessing step for many real-world applications, such as trajectory data mining, traffic analysis, and route prediction. However, the high availability of GPS trajectories and map data challenges the scalability of current map-matching algorithms, which are limited to small datasets because they focus only on matching accuracy rather than scalability. Therefore, we propose a distributed parallel framework for efficient and scalable offline map-matching on top of the Spark framework. Spark uses distributed in-memory data storage and the MapReduce paradigm to achieve horizontal scaling and fast computation on large datasets. Spark, however, is still limited for dynamic map-matching, and memory consumption in Spark can be an issue for very large datasets. We develop a framework that enables map-matching on top of Spark while achieving horizontal scalability and memory-efficient usage and maintaining the accuracy of state-of-the-art matching algorithms, through three contributions: (1) We combine a sampling-based Quadtree spatial partitioning construction with batch-based computation to achieve horizontal scalability of map-matching and to reduce cluster memory usage. (2) We employ a safe spatial-boundary approach to preserve the matching accuracy of boundary objects. (3) We provide a cost function for the distributed map-matching workload in order to tune the framework parameters. Our extensive experiments demonstrate that the framework processes map-matching on large-scale data efficiently and scalably, while preserving matching accuracy and keeping memory usage low.
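The abstract names a sampling-based Quadtree partitioning and a safe spatial boundary but gives no implementation details. The following is only an illustrative sketch of the general idea, not the authors' method: a quadtree is built from a small sample of points so that dense areas get smaller cells, and each point is then replicated into every leaf cell whose box, expanded by a margin, contains it. All names here (`Cell`, `build`, `leaves`, `assign`, `capacity`, `margin`) are hypothetical, and plain Python stands in for Spark.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Cell:
    """Axis-aligned quadtree cell (hypothetical helper, not from the paper)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    children: List["Cell"] = field(default_factory=list)

    def owns(self, p: Point) -> bool:
        # Half-open box, so a sample point is counted in exactly one quadrant.
        return self.xmin <= p[0] < self.xmax and self.ymin <= p[1] < self.ymax

    def covers(self, p: Point, margin: float) -> bool:
        # Closed box expanded by `margin` on every side (assumed boundary buffer).
        return (self.xmin - margin <= p[0] <= self.xmax + margin
                and self.ymin - margin <= p[1] <= self.ymax + margin)


def build(sample: List[Point], cell: Cell, capacity: int, max_depth: int = 8) -> Cell:
    """Split `cell` until each leaf holds at most `capacity` sample points."""
    if len(sample) <= capacity or max_depth == 0:
        return cell
    cx = (cell.xmin + cell.xmax) / 2
    cy = (cell.ymin + cell.ymax) / 2
    quadrants = [
        Cell(cell.xmin, cell.ymin, cx, cy),
        Cell(cx, cell.ymin, cell.xmax, cy),
        Cell(cell.xmin, cy, cx, cell.ymax),
        Cell(cx, cy, cell.xmax, cell.ymax),
    ]
    for q in quadrants:
        inside = [p for p in sample if q.owns(p)]
        cell.children.append(build(inside, q, capacity, max_depth - 1))
    return cell


def leaves(cell: Cell) -> List[Cell]:
    """Collect leaf cells; each leaf plays the role of one partition."""
    if not cell.children:
        return [cell]
    return [leaf for child in cell.children for leaf in leaves(child)]


def assign(p: Point, parts: List[Cell], margin: float) -> List[int]:
    """Indices of all partitions whose margin-expanded box contains `p`."""
    return [i for i, c in enumerate(parts) if c.covers(p, margin)]


sample = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (9.0, 9.0)]
parts = leaves(build(sample, Cell(0.0, 0.0, 10.0, 10.0), capacity=2))
near_border = (4.9, 4.9)
print(len(parts), assign(near_border, parts, 0.0), assign(near_border, parts, 0.2))
```

In a Spark setting, the leaf index would presumably serve as the partition key, and the margin would be chosen from the expected GPS error so that a point near a cell border can still be matched against road segments stored in the neighboring cell. That mapping is an assumption here, since the abstract does not describe it.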
