A framework for parallel map-matching at scale using Spark

Peixoto Douglas Alves; Hung Quoc Viet Nguyen; Zheng Bolong; Zhou Xiaofang


Abstract

Map-matching is the problem of matching recorded GPS trajectories to a digital representation of the road network. GPS data may be inaccurate and heterogeneous because of limitations or errors in electronic sensors, as well as legal restrictions. Accurately matching trajectories to the road map is an important preprocessing step for many real-world applications, such as trajectory data mining, traffic analysis, and route prediction. However, the high availability of GPS trajectories and map data challenges the scalability of current map-matching algorithms, which are limited to small datasets because they focus on matching accuracy rather than scalability. We therefore propose a distributed parallel framework for efficient and scalable offline map-matching on top of Spark. Spark uses distributed in-memory data storage and the MapReduce paradigm to achieve horizontal scaling and fast computation on large datasets. Spark, however, is still limited for dynamic map-matching, and its memory consumption can be an issue for very large datasets. Our framework enables map-matching on top of Spark while achieving horizontal scalability and memory-efficient operation and maintaining the accuracy of state-of-the-art matching algorithms, in three ways: (1) We combine sampling-based Quadtree spatial partitioning with batch-based computation to achieve horizontal scalability and reduce cluster memory usage. (2) We employ a safe spatial-boundary approach to preserve the matching accuracy of boundary objects. (3) We provide a cost function for the distributed map-matching workload, which is used to tune the framework parameters. Our extensive experiments demonstrate that the framework processes map-matching on large-scale data efficiently and scalably, while preserving matching accuracy and keeping memory usage low.
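The partitioning ideas in points (1) and (2) can be illustrated with a small sketch. This is not the authors' implementation: it is plain Python rather than Spark, and the function names (`build_quadtree`, `assign_partitions`, `partition_points`, `batches`) and the parameters `capacity`, `radius`, and `batch_size` are hypothetical. It assumes that the "safe spatial-boundary approach" can be approximated by copying each point into every cell that lies within a fixed buffer distance, which is only one plausible reading of the abstract.

```python
from collections import defaultdict


def contains(cell, p):
    """True if point p lies in the half-open cell [x0, x1) x [y0, y1)."""
    x0, y0, x1, y1 = cell
    return x0 <= p[0] < x1 and y0 <= p[1] < y1


def build_quadtree(sample, bounds, capacity, max_depth=12):
    """Return the leaf cells of a quadtree built over a SAMPLE of the points.

    A cell is split into four quadrants while it holds more than
    `capacity` sample points, so dense regions end up with smaller cells.
    """
    def split(cell, pts, depth):
        if len(pts) <= capacity or depth == max_depth:
            return [cell]
        x0, y0, x1, y1 = cell
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        leaves = []
        for quad in ((x0, y0, mx, my), (mx, y0, x1, my),
                     (x0, my, mx, y1), (mx, my, x1, y1)):
            inside = [p for p in pts if contains(quad, p)]
            leaves += split(quad, inside, depth + 1)
        return leaves

    return split(bounds, list(sample), 0)


def assign_partitions(point, leaves, radius):
    """Indices of every leaf whose extent, grown by `radius`, covers the point.

    Points close to a cell border are therefore replicated into the
    neighbouring cell (assumed stand-in for the safe-boundary idea).
    """
    x, y = point
    return [i for i, (x0, y0, x1, y1) in enumerate(leaves)
            if x0 - radius <= x <= x1 + radius
            and y0 - radius <= y <= y1 + radius]


def partition_points(points, leaves, radius):
    """Group points by partition index, replicating boundary points."""
    parts = defaultdict(list)
    for p in points:
        for i in assign_partitions(p, leaves, radius):
            parts[i].append(p)
    return parts


def batches(partition_ids, batch_size):
    """Split partition ids into batches so only a few are processed at once."""
    ids = sorted(partition_ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


# Sample points concentrated in the lower-left quarter of a 100 x 100 area.
sample = [(x, y) for x in range(5, 50, 10) for y in range(5, 50, 10)]
leaves = build_quadtree(sample, bounds=(0, 0, 100, 100), capacity=10)
parts = partition_points([(10, 10), (24, 10), (80, 80)], leaves, radius=2)
for batch in batches(parts.keys(), batch_size=2):
    print(batch, [parts[i] for i in batch])
```

Building the tree from a sample keeps construction cheap while still adapting cell sizes to data density, and processing partitions in batches bounds how much data is resident in memory at any time. In a Spark setting the leaf index would typically serve as the partition key, but that mapping is likewise an assumption here.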


Bibliographic details

Source: Distributed and Parallel Databases, 2019, Issue 4, pp. 697-720 (24 pages)
Authors: Peixoto Douglas Alves; Hung Quoc Viet Nguyen; Zheng Bolong; Zhou Xiaofang
Author affiliations:

University of Queensland, School of Information Technology and Electrical Engineering, Brisbane, Qld, Australia;

Griffith University, School of Information and Communication Technology, Gold Coast, Australia

Original format: PDF
Language: English
Keywords: Map-matching; Spark; Trajectory; Efficiency; Scalability

