
A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data


Abstract

Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. While existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges remain: how to overcome the inherent defects of Hadoop when handling taxi trajectory big data that consists of massive numbers of small files, and how to discover implicit spatiotemporal frequent patterns with MapReduce. To address these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm that analyzes the spatiotemporal characteristics of taxi operations from large-scale taxi trajectories, using strategies for processing massive small files on a Hadoop platform. More specifically, we first implement three methods, namely Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome the existing defects of Hadoop, and then propose two strategies based on their performance evaluations. Next, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we use MR-PFP to analyze, in parallel, the characteristics of taxi operations in both the spatial and temporal dimensions. The results demonstrate that MR-PFP is superior to the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.
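
As a rough illustration of the two ideas the abstract combines, the sketch below first packs many small files into a single stream of key-value records (loosely in the spirit of a Sequence File) and then runs a map step and a reduce step that count item support. It is not the paper's MR-PFP implementation: it runs in a single Python process, it covers only the initial frequent-item counting pass that FP-growth needs before any FP-tree is built, and the file names, item labels, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for many small trajectory files: file name -> lines.
# Each line is one transaction made of invented spatiotemporal items.
small_files = {
    "taxi_001.txt": ["zoneA hour08 pickup", "zoneA hour08 dropoff"],
    "taxi_002.txt": ["zoneA hour08 pickup", "zoneB hour09 pickup"],
    "taxi_003.txt": ["zoneA hour08 pickup"],
}


def pack(files):
    """Merge small files into one list of (file name, line) records.

    This only imitates the key-value layout of a Sequence File; it does
    not read or write the real Hadoop format.
    """
    return [(name, line) for name, lines in sorted(files.items()) for line in lines]


def map_phase(records):
    """Emit (item, 1) once per distinct item in each transaction."""
    for _key, line in records:
        for item in set(line.split()):
            yield item, 1


def reduce_phase(pairs):
    """Sum the emitted counts per item."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return counts


def frequent_items(files, min_support):
    """Return the items whose support count is at least min_support."""
    counts = reduce_phase(map_phase(pack(files)))
    return {item: n for item, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    print(frequent_items(small_files, min_support=3))
```

In a real Hadoop job the map and reduce functions would run on separate nodes and the packed records would come from an actual Sequence File; the sketch only shows how merging the inputs first lets a single pass of mappers see every transaction.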

Bibliographic details

  • Source
    Complexity | 2018, Issue 2 | 16 pages
  • Author affiliations

    Guizhou Minzu Univ Coll Data Sci & Informat Engn Guiyang 550025 Guizhou Peoples R China;
    Guizhou Minzu Univ Coll Data Sci & Informat Engn Guiyang 550025 Guizhou Peoples R China;
    Southwest Univ Coll Elect & Informat Engn Chongqing 400715 Peoples R China;
    Southwest Univ Coll Comp & Informat Sci Chongqing 400715 Peoples R China;
    Southwest Univ Coll Comp & Informat Sci Chongqing 400715 Peoples R China;
    Southwest Univ Coll Comp & Informat Sci Chongqing 400715 Peoples R China;

  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Large-scale systems theory