Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem

Abstract

In this technological era, people, authorities, entrepreneurs, businesses, and many things around us are connected to the Internet, forming the Internet of Things (IoT). This generates a massive amount of diverse data at very high speed, termed big data. This data is a valuable asset that businesses, organizations, and authorities can use to make predictions in various areas. However, processing Big Data efficiently while making real-time decisions is a challenging task. Tools such as Hadoop are used to process large datasets, but they do not perform well for real-time, high-speed stream processing. Therefore, in this paper, we propose an efficient, real-time Big Data stream processing approach that maps a mechanism equivalent to Hadoop MapReduce onto graphics processing units (GPUs). We integrated the parallel and distributed environment of the Hadoop ecosystem and a real-time stream processing tool, Spark, with GPUs so that the system can handle an overwhelming volume of high-speed streaming data. We designed a MapReduce-equivalent algorithm for GPUs that calculates a statistical parameter by dividing the overall Big Data files into fixed-size blocks. Finally, the system is evaluated in terms of efficiency (processing time and throughput) using (1) large city traffic video data, captured by both static and moving vehicle cameras, in which vehicles are identified, and (2) large text-based files, such as Twitter data files, structured data, etc. Results show that the proposed system, with Spark on top and GPUs in the parallel and distributed environment of the Hadoop ecosystem, is more efficient and better suited to real-time processing than the existing standalone CPU-based MapReduce implementation.
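The abstract describes a MapReduce-equivalent algorithm that divides Big Data files into fixed-size blocks and calculates a statistical parameter. The sketch below illustrates only that map/reduce-over-blocks pattern, in plain Python on the CPU; it is not the authors' GPU kernel. The paper's abstract does not say which statistic is computed, so mean and population variance are an assumption, as are the hypothetical names `split_blocks`, `map_block`, `combine`, `block_statistics`, and the `block_size` parameter. Each block is summarized independently (the part a GPU port would run in parallel), and the per-block summaries are merged with the pairwise update of Chan et al., which is mathematically associative, so partials can be reduced in any grouping.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterator, Sequence, Tuple


@dataclass(frozen=True)
class Partial:
    """Per-block summary: element count, mean, and sum of squared deviations (M2)."""
    count: int
    mean: float
    m2: float


def split_blocks(data: Sequence[float], block_size: int) -> Iterator[Sequence[float]]:
    """Divide the input into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


def map_block(block: Sequence[float]) -> Partial:
    """Map step: summarize one block independently of all others.

    Assumption: in a GPU port, each block would be one independent unit of
    parallel work; here it simply runs sequentially on the CPU.
    """
    n = len(block)
    mean = sum(block) / n
    m2 = sum((x - mean) ** 2 for x in block)
    return Partial(n, mean, m2)


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: merge two partial summaries (Chan et al. pairwise update)."""
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Partial(n, mean, m2)


def block_statistics(data: Sequence[float], block_size: int) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) via map over blocks, then reduce."""
    if len(data) == 0:
        raise ValueError("data must not be empty")
    partials = [map_block(block) for block in split_blocks(data, block_size)]
    total = reduce(combine, partials)
    return total.count, total.mean, total.m2 / total.count
```

For example, `block_statistics([float(i) for i in range(1, 11)], 4)` splits the ten values into blocks of 4, 4, and 2 elements and returns `(10, 5.5, 8.25)`. Keeping a per-block (count, mean, M2) triple rather than raw sums of squares avoids the cancellation error that the naive sum-of-squares formula suffers on large inputs.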

Bibliographic details

  • Source
    International Journal of Parallel Programming | 2018, Issue 3 | pp. 630-646 | 17 pages
  • Author affiliations

    School of Computer Science and Engineering, Kyungpook National University;

    School of Computer Science and Engineering, Kyungpook National University;

    Department of Information and Communication Engineering, Yeungnam University;

    School of Computer Science and Engineering, Kyungpook National University;

    Department of Embedded Systems Engineering, Incheon National University;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Big Data; Hadoop; Spark; GPU; MapReduce;