INNS Conference on Big Data

Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf


Abstract

One of the biggest challenges in the current big data landscape is our inability to process vast amounts of information in a reasonable time. In this work, we explore and compare two distributed computing frameworks implemented on commodity cluster architectures: MPI/OpenMP on Beowulf, which is oriented toward high performance and exploits multi-machine/multicore infrastructures, and Apache Spark on Hadoop, which targets iterative algorithms through in-memory computing. We use Google Cloud Platform to create virtual machine clusters, run the frameworks, and evaluate two supervised machine learning algorithms: KNN and Pegasos SVM. Results from experiments with a particle physics data set show that MPI/OpenMP outperforms Spark by more than an order of magnitude in processing speed and delivers more consistent performance. However, Spark offers a better data management infrastructure and can handle other concerns such as node failure and data replication.
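The abstract does not describe how KNN was parallelized on either framework. One common data-parallel pattern, which both MPI and Spark can express, works like this: each worker finds the k nearest neighbours within its own shard of the training data, and the partial lists are then merged, because the global k nearest must lie in the union of the local top-k lists. The sketch below simulates this sequentially in plain Python. The shard layout, the function names (`local_top_k`, `merge_top_k`, `predict`) and the toy data are hypothetical illustrations, not the authors' implementation. In MPI the merge step would plausibly be a gather to a root rank; in Spark, a `mapPartitions` followed by a `reduce`.

```python
import heapq
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical data layout: a labelled point is (feature vector, class label).
Point = Tuple[Sequence[float], int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; it ranks neighbours the same as true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_top_k(shard: List[Point], query: Sequence[float], k: int) -> List[Tuple[float, int]]:
    """Per-worker step: the k nearest (distance, label) pairs within one shard."""
    return heapq.nsmallest(k, ((sq_dist(x, query), y) for x, y in shard))


def merge_top_k(partials: List[List[Tuple[float, int]]], k: int) -> List[Tuple[float, int]]:
    """Merge step: the global k nearest are among the union of the local top-k lists."""
    return heapq.nsmallest(k, (c for part in partials for c in part))


def predict(shards: List[List[Point]], query: Sequence[float], k: int) -> int:
    """Classify `query` by majority vote over the merged k nearest neighbours."""
    # Sequential here; each call would run on a separate worker in a real cluster.
    partials = [local_top_k(shard, query, k) for shard in shards]
    neighbours = merge_top_k(partials, k)
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data split across three hypothetical workers.
    shards = [
        [((0.0, 0.0), 0), ((0.1, 0.2), 0)],
        [((5.0, 5.0), 1), ((5.1, 4.9), 1)],
        [((0.2, 0.1), 0), ((4.8, 5.2), 1)],
    ]
    print(predict(shards, (0.05, 0.05), k=3))  # → 0
    print(predict(shards, (5.0, 5.0), k=3))  # → 1
```

Only k candidates per worker travel over the network, so the merge cost grows with the number of workers times k, not with the size of the training set.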