Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf

Abstract

One of the biggest challenges of the current big data landscape is our inability to process vast amounts of information in a reasonable time. In this work, we explore and compare two distributed computing frameworks implemented on commodity cluster architectures: MPI/OpenMP on Beowulf that is high-performance oriented and exploits multi-machine/multicore infrastructures, and Apache Spark on Hadoop which targets iterative algorithms through in-memory computing. We use the Google Cloud Platform service to create virtual machine clusters, run the frameworks, and evaluate two supervised machine learning algorithms: KNN and Pegasos SVM. Results obtained from experiments with a particle physics data set show MPI/OpenMP outperforms Spark by more than one order of magnitude in terms of processing speed and provides more consistent performance. However, Spark shows better data management infrastructure and the possibility of dealing with other aspects such as node failure and data replication.

Bibliographic record

Source
INNS Conference on Big Data | 2016 | 485 p. | 10 pages in total
Authors
Jorge L. Reyes-Ortiz; Luca Oneto; Davide Anguita
Original format
PDF
Chinese Library Classification
TP311.13-532
Keywords
Big Data; Supervised Learning; Spark; Hadoop; MPI; OpenMP; Beowulf; Cloud; Parallel Computing
