Performance evaluation of distributed computing environments with Hadoop and Spark frameworks

Abstract

Owing to the rapid development of information and communication technologies, data are now created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. The running times were found to grow very fast with the dataset size, faster even than a power function. This tendency is similar for both the Hadoop and Spark frameworks in the real and virtual cluster implementations. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the present observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present.
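
For readers unfamiliar with the test task, the word-counting use case and the speedup metric named in the abstract can be sketched roughly as follows. This is plain Python rather than Hadoop or Spark code, and it is not the authors' implementation. The function names (map_phase, reduce_phase, word_count, speedup) are hypothetical, and the speedup formula is the conventional ratio of baseline to parallel running time, which the abstract does not spell out.

```python
from collections import Counter
from functools import reduce
import re


def map_phase(chunk: str) -> Counter:
    """Count the words in one chunk of text (the per-node 'map' step)."""
    return Counter(re.findall(r"[a-z']+", chunk.lower()))


def reduce_phase(partials) -> Counter:
    """Merge per-chunk counts into a single total (the 'reduce' step)."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(chunks) -> Counter:
    """Run the map step on every chunk, then reduce the partial counts."""
    return reduce_phase(map_phase(c) for c in chunks)


def speedup(t_baseline: float, t_parallel: float) -> float:
    """Assumed definition: baseline running time divided by parallel running time."""
    return t_baseline / t_parallel


# Two chunks stand in for the pieces of a text split across compute nodes.
counts = word_count(["big data big", "Data are big"])
```

In a real cluster each chunk would be processed on a different node and the partial counts merged over the network, and that coordination overhead is one place where running time and speedup can degrade as the dataset grows.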


Bibliographic record

Source
IEEE International Young Scientists Forum on Applied Physics and Engineering | 2017 | 374p | 4 pages
Authors
Vladyslav Taran; Oleg Alienin; Sergii Stirenko; Yuri Gordienko; A. Rojbi
Original format: PDF
Chinese Library Classification: O59-53
Keywords
Sparks; Big Data; Servers; Distributed databases; Standards;


Similar literature

1. A Hierarchical Hadoop Framework to Handle Big Data in Geo-Distributed Computing Environments [J]. Orazio Tomarchio, Giuseppe Di Modica, Marco Cavallo. International Journal of Information Technologies and Systems Approach, 2018, No. 1.
2. A COMPARISON BETWEEN THE HADOOP AND SPARK DISTRIBUTED FRAMEWORKS IN THE CONTEXT OF REGION-GROWING SEGMENTATION OF REMOTE SENSING IMAGES [J]. R. B. Andrade, J. M. F. Santos, G. A. O. P. Costa. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2019, No. 5.
3. Typhoon quantitative rainfall prediction from big data analytics by using the apache hadoop spark parallel computing framework [J]. C.-C. Wei, T.-H. Chou. Oceanographic Literature Review, 2020, No. 10.
4. Performance evaluation of distributed computing environments with Hadoop and Spark frameworks [C]. Vladyslav Taran, Oleg Alienin, Sergii Stirenko. International Young Scientists Forum on Applied Physics and Engineering, 2017.
5. Transition to Distributed Computing: A Framework for Evaluating Distributed Object Models [D]. Natarajan, Jeyabarathi. 2000.
6. Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework [O]. Steven Lewis, Attila Csordas, Sarah Killcoyne. 2012.
7. Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks [O]. Taran, Vladyslav, Alienin, Oleg, Stirenko, Sergii. 2017.
