Conference: IEEE International Young Scientists Forum on Applied Physics and Engineering

Performance evaluation of distributed computing environments with Hadoop and Spark frameworks

Abstract

Recently, owing to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. The running times were found to grow very fast with the dataset size, faster even than a power function. For both the real and virtual cluster implementations, this tendency is similar for the Hadoop and Spark frameworks. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present.
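
The word-counting test task mentioned above follows the familiar map-and-reduce pattern. The following is a minimal single-machine sketch in plain Python, not the authors' Hadoop or Spark code: the function names (`map_chunk`, `reduce_counts`, `word_count`, `speedup`) are hypothetical, and the speedup formula assumes the conventional definition (baseline running time divided by cluster running time), which the abstract itself does not spell out.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk: str) -> Counter:
    """Map step: count the words in one chunk of text."""
    # Whitespace tokenization and lowercasing are simplifying assumptions.
    return Counter(chunk.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(chunks) -> Counter:
    """Apply the map step to every chunk, then fold the partial counts."""
    return reduce(reduce_counts, map(map_chunk, chunks), Counter())


def speedup(t_baseline: float, t_cluster: float) -> float:
    """Assumed definition: baseline running time over cluster running time."""
    return t_baseline / t_cluster


if __name__ == "__main__":
    # Each string stands in for the part of the text handled by one node.
    print(word_count(["big data big", "data spark"]))
    print(speedup(10.0, 4.0))
```

In a real cluster the chunks would be file splits processed on separate nodes and the merge would go through the framework's shuffle; the sketch only shows the logical shape of the job.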
