Conference: International Conference on Information and Communication Technology Convergence

Performance Study of Distributed Big Data Analysis in YARN Cluster

Abstract

In the era of the Fourth Industrial Revolution, a variety of intelligent solutions and services have been emerging. To provide high-quality service in these intelligent applications, big data must be collected without loss and analyzed comprehensively. In particular, when machine learning and deep learning techniques are used, big data processing delays should be minimized to guarantee the freshness of the models. In this paper, we evaluate the performance of Apache Spark, one of the most popular big data processing and analysis frameworks. Beyond the performance analysis of Spark in a distributed cluster environment, we also evaluate the performance of TensorFlowOnSpark, a promising distributed deep learning framework designed to handle big data efficiently. From the experimental results, we conclude that Spark on YARN is a solid underlying framework that guarantees the performance and scalability of distributed machine learning and deep learning by processing data and algorithms efficiently in a parallel and distributed manner.
