Conference: International Conference on Information and Communication Technology Convergence

Performance Study of Distributed Big Data Analysis in YARN Cluster

Abstract

In the era of the Fourth Industrial Revolution, a wide range of intelligent solutions and services has emerged. To deliver high-quality service in these intelligent applications, big data must be collected without loss and analyzed comprehensively. In particular, when machine learning and deep learning techniques are used, big data processing delays should be minimized to keep models fresh. In this paper, we evaluate the performance of Apache Spark, one of the most popular big data processing and analysis frameworks. Beyond analyzing the performance of Spark in a distributed cluster environment, we also evaluate the performance of TensorFlowOnSpark, a promising distributed deep learning framework designed to handle big data efficiently. From the experimental results, we conclude that Spark on YARN is a solid underlying framework that guarantees the performance and scalability of distributed machine learning and deep learning by efficiently processing data and algorithms in a parallel and distributed manner.