Performance Study of Distributed Big Data Analysis in YARN Cluster



Abstract

In the era of the Fourth Industrial Revolution, a variety of intelligent solutions and services have been emerging. To provide high-quality service in these intelligent applications, big data should be collected without loss and analyzed comprehensively. In particular, when machine learning and deep learning techniques are used, big data processing delays should be minimized to guarantee the freshness of models. In this paper, we evaluate the performance of Apache Spark, one of the most popular big data processing and analysis frameworks. Beyond the performance analysis of Spark in a distributed cluster environment, we evaluate the performance of TensorFlowOnSpark, a promising distributed deep learning framework designed to handle big data efficiently. From the experimental results, we conclude that Spark on YARN is a solid underlying framework that guarantees the performance and scalability of distributed machine learning and deep learning by efficiently processing data and algorithms in a parallel and distributed manner.


Bibliographic details

Source
International Conference on Information and Communication Technology Convergence | 2018 | pp. 1261-1266 | 6 pages
Authors
Hoo Young Ahn; Hyunjae Kim; WoongShik You
Original format: PDF
Keywords
Big Data; Sparks; Yarn; Training; Data models; Computational modeling;

Date added to database: 2022-08-26 15:25:17

Similar literature


1. Performance Analysis of Data Processing Using High Performance Distributed Computer Clusters [J] . R. Kannadasan, K. P. Rajasekaran, S. Jaganath, Journal of computational and theoretical nanoscience . 2019, No. 5a6

2. Towards distributed acceleration of image processing applications using reconfigurable active SSD clusters: a case study of seismic data analysis [J] . Mageda Sharafeddin, Hmayag Partamian, Mariette Awad, International Journal of High Performance Computing and Networking . 2018, No. 4

3. Distributed Regression Analysis Application in Large Distributed Data Networks: Analysis of Precision and Operational Performance [J] . Qoua Her, Jessica Malenfant, Zilu Zhang, JMIR Medical Informatics . 2020, No. 6

4. Performance Study of Distributed Big Data Analysis in YARN Cluster [C] . Hoo Young Ahn, Hyunjae Kim, WoongShik You, International Conference on Information and Communication Technology Convergence . 2018

5. Innovation and financial performance: A study of the effects of patent data from emerging cluster analysis [D] . Lepore, Lisa D. 2016

6. Performance Analysis of Distributed Estimation for Data Fusion Using a Statistical Approach in Smart Grid Noisy Wireless Sensor Networks [O] . Chatura Seneviratne, Patikiri Arachchige Don Shehan Nilmantha Wijesekara, Henry Leung 2020

7. DDHCS: Distributed Denial-of-service Threat to YARN Clusters based on Health Check Service [O] . Wenting Li, Qingni Shen, Chuntao Dong, 2016

