Performance Study of Distributed Big Data Analysis in YARN Cluster

Abstract

In the era of the Fourth Industrial Revolution, various intelligent solutions and services have recently emerged. To provide high-quality service in these intelligent applications, big data must be collected without loss and analyzed comprehensively. In particular, when machine learning and deep learning techniques are used, big data processing delays should be minimized to keep the models fresh. In this paper, we evaluate the performance of Apache Spark, one of the most popular big data processing and analysis frameworks. Beyond analyzing Spark's performance in a distributed cluster environment, we evaluate the performance of TensorFlowOnSpark, a promising distributed deep learning framework designed to handle big data efficiently. From the experimental results, we conclude that Spark on YARN is a solid underlying framework that guarantees the performance and scalability of distributed machine learning and deep learning by processing data and algorithms efficiently in a parallel and distributed manner.
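The abstract names Spark on YARN as the execution framework but does not give the job configuration used in the experiments. As a minimal sketch under stated assumptions, the Python helper below assembles a spark-submit command that hands resource scheduling to YARN; the executor count and cores per executor are the knobs a scalability experiment would typically vary. The function names, the application file analysis.py, and the resource values are hypothetical illustrations, not the authors' setup.

```python
import shlex
from typing import List


def build_spark_submit(app: str, num_executors: int, executor_cores: int,
                       executor_memory: str = "4g",
                       deploy_mode: str = "cluster") -> List[str]:
    """Assemble a spark-submit argument list that runs `app` on YARN.

    Hypothetical helper: resource values are placeholders, not the paper's setup.
    """
    if num_executors < 1 or executor_cores < 1:
        raise ValueError("num_executors and executor_cores must be >= 1")
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",             # YARN's ResourceManager allocates containers
        "--deploy-mode", deploy_mode,   # "cluster": the driver runs inside a YARN container
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        app,
    ]


def max_parallel_tasks(num_executors: int, executor_cores: int,
                       task_cpus: int = 1) -> int:
    """Upper bound on concurrently running tasks.

    Each executor runs executor_cores // task_cpus tasks at once;
    task_cpus mirrors Spark's spark.task.cpus setting (default 1).
    """
    return num_executors * (executor_cores // task_cpus)


if __name__ == "__main__":
    # Hypothetical scalability sweep: same application, growing executor counts.
    for n in (1, 2, 4, 8):
        cmd = build_spark_submit("analysis.py", num_executors=n, executor_cores=2)
        print(max_parallel_tasks(n, 2), shlex.join(cmd))
```

On a host with a configured Hadoop client, each printed command could be run as-is. With spark.task.cpus left at its default of 1, the leading number is simply executors times cores, which is the parallelism ceiling the sweep raises step by step.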

Bibliographic Information

Source: International Conference on Information and Communication Technology Convergence | 2018 | 747 p. | 6 pages
Authors: Hoo Young Ahn; Hyunjae Kim; WoongShik You
Original format: PDF
Chinese Library Classification: TN911.2-532
Keywords: Big Data; Sparks; Yarn; Training; Data models; Computational modeling

Similar Literature

1. Performance Analysis of Data Processing Using High Performance Distributed Computer Clusters [J]. R. Kannadasan, K. P. Rajasekaran, S. Jaganath. Journal of Computational and Theoretical Nanoscience, 2019, no. 5a6.
2. Towards distributed acceleration of image processing applications using reconfigurable active SSD clusters: a case study of seismic data analysis [J]. Mageda Sharafeddin, Hmayag Partamian, Mariette Awad. International Journal of High Performance Computing and Networking, 2018, no. 4.
3. Distributed Regression Analysis Application in Large Distributed Data Networks: Analysis of Precision and Operational Performance [J]. Qoua Her, Jessica Malenfant, Zilu Zhang. JMIR Medical Informatics, 2020, no. 6.
4. Performance Study of Distributed Big Data Analysis in YARN Cluster [C]. Hoo Young Ahn, Hyunjae Kim, WoongShik You. International Conference on Information and Communication Technology Convergence, 2018.
5. Innovation and financial performance: A study of the effects of patent data from emerging cluster analysis [D]. Lepore, Lisa D., 2016.
6. Performance Analysis of Distributed Estimation for Data Fusion Using a Statistical Approach in Smart Grid Noisy Wireless Sensor Networks [O]. Chatura Seneviratne, Patikiri Arachchige Don Shehan Nilmantha Wijesekara, Henry Leung, 2020.
7. DDHCS: Distributed Denial-of-service Threat to YARN Clusters based on Health Check Service [O]. Wenting Li, Qingni Shen, Chuntao Dong, 2016.
