Conference: International Conference on Informatics and Computing

Performance Analysis of Big Data Frameworks on Virtualized Clusters

Abstract

Research on Big Data applications has become increasingly important for institutions and researchers worldwide. This trend is driven by the increasing use of systems and devices that generate massive amounts of electronic data each day. Conventional algorithm implementations are considered less efficient at managing and processing large datasets. In Big Data computation, Hadoop and Apache Spark are two commonly used open-source frameworks, and they typically run on physical clusters. Because running these frameworks on a physical cluster consumes more energy and is rigid to manage, in this research we evaluated their performance on virtualized clusters. Virtualization technology offers flexibility in cluster management by sharing resources among multiple instances. Our experiments show that, in general, Apache Spark performs about 2 to 9 times better than Hadoop in execution time and throughput in a virtualized environment.