Performance Analysis of Big Data Frameworks on Virtualized Clusters

Abstract

Research on Big Data applications has become increasingly important for institutions and researchers worldwide. This trend is driven by the increasing use of systems and devices that generate massive amounts of electronic data each day. Conventional algorithm implementations are considered less efficient at managing and processing large datasets. In Big Data computation, Hadoop and Apache Spark are two commonly used open source frameworks that typically run on physical clusters. Because running these frameworks on a physical cluster consumes more energy and is rigid to manage, in this research we evaluated their performance on virtualized clusters. Virtualization technology offers flexibility in cluster management by sharing resources among multiple instances. Our experiments show that, in general, Apache Spark is about 2-9 times better than Hadoop in execution time and throughput when both run in a virtualized environment.


Bibliographic details

Source
International Conference on Informatics and Computing, 2018, pp. 1-4 (4 pages)
Authors
Amil Ahmad Ilham; Muhammad Niswar; Andi Muhammad Ryanto
Original format: PDF
Keywords
Sparks; Throughput; Task analysis; Big Data; Benchmark testing; Virtualization; File systems


Similar literature


1. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE) [J] . Amirhossein Shamsaddini, Daniel J. Crichton, Krista Smith, Database . 2014, No. 1

2. Generating insights through data preparation, visualization, and analysis: Framework for combining clustering and data visualization techniques for low-cardinality sequential data [J] . Nestorov Svetlozar, Jukic Boris, Jukic Nenad, Decision Support Systems . 2019, Issue Octa

3. Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data [J] . Yu Zhiwen, Chen Hantao, You Jane, IEEE/ACM Transactions on Computational Biology and Bioinformatics . 2015, No. 4

4. Performance Analysis of Big Data Frameworks on Virtualized Clusters [C] . Amil Ahmad Ilham, Muhammad Niswar, Andi Muhammad Ryanto, International Conference on Informatics and Computing . 2018

5. Machine Learning Model Time-Series Clustering for Energy Optimized Virtual Machine Placement Using Time-Series Performance Data [D] . Kellogg, Tad. 2021

6. A framework for organizing cancer-related variations from existing databases publications and NGS data using a High-performance Integrated Virtual Environment (HIVE) [O] . Tsung-Jung Wu, Amirhossein Shamsaddini, Yang Pan, 2014

7. Performance Analysis of Network I/O Workloads in Virtualized Data Centers [O] . Yiduo Mei, Ling Liu, 2013

