International Conference on Emerging Trends in Information Technology

Hadoop Scalability and Performance Testing in Homogeneous Clusters

Abstract

Big data refers to datasets that are too large (on the order of gigabytes, terabytes, petabytes, or even zettabytes) or too complex for traditional data-processing software to handle. Distributed and parallel processing has therefore become increasingly important for big data. The two most popular parallel and distributed processing frameworks are Hadoop and Spark, both open-source software frameworks for reliable, scalable, distributed computing. Hadoop, created by the Apache Software Foundation, allows extremely large datasets to be processed on clusters of computers using a simple programming model called MapReduce. It stores data in a distributed file system, HDFS (Hadoop Distributed File System), designed to run on commodity hardware. Hadoop is designed to scale out horizontally from a single machine to thousands of machines, each offering local computation and storage. The performance of a Hadoop cluster depends on the application and on several configuration parameters. In this paper we study the performance of a homogeneous Hadoop cluster by tuning a few such parameters: cluster size, dataset size, and HDFS block size, among others.
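To make the MapReduce programming model concrete, here is a minimal word-count job in Java, the canonical Hadoop example (this sketch is ours, not code from the paper). The driver also shows one place where a parameter studied in the paper, the HDFS block size, could be set per job via dfs.blocksize; the 256 MB value is purely illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the per-word counts produced by the mappers.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Illustrative only: dfs.blocksize in the client configuration sets the
        // block size of files this job writes to HDFS; cluster-wide tuning (as in
        // the paper's experiments) would normally go in hdfs-site.xml instead.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged as a JAR, a job like this would be launched with something like: hadoop jar wordcount.jar WordCount /input /output. Re-running a fixed job of this kind while sweeping the number of worker nodes, the input size, and dfs.blocksize is the style of experiment the abstract describes.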