An Investigation of Hadoop Parameters in SDN-enabled Clusters

Abstract

Apache Hadoop is an open-source framework for distributed and parallel processing of big data jobs. It has its own distributed file system, which facilitates local storage and processing on commodity hardware. The Hadoop Distributed File System (HDFS) is a core part of the Hadoop ecosystem and exposes a large number of configuration parameters. Tuning these parameters to improve system throughput for a particular job can require considerable experience and skill. During the execution of a Hadoop job in a multi-node cluster, communication among nodes takes place through a switch. Conventional switches use vendor-specific protocols to direct the flow of traffic, whereas software-defined networking (SDN) makes networks more programmable and configurable. In this paper, we analysed the impact of HDFS parameters such as block size and replication factor, of MapReduce parameters such as the number of mappers, and of Hive query structure. We used Faucet, an OpenFlow controller, to monitor the packets transferred into and out of the system, to see whether network traffic information can be used to predict the impact of Hadoop parameters. We also monitored CPU usage, disk usage, memory usage and the overall execution time during the execution of Hadoop jobs. Our investigation showed that customizing these Hadoop configuration parameters does have an impact on network traffic, system resource usage and execution time.
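To make the link between the parameters named in the abstract more concrete, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It assumes the common HDFS defaults of a 128 MiB block size and a replication factor of 3, and it assumes that a MapReduce job launches roughly one map task per block, which holds only approximately in practice. The function name `hdfs_footprint` and its result keys are hypothetical and chosen for illustration.

```python
MIB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MIB, replication=3):
    """Estimate how block size and replication factor shape one file's footprint.

    Assumptions (illustrative, not from the paper): defaults of 128 MiB blocks
    and 3 replicas, and roughly one map task per block.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication at least 1")

    # Ceiling division with integers: a partially filled last block still counts.
    blocks = -(-file_size_bytes // block_size_bytes)

    return {
        "blocks": blocks,
        # Approximation: input splits usually follow block boundaries.
        "approx_map_tasks": blocks,
        # Every byte is stored once per replica across the cluster.
        "stored_bytes": file_size_bytes * replication,
    }


if __name__ == "__main__":
    one_gib = 1024 * MIB
    print(hdfs_footprint(one_gib))
    print(hdfs_footprint(one_gib, block_size_bytes=64 * MIB, replication=2))
```

Under these assumptions, halving the block size doubles the estimated number of map tasks, while lowering the replication factor reduces the total bytes stored and therefore the data that must be copied between nodes. This is one plausible reason why such parameters would show up in network and execution-time measurements.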
