International Conference on Communication Systems and Network Technologies
A Performance Analysis of MapReduce Task with Large Number of Files Dataset in Big Data Using Hadoop

Abstract

Big Data refers to data volumes too large to be managed by traditional data management systems. Hadoop is a technological answer to Big Data: the Hadoop Distributed File System (HDFS) and the MapReduce programming model are used to store and retrieve big data. Terabyte-scale files can easily be stored on HDFS and analyzed with MapReduce. This paper provides an introduction to Hadoop HDFS and MapReduce for storing a large number of files and retrieving information from them. We present our experimental work on Hadoop, in which a varying number of files is supplied as input to the system and the performance of the Hadoop system is analyzed. We study the number of bytes written and read by the system and by MapReduce, and we analyze the behavior of the map method and the reduce method as the number of files grows, together with the number of bytes written and read by these tasks.
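Since the experiments described in the abstract revolve around feeding many input files to a MapReduce job and examining the byte counters reported for the map and reduce tasks, a minimal sketch of such a job may help illustrate the setup. The code below is not the authors' implementation; it is a generic word-count-style Hadoop job (Java, new MapReduce API) that reads every file under an input directory and, after completion, prints two built-in task counters of the kind the paper analyzes. The class name, paths, and job name are illustrative.

```java
// Minimal sketch (not the paper's code): a word-count style MapReduce job that
// processes all files under an input directory and then prints built-in byte
// counters similar to those studied in the paper.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ManyFilesJob {

  // map(): emit (word, 1) for every token in every input file.
  public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // reduce(): sum the counts emitted for each word.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "many-files-analysis");
    job.setJarByClass(ManyFilesJob.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // args[0]: directory holding the large number of input files,
    // args[1]: output directory (must not already exist).
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    boolean ok = job.waitForCompletion(true);

    // Built-in counters of the kind the paper studies: bytes produced by the
    // map tasks and bytes shuffled into the reduce tasks.
    System.out.println("map output bytes    : "
        + job.getCounters().findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue());
    System.out.println("reduce shuffle bytes: "
        + job.getCounters().findCounter(TaskCounter.REDUCE_SHUFFLE_BYTES).getValue());

    System.exit(ok ? 0 : 1);
  }
}
```

Such a job would be launched with something like `hadoop jar manyfiles.jar ManyFilesJob <input-dir> <output-dir>` (hypothetical jar and paths), where the input directory holds the file set under test; the per-job counter summary printed by the framework also includes the HDFS and local file-system bytes read and written.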
