International Conference on Communication Systems and Network Technologies

A Performance Analysis of MapReduce Task with Large Number of Files Dataset in Big Data Using Hadoop



Abstract

Big Data refers to data volumes too large to be managed by traditional data management systems. Hadoop is a technological answer to Big Data: the Hadoop Distributed File System (HDFS) and the MapReduce programming model are used to store and retrieve such data. Files of terabyte size can easily be stored on HDFS and analyzed with MapReduce. This paper introduces Hadoop HDFS and MapReduce for storing large numbers of files and retrieving information from them. We present experimental work in which varying numbers of files are supplied as input to a Hadoop system and the system's performance is analyzed. We study the number of bytes written and read by the system and by MapReduce, and we analyze the behavior of the map and reduce methods, and the bytes written and read by these tasks, as the number of input files increases.
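The map/reduce behavior and byte accounting the abstract describes can be sketched with a simplified in-process model. This is plain Python rather than Hadoop's actual Java API, and the sample file contents and tab-separated output format are illustrative assumptions, not details from the paper:

```python
from collections import defaultdict

def run_mapreduce(files):
    """Simulate a word-count MapReduce job over many small input files,
    tracking bytes read by the map phase and bytes written by the reduce
    phase, analogous to the counters the paper studies."""
    bytes_read = 0
    intermediate = defaultdict(list)  # shuffle stage: word -> list of counts

    # Map phase: conceptually one map task per input file, as Hadoop
    # would schedule for many small files each below the HDFS block size.
    for name, content in files.items():
        bytes_read += len(content.encode("utf-8"))
        for word in content.split():
            intermediate[word].append(1)

    # Reduce phase: sum the counts gathered for each key.
    output = {word: sum(counts) for word, counts in intermediate.items()}
    bytes_written = sum(
        len(f"{word}\t{count}\n".encode("utf-8"))
        for word, count in output.items()
    )
    return output, bytes_read, bytes_written

# Two tiny "files" stand in for the paper's large file datasets.
counts, r, w = run_mapreduce(
    {"part-0.txt": "big data hadoop", "part-1.txt": "hadoop mapreduce"}
)
```

Growing the `files` dict while keeping each file small mirrors the experiment in the abstract: per-file map-task overhead, not data volume, comes to dominate the bytes read and written.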
