International Conference on Communication Systems and Network Technologies
A Performance Analysis of MapReduce Task with Large Number of Files Dataset in Big Data Using Hadoop

Abstract

Big Data refers to data volumes too large to be managed by traditional data management systems. Hadoop is a technological answer to Big Data: the Hadoop Distributed File System (HDFS) and the MapReduce programming model are used to store and retrieve big data. Terabyte-scale files can easily be stored on HDFS and analyzed with MapReduce. This paper provides an introduction to Hadoop HDFS and MapReduce for storing a large number of files and retrieving information from them. We present our experimental work on Hadoop, in which a varying number of files is supplied as input to the system and the performance of the Hadoop system is analyzed. We study the number of bytes written and read by the system and by MapReduce, and we analyze the behavior of the map method and the reduce method as the number of files grows, together with the number of bytes written and read by these tasks.
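Since the experiments described in the abstract revolve around feeding many input files to a MapReduce job and examining the byte counters reported for the map and reduce tasks, a minimal sketch of such a job may help illustrate the setup. The code below is not the authors' implementation; it is a generic word-count-style Hadoop job (Java, new MapReduce API) that reads every file under an input directory and, after completion, prints two built-in task counters of the kind the paper analyzes. The class name, paths, and job name are illustrative.

```java
// Minimal sketch (not the paper's code): a word-count style MapReduce job that
// processes all files under an input directory and then prints built-in byte
// counters similar to those studied in the paper.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ManyFilesJob {

  // map(): emit (word, 1) for every token in every input file.
  public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // reduce(): sum the counts emitted for each word.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "many-files-analysis");
    job.setJarByClass(ManyFilesJob.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // args[0]: directory holding the large number of input files,
    // args[1]: output directory (must not already exist).
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    boolean ok = job.waitForCompletion(true);

    // Built-in counters of the kind the paper studies: bytes produced by the
    // map tasks and bytes shuffled into the reduce tasks.
    System.out.println("map output bytes    : "
        + job.getCounters().findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue());
    System.out.println("reduce shuffle bytes: "
        + job.getCounters().findCounter(TaskCounter.REDUCE_SHUFFLE_BYTES).getValue());

    System.exit(ok ? 0 : 1);
  }
}
```

Such a job would be launched with something like `hadoop jar manyfiles.jar ManyFilesJob <input-dir> <output-dir>` (hypothetical jar and paths), where the input directory holds the file set under test; the per-job counter summary printed by the framework also includes the HDFS and local file-system bytes read and written.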
