International Conference on Communication Systems and Network Technologies

A Performance Analysis of MapReduce Task with Large Number of Files Dataset in Big Data Using Hadoop



Abstract

Big Data refers to data volumes too large to be managed by traditional data management systems. Hadoop is a technological answer to Big Data: the Hadoop Distributed File System (HDFS) and the MapReduce programming model are used to store and retrieve such data. Files of terabyte size can easily be stored on HDFS and analyzed with MapReduce. This paper introduces Hadoop HDFS and MapReduce for storing large numbers of files and retrieving information from them. We present experimental work in which varying numbers of files are supplied as input to a Hadoop system and the system's performance is analyzed. We study the number of bytes written and read by the system and by MapReduce, and we analyze the behavior of the map and reduce methods, and the bytes written and read by these tasks, as the number of input files increases.
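The map/reduce behavior and byte accounting the abstract describes can be sketched with a simplified in-process model. This is plain Python rather than Hadoop's actual Java API, and the sample file contents and tab-separated output format are illustrative assumptions, not details from the paper:

```python
from collections import defaultdict

def run_mapreduce(files):
    """Simulate a word-count MapReduce job over many small input files,
    tracking bytes read by the map phase and bytes written by the reduce
    phase, analogous to the counters the paper studies."""
    bytes_read = 0
    intermediate = defaultdict(list)  # shuffle stage: word -> list of counts

    # Map phase: conceptually one map task per input file, as Hadoop
    # would schedule for many small files each below the HDFS block size.
    for name, content in files.items():
        bytes_read += len(content.encode("utf-8"))
        for word in content.split():
            intermediate[word].append(1)

    # Reduce phase: sum the counts gathered for each key.
    output = {word: sum(counts) for word, counts in intermediate.items()}
    bytes_written = sum(
        len(f"{word}\t{count}\n".encode("utf-8"))
        for word, count in output.items()
    )
    return output, bytes_read, bytes_written

# Two tiny "files" stand in for the paper's large file datasets.
counts, r, w = run_mapreduce(
    {"part-0.txt": "big data hadoop", "part-1.txt": "hadoop mapreduce"}
)
```

Growing the `files` dict while keeping each file small mirrors the experiment in the abstract: per-file map-task overhead, not data volume, comes to dominate the bytes read and written.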
