International Conference on Applied and Theoretical Computing and Communication Technology

A Big Data MapReduce Hadoop distribution architecture for processing input splits to solve the small data problem



Abstract

Hadoop is an open source Java framework for processing big data. It has two core components: HDFS (Hadoop Distributed File System), which stores very large volumes of data on inexpensive hardware and continues normal operation in the face of hardware or software faults, and MapReduce, a programming model and processing technique that runs in a parallel and distributed manner. Hadoop performs poorly on large numbers of small files, because each small file adds metadata load on the HDFS NameNode, which in turn prolongs the execution time of the MapReduce jobs that read them. Since Hadoop is designed specifically to handle very large files, it incurs a performance cost when dealing with a great number of small ones. This analysis provides a detailed description of HDFS, surveys existing ways of dealing with the small-file problem, and proposes an approach for handling small data files. In the proposed approach, small files are merged using MapReduce, the programming model on Hadoop; this improves Hadoop's performance in handling collections of small files whose combined size exceeds the block size. We also propose a traffic analyzer built on the combination of Hadoop and the MapReduce paradigm. The joint use of Hadoop and MapReduce makes it possible to provide batch analysis with low response time and in-memory computing capacity, so that logs are processed in a highly available, efficient, and stable way.
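The merge idea described in the abstract can be sketched as follows. This is a minimal JDK-only illustration, not the paper's implementation: it packs many small files into one container file plus an in-memory index of (offset, length) per file, so that a single large file replaces many NameNode metadata entries. The class name `SmallFileMerger` and the index layout are assumptions for illustration; a real Hadoop implementation would instead write filename/contents pairs into a SequenceFile inside a MapReduce job.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SmallFileMerger {
    // filename -> {offset, length} within the merged container file
    private final Map<String, long[]> index = new LinkedHashMap<>();

    // Concatenate all small files into one container, recording offsets.
    public void merge(List<Path> smallFiles, Path merged) throws IOException {
        long offset = 0;
        try (var out = Files.newOutputStream(merged)) {
            for (Path p : smallFiles) {
                byte[] data = Files.readAllBytes(p);
                out.write(data);
                index.put(p.getFileName().toString(), new long[]{offset, data.length});
                offset += data.length;
            }
        }
    }

    // Recover one original file's contents via the index, without
    // touching the (now deleted or archived) small files themselves.
    public String read(Path merged, String name) throws IOException {
        long[] entry = index.get(name);
        try (var ch = Files.newByteChannel(merged)) {
            ch.position(entry[0]);
            ByteBuffer buf = ByteBuffer.allocate((int) entry[1]);
            ch.read(buf);
            return new String(buf.array(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("smallfiles");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            files.add(Files.writeString(dir.resolve("f" + i + ".txt"), "record-" + i));
        }
        Path merged = dir.resolve("merged.bin");
        SmallFileMerger m = new SmallFileMerger();
        m.merge(files, merged);
        System.out.println(m.read(merged, "f1.txt")); // prints record-1
    }
}
```

The design point is the same one the paper exploits: NameNode memory cost scales with the number of files and blocks, not with bytes, so packing N small files into one container cuts metadata load by roughly a factor of N while the index preserves per-file access.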
