
Data Centric Text Processing Using MapReduce


Abstract

Processing huge volumes of data has opened new opportunities in e-commerce, engineering, business, and large-scale computing applications. The MapReduce programming model is a parallel data processing approach for execution on computer clusters. It provides an abstraction for designing scalable algorithms for big data processing, and it speeds up computation for batch-style workloads. However, the key/value pair generation of a MapReduce program creates memory overhead and deserialization overhead due to data redundancy. Data redundancy is one of the most important factors that consume space and degrade system performance when working with large datasets. This overhead can be largely avoided with a novel approach that we developed, the Data Triggered Multithreaded Programming (DTMP) model. In this paper, we demonstrate the DTMP model on a large dataset of author details and their publications. DTMP can dynamically allocate resources and identify data repetition that occurs during computation. When applied to the MapReduce programming model, DTMP improves system performance. The major contribution of this work is a simple, scalable, and powerful approach to processing text data that enables automatic parallelization and distribution of large-scale computations.
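
To make the redundancy point concrete, the following is a minimal single-process sketch of a map/shuffle/reduce pipeline over author and publication records. It is not the DTMP model, whose details the abstract does not give. The records, the function names, and the "skip repeated records" strategy are all assumptions chosen for illustration. The sketch only shows that repeated input records inflate the number of key/value pairs the map phase emits, while the reduce result stays the same when those repeats are skipped.

```python
from collections import defaultdict

# Hypothetical records (author, publication title); not the paper's dataset.
RECORDS = [
    ("A. Author", "Paper One"),
    ("B. Writer", "Paper Two"),
    ("A. Author", "Paper One"),    # repeated record
    ("A. Author", "Paper Three"),
    ("B. Writer", "Paper Two"),    # repeated record
]


def map_phase(records):
    """Emit one (author, title) key/value pair per input record."""
    return [(author, title) for author, title in records]


def map_phase_skip_repeats(records):
    """Emit a pair only the first time a record is seen (illustrative only)."""
    seen = set()
    pairs = []
    for record in records:
        if record not in seen:
            seen.add(record)
            pairs.append(record)
    return pairs


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Count distinct publications per author."""
    return {author: len(set(titles)) for author, titles in groups.items()}


plain_pairs = map_phase(RECORDS)
dedup_pairs = map_phase_skip_repeats(RECORDS)
plain_result = reduce_phase(shuffle(plain_pairs))
dedup_result = reduce_phase(shuffle(dedup_pairs))

print(len(plain_pairs), len(dedup_pairs))  # 5 pairs versus 3 pairs
print(plain_result == dedup_result)        # True: same counts either way
```

In a real cluster the repeated pairs would also have to be serialized, shuffled across the network, and deserialized, which is where the overhead described in the abstract would show up.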
