Conference: International Conference on Network and Parallel Computing

Cogset: A unified engine for reliable storage and parallel processing



Abstract

MapReduce has become a popular paradigm for parallel data processing, both for ad-hoc schema-less processing using a simple functional interface, and as a building block for higher-level abstractions. Much subsequent work has layered additional functionality on top of MapReduce or similar infrastructures, building powerful software stacks for distributed applications. In this paper, we present Cogset, the result of re-thinking the original MapReduce architecture that sits at the bottom of the stack. We observe that the traditional loose coupling between the distributed file system and the MapReduce processing engine leads to poor data locality for many applications. Accordingly, Cogset offers both reliable storage and parallel data processing, fusing the two components into a single system that ensures good data locality. We also take a new approach to data shuffling, relying on highly efficient static routing, and devise new mechanisms for fault tolerance, load balancing and ensuring consistency. We evaluate Cogset using a suite of benchmark applications, comparing it to Hadoop with very favorable results. For example, on a 12-node cluster, an inverted index that takes 80 minutes to build using Hadoop can be constructed using Cogset in less than 35 minutes.
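
To make the phrase "static routing" more concrete, the following is a minimal sketch of one general way a shuffle can send records to a fixed set of partitions whose placement on nodes is decided ahead of time. It is not taken from the paper and should not be read as Cogset's actual mechanism: the partition count, the node names, the choice of SHA-256, and the helpers `partition_for` and `route` are all hypothetical, chosen only to illustrate the idea.

```python
import hashlib
from collections import defaultdict

# Assumption: a fixed number of partitions, each pinned to a node in advance.
# These values are illustrative and do not come from the paper.
NUM_PARTITIONS = 8
NODES = ["node-0", "node-1", "node-2", "node-3"]
PARTITION_TO_NODE = {p: NODES[p % len(NODES)] for p in range(NUM_PARTITIONS)}


def partition_for(key: str) -> int:
    """Map a key to a partition with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def route(records):
    """Group (key, value) records by the node that owns each key's partition."""
    by_node = defaultdict(list)
    for key, value in records:
        node = PARTITION_TO_NODE[partition_for(key)]
        by_node[node].append((key, value))
    return dict(by_node)


if __name__ == "__main__":
    sample = [("cat", 1), ("dog", 2), ("cat", 3), ("emu", 4)]
    for node, recs in sorted(route(sample).items()):
        print(node, recs)
```

Because the key-to-partition and partition-to-node mappings in this sketch are fixed before any data moves, every record with the same key lands on the same node without consulting a coordinator at run time. Whether Cogset achieves its routing this way is an open assumption here.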
