Watershed reengineering: Making Streams Programmable

Abstract

Most high-performance data processing (aka big-data) systems let users express their computation through abstractions (like map-reduce) that simplify extracting parallelism from applications. Most frameworks, however, do not let users specify how communication must take place: that element is deeply embedded in the run-time system (RTS), which makes changes hard to implement. In this work we describe our reengineering of the Watershed system, a framework based on the filter-stream paradigm and focused on continuous stream processing. Like other big-data environments, Watershed provided object-oriented abstractions to express computation (filters), but streams were implemented as part of the RTS. By isolating stream functionality into appropriate classes, we make it possible to combine communication patterns and to reuse common message-handling functions (like compression and blocking). The new architecture even allows the design of new communication patterns, for example letting users choose MPI, TCP or shared-memory implementations of communication channels as their problems demand. Applications written for the new interface showed code-size reductions of around 50%, and more in some cases, with no significant performance penalty.
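The abstract does not show Watershed's actual stream API, so the following is only a hypothetical Python sketch of the idea it describes: stream behaviour lives in ordinary classes rather than in the RTS, the transport is a pluggable `Channel`, and message-handling functions are reusable, composable pieces. All names here (`Channel`, `InMemoryChannel`, `Codec`, `ZlibCodec`, `Stream`, `block_size`) are invented for illustration. `InMemoryChannel` stands in for a shared-memory channel; MPI or TCP variants would be further `Channel` subclasses. "Blocking" is interpreted here, as an assumption, as batching several messages into one transfer.

```python
import struct
import zlib
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, List, Sequence


class Channel(ABC):
    """Hypothetical transport interface; MPI/TCP/shared-memory versions would subclass it."""

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    @abstractmethod
    def receive(self) -> bytes: ...


class InMemoryChannel(Channel):
    """Stand-in for a shared-memory channel: a FIFO queue inside one process."""

    def __init__(self) -> None:
        self._queue: Deque[bytes] = deque()

    def send(self, payload: bytes) -> None:
        self._queue.append(payload)

    def receive(self) -> bytes:
        return self._queue.popleft()

    def __len__(self) -> int:
        return len(self._queue)


class Codec(ABC):
    """Reusable message-handling step applied to every outgoing/incoming payload."""

    @abstractmethod
    def encode(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> bytes: ...


class ZlibCodec(Codec):
    """Compression step built on the standard-library zlib module."""

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class Stream:
    """Combines a channel, a codec pipeline and blocking (batching) of messages."""

    def __init__(self, channel: Channel, codecs: Sequence[Codec] = (), block_size: int = 1) -> None:
        self.channel = channel
        self.codecs = list(codecs)
        self.block_size = block_size
        self._buffer: List[bytes] = []

    def write(self, message: bytes) -> None:
        # Buffer messages and send them as one block once block_size is reached.
        self._buffer.append(message)
        if len(self._buffer) >= self.block_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # Length-prefix each message so the receiver can split the block again.
        payload = b"".join(struct.pack(">I", len(m)) + m for m in self._buffer)
        for codec in self.codecs:
            payload = codec.encode(payload)
        self.channel.send(payload)
        self._buffer.clear()

    def read_block(self) -> List[bytes]:
        payload = self.channel.receive()
        # Undo the codecs in reverse order, then split on the length prefixes.
        for codec in reversed(self.codecs):
            payload = codec.decode(payload)
        messages, offset = [], 0
        while offset < len(payload):
            (size,) = struct.unpack_from(">I", payload, offset)
            offset += 4
            messages.append(payload[offset:offset + size])
            offset += size
        return messages


channel = InMemoryChannel()
stream = Stream(channel, codecs=[ZlibCodec()], block_size=3)
for word in [b"alpha", b"beta", b"gamma", b"delta"]:
    stream.write(word)
stream.flush()               # sends the partially filled block holding b"delta"
print(len(channel))          # 2
print(stream.read_block())   # [b'alpha', b'beta', b'gamma']
print(stream.read_block())   # [b'delta']
```

Because compression and blocking live in the stream rather than in each filter, swapping `InMemoryChannel` for another `Channel` subclass would leave filter code unchanged. That separation is the kind of reuse the abstract credits for the reduction in application code size.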