IEEE International Conference on Big Data

External memory pipelining made easy with TPIE


Abstract

When handling large datasets that exceed the capacity of main memory, the movement of data between main memory and external memory (disk), rather than actual (CPU) computation time, is often the bottleneck in a computation. Since data is moved between disk and main memory in large contiguous blocks, this has led to the development of a large number of I/O-efficient algorithms that minimize the number of such block movements. However, actually implementing these algorithms can be challenging, since operating systems do not give applications complete control over the movement of blocks and the management of main memory. TPIE is one of two major libraries that have been developed to support I/O-efficient algorithm implementations. It relies heavily on the fact that most I/O-efficient algorithms are naturally composed of components that stream through one or more lists of data items while producing one or more such output lists, or components that sort such lists. Thus TPIE provides an interface where list stream processing and sorting can be implemented in a simple and modular way without having to worry about memory management or block movement. However, if care is not taken, such streaming-based implementations can lead to practically inefficient algorithms, since lists of data items are typically written to (and read from) disk between components. In this paper we present a major extension of the TPIE library that includes a pipelining framework, which allows for practically efficient streaming-based implementations while minimizing I/O overhead between streaming components. The framework pipelines streaming components to avoid I/Os between components; that is, it processes several components simultaneously while passing the output of one component directly to the input of the next component in main memory.
TPIE automatically determines which components to pipeline and performs the required main memory management, and the extension also includes support for parallelization of internal memory computation and progress tracking across an entire application. Thus TPIE supports efficient streaming-based implementations of I/O-efficient algorithms in a simple, modular and maintainable way. The extended library has already been used to evaluate I/O-efficient algorithms in the research literature, and is heavily used in I/O-efficient commercial terrain processing applications by the Danish startup SCALGO.
