IEEE International Conference on Big Data > External memory pipelining made easy with TPIE

External memory pipelining made easy with TPIE



Abstract

When handling large datasets that exceed the capacity of main memory, the movement of data between main memory and external memory (disk), rather than actual (CPU) computation time, is often the bottleneck in a computation. Since data is moved between disk and main memory in large contiguous blocks, this has led to the development of a large number of I/O-efficient algorithms that minimize the number of such block movements. Actually implementing these algorithms can be a challenge, however, since operating systems do not give programs complete control over block movement and main-memory management. TPIE is one of two major libraries that have been developed to support I/O-efficient algorithm implementations. It relies heavily on the fact that most I/O-efficient algorithms are naturally composed of components that stream through one or more lists of data items while producing one or more such output lists, or components that sort such lists. TPIE therefore provides an interface in which list stream processing and sorting can be implemented in a simple and modular way, without having to worry about memory management or block movement. However, if care is not taken, such streaming-based implementations can lead to inefficient algorithms in practice, since lists of data items are typically written to (and read from) disk between components. In this paper we present a major extension of the TPIE library that includes a pipelining framework, which allows for practically efficient streaming-based implementations while minimizing I/O overhead between streaming components. The framework pipelines streaming components to avoid I/Os between them: it processes several components simultaneously, passing the output of one component directly to the input of the next in main memory.
TPIE automatically determines which components to pipeline and performs the required main memory management, and the extension also includes support for parallelization of internal memory computation and progress tracking across an entire application. Thus TPIE supports efficient streaming-based implementations of I/O-efficient algorithms in a simple, modular and maintainable way. The extended library has already been used to evaluate I/O-efficient algorithms in the research literature, and is heavily used in I/O-efficient commercial terrain processing applications by the Danish startup SCALGO.

