
Data layout transformation through in-place transposition.



Abstract

Matrix transposition is an important algorithmic building block for many numeric algorithms such as multidimensional FFT. It has also been used to convert the storage layout of arrays. Intuitively, in-place transposition should be a good fit for GPU architectures due to their limited on-board memory capacity and high throughput. However, direct application of in-place transposition algorithms from the CPU lacks the parallelism and locality required by GPUs to achieve good performance.

In this thesis we present the first known in-place matrix transposition approach for GPUs. Our implementation is based on a staged transposition algorithm in which each stage is performed using an elementary tile-wise transposition. With both low-level optimizations to the elementary tile-wise transpositions and high-level improvements to the existing staged transposition algorithm, our design is able to reach more than 20 GB/s of sustained throughput on modern GPUs, and a 3X speedup.

Furthermore, for many-core architectures such as GPUs, efficient off-chip memory access is crucial to high performance; applications are often limited by off-chip memory bandwidth. Transforming data layout is an effective way to reshape access patterns and improve off-chip memory access behavior, but several challenges have limited the use of automated data layout transformation systems on GPUs, namely how to efficiently handle arrays of aggregates, and how to transparently marshal data between the layouts required by different performance-sensitive kernels and by legacy host code. While GPUs have higher memory bandwidth and are natural candidates for marshaling data between layouts, their relatively constrained memory capacity, compared to that of the CPU, implies that not only the temporal cost of marshaling but also its spatial overhead must be considered by any practical layout transformation system.

As an application of the in-place transposition methodology, a novel approach to laying out arrays of aggregate types across GPU and CPU architectures is proposed to further improve memory parallelism and kernel performance beyond what is achieved by human programmers using discrete arrays today. Second, the system, DL, has a run-time library implemented in OpenCL that transparently and efficiently converts, or marshals, data to accommodate application components that have different data layout requirements. We present the insights that led to the design of this highly efficient run-time marshaling library. Third, we show experimental results demonstrating that the new layout approach leads to substantial performance improvement at the application level, even when all marshaling costs are taken into account.
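As context for the elementary building block mentioned in the abstract, the sketch below shows a standard shared-memory tile-wise transpose kernel in CUDA. It is illustrative only: it writes out of place and is not the thesis's staged in-place algorithm; the tile dimensions, kernel name, and launch configuration are assumptions.

```cuda
// Illustrative sketch: a conventional shared-memory tile transpose kernel of
// the kind used as an elementary building block for staged transposition.
// This version is out of place; the thesis composes tile operations so that
// the transposition happens within a single buffer.
#include <cuda_runtime.h>

#define TILE_DIM   32   // assumed tile size
#define BLOCK_ROWS 8    // each thread block covers TILE_DIM x BLOCK_ROWS threads

__global__ void transpose_tile(float *out, const float *in,
                               int width, int height)
{
    // +1 padding avoids shared-memory bank conflicts on the column reads below.
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;

    // Coalesced load of one tile from the source matrix (row-major, width columns).
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < width && (y + j) < height)
            tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * width + x];

    __syncthreads();

    // Coalesced store of the transposed tile; block indices are swapped so the
    // tile lands at its mirrored position in the output (height columns).
    x = blockIdx.y * TILE_DIM + threadIdx.x;
    y = blockIdx.x * TILE_DIM + threadIdx.y;

    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < height && (y + j) < width)
            out[(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
}
```

In a staged in-place scheme, operations of this kind are combined so that tiles are exchanged and transposed within the same buffer rather than copied to a separate output array, which is what makes the approach attractive given limited GPU memory.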
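The layout problem described in the second half of the abstract can be made concrete with a small sketch contrasting an array of aggregates (the form legacy host code typically uses) with the discrete-array layout that GPU kernels favor, plus a naive marshaling loop between them. The Particle type, field names, and function names here are hypothetical; the thesis's DL library performs this kind of marshaling transparently and in place on the GPU.

```cuda
// Illustrative sketch of the two layouts being marshaled between.
#include <cstddef>

struct Particle { float x, y, z, m; };   // aggregate type used by host code (AoS)

struct ParticleArrays {                  // discrete-array (SoA) form for GPU kernels
    float *x, *y, *z, *m;
};

// Naive out-of-place marshal on the host. The thesis's point is that marshaling
// in place on the GPU avoids both this copy and the doubled memory footprint,
// which matters given the GPU's limited memory capacity.
void marshal_aos_to_soa(const Particle *aos, ParticleArrays soa, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        soa.x[i] = aos[i].x;
        soa.y[i] = aos[i].y;
        soa.z[i] = aos[i].z;
        soa.m[i] = aos[i].m;
    }
}

// With discrete arrays, consecutive threads touch consecutive floats of the
// same field array, so each warp issues fully coalesced memory accesses.
__global__ void scale_masses(ParticleArrays p, float s, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p.m[i] *= s;
}
```

The same kernel written against the aggregate layout would stride through memory by sizeof(Particle) per thread, wasting off-chip bandwidth, which is why a run-time marshaling library is useful when different components require different layouts.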

Record details

  • Author: Sung, I-Jui
  • Author affiliation: University of Illinois at Urbana-Champaign
  • Degree-granting institution: University of Illinois at Urbana-Champaign
  • Subject: Computer Engineering
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 118 p.
  • Total pages: 118
  • Format: PDF
  • Language: English
