Adaptation of Double-Precision Matrix Multiplication to the Cell Broadband Engine Architecture

Abstract

This paper presents an approach to adapting double-precision matrix multiplication to the architecture of Cell processors. The algorithm used on a single SPE is based on the operation C = C + A*B performed on 64×64 matrices; these matrices are further divided into smaller submatrices, which correspond to micro-kernel operations. Our approach relies on a performance model constructed as a function of submatrix size. The model accounts for factors such as the size of the local store, the number of registers, the properties of double-precision operations, and the balance between pipelines. This allows us to take into account the properties of both the first-generation Cell processor and its successor, the PowerXCell 8i. The adaptation is followed by an optimization phase that includes loop transformations, kernel implementation with SIMD instructions, and other transformations needed to balance the even and odd pipelines. Finally, we present hand-tuning performed with the IBM Assembly Visualizer tool. The proposed adaptation and optimizations allow us to achieve about 96% of peak performance.
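The block decomposition described in the abstract can be illustrated with a minimal scalar sketch in portable C. It is not the authors' SPE kernel: it assumes row-major 64×64 blocks and a hypothetical 4×4 micro-kernel tile (`MR`, `NR`), whereas the paper selects the submatrix size from its performance model and implements the kernel with SPU SIMD instructions. The function names `micro_kernel`, `block_dgemm`, and `naive_dgemm` are illustrative only. The local `acc` array stands in for the register-resident accumulators of C. For scale, one 64×64 block of doubles occupies 64 · 64 · 8 = 32 KB, so A, B, and C together take 96 KB of an SPE's 256 KB local store.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the C = C + A*B block processed on one SPE (from the abstract). */
#define NB 64
/* Hypothetical micro-kernel tile of C: MR rows by NR columns (illustrative values). */
#define MR 4
#define NR 4

/* Element (i, j) of a row-major NB x NB block. */
#define AT(M, i, j) ((M)[(size_t)(i) * NB + (size_t)(j)])

/* Micro-kernel: updates the MR x NR tile of C whose top-left corner is (i0, j0)
   with the product of rows i0..i0+MR-1 of A and columns j0..j0+NR-1 of B. */
static void micro_kernel(const double *A, const double *B, double *C, int i0, int j0)
{
    double acc[MR][NR];  /* stands in for register-resident accumulators */

    for (int i = 0; i < MR; i++)
        for (int j = 0; j < NR; j++)
            acc[i][j] = AT(C, i0 + i, j0 + j);

    for (int k = 0; k < NB; k++) {
        for (int i = 0; i < MR; i++) {
            double a = AT(A, i0 + i, k);            /* reused across NR columns */
            for (int j = 0; j < NR; j++)
                acc[i][j] += a * AT(B, k, j0 + j);  /* one multiply-add */
        }
    }

    for (int i = 0; i < MR; i++)
        for (int j = 0; j < NR; j++)
            AT(C, i0 + i, j0 + j) = acc[i][j];
}

/* C = C + A*B on one NB x NB block, decomposed into MR x NR micro-kernel calls. */
void block_dgemm(const double *A, const double *B, double *C)
{
    assert(NB % MR == 0 && NB % NR == 0);  /* tiles must cover the block exactly */
    for (int i0 = 0; i0 < NB; i0 += MR)
        for (int j0 = 0; j0 < NB; j0 += NR)
            micro_kernel(A, B, C, i0, j0);
}

/* Straightforward triple loop, used only as a reference for checking. */
void naive_dgemm(const double *A, const double *B, double *C)
{
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++)
                AT(C, i, j) += AT(A, i, k) * AT(B, k, j);
}
```

In this sketch, each step of the `k` loop loads MR + NR = 8 values and performs MR · NR = 16 multiply-adds, so larger tiles raise data reuse. On the SPE, where each of the 128 registers holds 128 bits (two doubles), a 4×4 tile of C would occupy 8 vector registers; growing the tile consumes registers quickly, which is presumably one of the trade-offs a register-count term in a performance model has to weigh.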

Bibliographic Details

Source: International Conference on Parallel Processing and Applied Mathematics (PPAM 2010), 2010, pp. 535-546 (12 pages)
Authors: Krzysztof Rojek; Lukasz Szustak
Original format: PDF
Chinese Library Classification: distributed operating systems; parallel operating systems

Similar Documents

1. Model-driven adaptation of double-precision matrix multiplication to the Cell processor architecture [J]. Roman Wyrzykowski, Krzysztof Rojek, Lukasz Szustak. Parallel Computing, 2012, no. 4-5.
2. Accelerating the Execution of Matrix Languages on the Cell Broadband Engine Architecture [J]. Raymes Khoury, Bernd Burgstaller, Bernhard Scholz. IEEE Transactions on Parallel and Distributed Systems, 2011, no. 1.
3. Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture: CELL Processor [J]. Jakub Kurzak, Wesley Alvaro, Jack Dongarra. Parallel Computing, 2009, no. 3.
4. Adaptation of Double-Precision Matrix Multiplication to the Cell Broadband Engine Architecture [C]. Krzysztof Rojek, Lukasz Szustak. International Conference on Parallel Processing and Applied Mathematics, 2010.
5. Extracellular Matrix Architecture and Biomechanics of 3D Engineered Microtissues [D]. Prasenjit Bose. 2018.
6. Perfusion Decellularization of Extrahepatic Bile Duct Allows Tissue-Engineered Scaffold Generation by Preserving Matrix Architecture and Cytocompatibility [O]. Yolik Ramírez-Marín, David Eduardo Abad-Contreras, Martha Ustarroz-Cano. 2021.
7. Accelerating the Execution of Matrix Languages on the Cell Broadband Engine Architecture [O]. Raymes Khoury, Bernd Burgstaller, Bernhard Scholz. 2009.