Journal: IEEE Transactions on Parallel and Distributed Systems

Accelerating the Execution of Matrix Languages on the Cell Broadband Engine Architecture

Abstract

Matrix languages, including MATLAB and Octave, are established standards for applications in science and engineering. They provide interactive programming environments that are easy to use because their scripting languages offer matrix data types. Current implementations of matrix languages do not fully utilize high-performance, special-purpose chip architectures, such as the IBM PowerXCell processor (Cell). We present a new framework that extends Octave to harness the computational power of the Cell. With this framework, the programmer is relieved of the burden of introducing explicit notions of parallelism. Instead, the programmer uses a new matrix data type to execute matrix operations in parallel on the synergistic processing elements (SPEs) of the Cell. We employ lazy evaluation semantics for our new matrix data type to obtain execution traces of matrix operations. Traces are converted to data dependence graphs; operations in the data dependence graph are lowered (split into submatrices), scheduled, and executed on the SPEs. Thereby, we exploit 1) data parallelism, 2) instruction-level parallelism, 3) pipeline parallelism, and 4) task parallelism of matrix language programs. We conducted extensive experiments to show the validity of our approach. Our Cell-based implementation achieves speedups of up to a factor of 12 over code run on recent Intel Core2 Quad processors.
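The lazy-evaluation idea in the abstract can be illustrated with a small Python sketch. This is not the authors' Octave/Cell framework: the names `LazyMatrix`, `dependence_order`, and `evaluate` are hypothetical, NumPy stands in for the SPE kernels, and the "lowering" step is reduced to splitting an element-wise addition into row blocks. It only shows how operator overloading can record a trace instead of computing, and how that trace can be walked as a data dependence graph.

```python
import numpy as np


class LazyMatrix:
    """Hypothetical lazy matrix: operators record a trace node instead of computing."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op                  # "const", "add", or "matmul"
        self.inputs = tuple(inputs)   # nodes this operation depends on
        self.value = value            # filled in only when evaluated

    @classmethod
    def from_array(cls, data):
        return cls("const", value=np.asarray(data, dtype=float))

    def __add__(self, other):
        return LazyMatrix("add", (self, other))

    def __matmul__(self, other):
        return LazyMatrix("matmul", (self, other))


def dependence_order(root):
    """Return all nodes reachable from root, inputs before their consumers."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(root)
    return order


def evaluate(root, tiles=2):
    """Execute the recorded trace in dependence order (sequentially here)."""
    for node in dependence_order(root):
        if node.value is not None:
            continue
        a, b = (inp.value for inp in node.inputs)
        if node.op == "add":
            # Illustrative lowering: split into row blocks. The blocks are
            # independent, so each could be handed to a separate worker.
            blocks = [x + y for x, y in zip(np.array_split(a, tiles),
                                            np.array_split(b, tiles))]
            node.value = np.vstack(blocks)
        else:
            node.value = a @ b
    return root.value
```

With `a` and `b` built by `LazyMatrix.from_array`, the expression `c = (a + b) @ a` performs no arithmetic; it only builds the graph. Work happens when `evaluate(c)` is called, at which point a scheduler has the whole trace available and could reorder or parallelize independent nodes.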
