ACM Transactions on Mathematical Software

A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures


Abstract

Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how the current state of hardware and software allows the programmability problem to be addressed without sacrificing performance. This follows from two realizations. First, memory is now cheap and large, which makes it less necessary to orchestrate I/O optimally. Second, new algorithms view matrices as collections of submatrices (tiles) and computation as operations on those submatrices. This enables libraries to be coded at a high level of abstraction, leaving the scheduling of computations and data movement to a runtime system. This is in sharp contrast to more traditional approaches. Those approaches rely on optimal use of in-core memory and on explicit overlap of I/O with computation, at the cost of considerable programming complexity. We demonstrate the performance of this approach on multicore architectures as well as on platforms equipped with hardware accelerators.
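The division of labor described above can be sketched in a few lines. The algorithm code only submits operations on tiles. A runtime decides when each tile is read from the backing store, when it is evicted, and when it is written back.

The Python example below applies this idea to a right-looking tiled Cholesky factorization, which is chosen here as a representative dense operation. It is a minimal sketch under stated assumptions, not the authors' runtime system. The following names and simplifications are hypothetical:

- The class `OutOfCoreRuntime` and its parameters `b` (tile size) and `capacity` (tiles held in memory) are invented for this example.
- A plain dict stands in for disk.
- Tasks run sequentially in submission order rather than on multiple threads.

```python
from collections import OrderedDict

import numpy as np


class OutOfCoreRuntime:
    """Hypothetical minimal runtime (a sketch, not the paper's system): tiles
    live in a backing store (a dict standing in for disk); at most `capacity`
    tiles are held in memory."""

    def __init__(self, A, b, capacity):
        assert capacity >= 3, "a task may touch up to three tiles"
        self.n, self.capacity = A.shape[0] // b, capacity
        self.disk = {(i, j): A[i*b:(i+1)*b, j*b:(j+1)*b].copy()
                     for i in range(self.n) for j in range(self.n)}
        self.cache = OrderedDict()    # (i, j) -> [tile, dirty], in LRU order
        self.reads = self.writes = 0  # tile transfers from/to "disk"
        self.tasks, self.last_writer = [], {}

    def submit(self, kernel, out, *ins):
        # Record the task with its read-after-write and write-after-write
        # dependencies. A parallel scheduler could use `deps` to run
        # independent tasks concurrently; here tasks run in submission order,
        # which always respects them.
        deps = {self.last_writer[t] for t in (out, *ins) if t in self.last_writer}
        self.tasks.append((kernel, out, ins, deps))
        self.last_writer[out] = len(self.tasks) - 1

    def _fetch(self, key, pinned):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        if len(self.cache) >= self.capacity:
            # Evict the least recently used tile the current task does not
            # need, writing it back only if it was modified.
            victim = next(k for k in self.cache if k not in pinned)
            tile, dirty = self.cache.pop(victim)
            if dirty:
                self.disk[victim] = tile
                self.writes += 1
        self.cache[key] = [self.disk[key].copy(), False]
        self.reads += 1
        return self.cache[key]

    def run(self):
        for kernel, out, ins, _deps in self.tasks:
            pinned = {out, *ins}
            in_tiles = [self._fetch(t, pinned)[0] for t in ins]
            entry = self._fetch(out, pinned)
            entry[0], entry[1] = kernel(entry[0], *in_tiles), True
        for key, (tile, dirty) in self.cache.items():  # flush dirty tiles
            if dirty:
                self.disk[key] = tile
                self.writes += 1
        self.cache.clear()

    def matrix(self):
        n = self.n
        return np.block([[self.disk[(i, j)] for j in range(n)] for i in range(n)])


# Tile kernels: each returns the new value of its output (first) tile.
def potrf(akk):
    return np.linalg.cholesky(akk)


def trsm(aik, lkk):
    return np.linalg.solve(lkk, aik.T).T  # aik @ inv(lkk).T


def syrk(ajj, ajk):
    return ajj - ajk @ ajk.T


def gemm(aij, aik, ajk):
    return aij - aik @ ajk.T


def cholesky_by_tiles(rt):
    """Right-looking tiled Cholesky (lower). The code reads like an in-core
    algorithm; it only submits tasks, and the runtime moves the tiles."""
    n = rt.n
    for k in range(n):
        rt.submit(potrf, (k, k))
        for i in range(k + 1, n):
            rt.submit(trsm, (i, k), (k, k))
        for j in range(k + 1, n):
            rt.submit(syrk, (j, j), (j, k))
            for i in range(j + 1, n):
                rt.submit(gemm, (i, j), (i, k), (j, k))


rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)  # symmetric positive definite, 4x4 grid of 2x2 tiles
rt = OutOfCoreRuntime(A, b=2, capacity=4)
cholesky_by_tiles(rt)
rt.run()
L = np.tril(rt.matrix())
print(np.allclose(L @ L.T, A), rt.reads, rt.writes)
```

With `capacity=4`, evicted tiles must be re-read, so `rt.reads` exceeds the 10 lower-triangular tiles the factorization touches. With `capacity=16`, every touched tile is read once and written back once (10 reads, 10 writes). In both cases the algorithm code is unchanged, which illustrates the programmability argument the abstract makes.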
