Array streaming for array programming

Abstract

A barrier to efficient array programming, for example in Python/NumPy, is that algorithms written purely as array operations, entirely without loops, are often the most efficient on small inputs but can cause memory use to explode. This paper presents a solution to this problem using array streaming, implemented in the automatic-parallelisation, high-performance framework Bohrium. This makes it possible to use array programming directly in Python/NumPy code even when the apparent memory requirement exceeds the machine's capacity, because automatic streaming eliminates the temporary memory overhead by performing calculations in per-thread registers. Using Bohrium, we automatically fuse, stream, JIT-compile, and execute NumPy array operations on GPGPUs without modifying the user programs. We present performance evaluations of three benchmarks, all of which show dramatic reductions in memory use from streaming, yielding corresponding improvements in speed and in the utilisation of GPGPU cores. The fusion step is implemented using the theoretical framework presented in Kristensen et al. (2016), with a streaming-maximising cost function. The streaming-enabled Bohrium effortlessly runs programs on inputs several orders of magnitude larger than those that crash pure NumPy by exhausting system memory.
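To make the memory problem concrete, here is a minimal sketch in plain NumPy. It is not Bohrium and not the paper's method; the function names `pairwise_sum_dense` and `pairwise_sum_streamed` and the `block` parameter are hypothetical, chosen only for illustration. The loop-free version materialises full (n, n) temporaries. The hand-chunked version evaluates the same expression one block of rows at a time, which is roughly the idea that streaming automates.

```python
import numpy as np


def pairwise_sum_dense(x):
    """Sum of |x[i] - x[j]| over all i, j, written as pure array operations.

    The broadcasted difference creates an (n, n) temporary, and np.abs
    creates a second one, so peak memory grows as O(n^2).
    """
    return np.abs(x[:, None] - x[None, :]).sum()


def pairwise_sum_streamed(x, block=1024):
    """Same result, evaluated one block of rows at a time (hypothetical helper).

    Each temporary has shape (block, n), so peak memory is O(block * n)
    instead of O(n^2). This is a hand-written illustration of streaming,
    not how Bohrium implements it.
    """
    total = 0.0
    n = x.shape[0]
    for start in range(0, n, block):
        rows = x[start:start + block]  # a (<= block,) slice of the input
        total += np.abs(rows[:, None] - x[None, :]).sum()
    return total


if __name__ == "__main__":
    x = np.arange(4.0)
    print(pairwise_sum_dense(x))              # 20.0
    print(pairwise_sum_streamed(x, block=3))  # 20.0
```

For n = 100,000 float64 values, each (n, n) temporary in the dense version would need 80 GB, whereas with `block=1024` each temporary is about 0.8 GB. According to the abstract, Bohrium goes further than this manual chunking: operation fusion lets the whole expression be evaluated in per-thread registers, so no intermediate array is materialised at all, and the user's loop-free code needs no changes.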