Journal of Parallel and Distributed Computing

Data-flow analysis and optimization for data coherence in heterogeneous architectures


Abstract

Although heterogeneous computing has enabled developers to achieve impressive program speed-ups, the cost of moving data and keeping it coherent between host and device can easily cancel out any performance gained through acceleration. To address this problem, this paper introduces DCA, a pair of data-flow analyses that determine how variables are used by the host and the device at each program point. It also introduces DCO, a code optimization technique that uses DCA information to: (a) allocate OpenCL shared buffers between host and devices; and (b) insert appropriate OpenCL function calls at program points so as to minimize the number of data coherence operations. We have used the AClang compiler to measure the impact of DCA and DCO when generating code from the Parboil, Polybench and Rodinia benchmarks for a set of discrete and integrated GPUs. The experimental results showed speed-ups of up to 5.25x (average of 1.39x) on an ARM Mali-T880 and up to 8.87x (average of 1.66x) on an NVIDIA Pascal Titan X GPU. (C) 2019 Elsevier Inc. All rights reserved.
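To make the idea concrete, the following is a minimal sketch of how tracking where each variable holds a valid copy can reduce coherence operations. It is not the paper's DCA or DCO: it handles only straight-line code rather than a control-flow graph, it uses a single forward pass rather than a pair of analyses, and the names `Stmt`, `plan_transfers` and `naive_transfer_count` are hypothetical, introduced here for illustration only.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    """Hypothetical statement record (an assumption, not from the paper)."""
    side: str          # "host" or "device": where the statement executes
    reads: frozenset   # names of variables the statement reads
    writes: frozenset  # names of variables the statement writes


def plan_transfers(program, variables):
    """Forward pass over straight-line code.

    Returns a list of (statement index, variable, direction) tuples, one for
    each copy that is actually required before a statement runs.
    """
    # Assumption: every variable starts out valid on the host only.
    valid = {v: {"host"} for v in variables}
    transfers = []
    for i, stmt in enumerate(program):
        other = "device" if stmt.side == "host" else "host"
        for v in sorted(stmt.reads):
            # Copy only if the executing side lacks an up-to-date copy.
            if stmt.side not in valid[v]:
                transfers.append((i, v, f"{other}->{stmt.side}"))
                valid[v].add(stmt.side)
        for v in stmt.writes:
            # A write invalidates the copy held by the other side.
            valid[v] = {stmt.side}
    return transfers


def naive_transfer_count(program):
    """Baseline: copy all inputs in and all outputs out around each kernel."""
    return sum(len(s.reads) + len(s.writes)
               for s in program if s.side == "device")


# Host initializes a, two kernels run back to back, host reads the result c.
program = [
    Stmt("host", frozenset(), frozenset({"a"})),
    Stmt("device", frozenset({"a"}), frozenset({"b"})),
    Stmt("device", frozenset({"b"}), frozenset({"c"})),
    Stmt("host", frozenset({"c"}), frozenset()),
]
transfers = plan_transfers(program, ["a", "b", "c"])
print(transfers)
print(len(transfers), "transfers versus", naive_transfer_count(program), "naive")
```

In this toy program the intermediate variable `b` never leaves the device, so only `a` (host to device) and `c` (device to host) need to be copied, whereas the naive baseline copies around every kernel. A real compiler pass would additionally have to merge this information at control-flow joins and choose the concrete OpenCL calls to emit.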
