Journal: Journal of Parallel and Distributed Computing

Data-flow analysis and optimization for data coherence in heterogeneous architectures


Abstract

Although heterogeneous computing has enabled developers to achieve impressive program speed-ups, the cost of moving data and keeping it coherent between host and device can easily eliminate any performance gain achieved by acceleration. To address this problem, this paper introduces DCA, a pair of data-flow analyses that determine how variables are used by the host and the device at each program point. It also introduces DCO, a code optimization technique that uses DCA information to (a) allocate OpenCL shared buffers between host and devices and (b) insert appropriate OpenCL function calls at program points so as to minimize the number of data coherence operations. We used the AClang compiler to measure the impact of DCA and DCO when generating code from the Parboil, Polybench and Rodinia benchmarks for a set of discrete and integrated GPUs. The experimental results show speed-ups of up to 5.25x (average of 1.39x) on an ARM Mali-T880 and up to 8.87x (average of 1.66x) on an NVIDIA Pascal Titan X GPU. (C) 2019 Elsevier Inc. All rights reserved.
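The idea of tracking how host and device use each variable, and issuing a coherence operation only where it is needed, can be illustrated with a toy sketch. This is not the paper's DCA or DCO: it assumes a straight-line program with no branches or loops, assumes all data starts out valid on the host, and uses a hypothetical `plan_transfers` helper and operation encoding invented for this illustration. A real data-flow analysis would iterate to a fixed point over a control-flow graph.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding (not from the paper): (side, kind, variable),
# where side is "host" or "device" and kind is "read" or "write".
Op = Tuple[str, str, str]


def plan_transfers(ops: List[Op]) -> List[Tuple[int, str, str]]:
    """Return (op_index, variable, destination) for every copy that must
    happen before ops[op_index] so that each read sees up-to-date data."""
    valid: Dict[str, Set[str]] = {}  # variable -> sides holding a valid copy
    transfers: List[Tuple[int, str, str]] = []
    for i, (side, kind, var) in enumerate(ops):
        # Assumption: every variable is initially valid on the host only.
        holders = valid.setdefault(var, {"host"})
        if kind == "read":
            if side not in holders:
                # The reader has a stale copy: one transfer is required.
                transfers.append((i, var, side))
                holders.add(side)
        elif kind == "write":
            # A write invalidates the copy on the other side.
            valid[var] = {side}
        else:
            raise ValueError(f"unknown operation kind: {kind}")
    return transfers


program: List[Op] = [
    ("host", "write", "A"),
    ("device", "read", "A"),   # A must be copied to the device here
    ("device", "write", "B"),
    ("device", "read", "A"),   # A is still valid on the device: no copy
    ("device", "read", "B"),   # B was produced on the device: no copy
    ("host", "read", "B"),     # B must be copied back to the host here
]

if __name__ == "__main__":
    for index, var, dest in plan_transfers(program):
        print(f"before op {index}: copy {var} to {dest}")
```

In this example only two copies are planned for six operations, whereas a naive scheme that copies every variable around each device operation would issue more. The sketch only conveys why per-program-point usage information lets a compiler skip redundant coherence operations.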
