Automated Transformation of GPU-Specific OpenCL Kernels Targeting Performance Portability on Multi-Core/Many-Core CPUs



Abstract

When GPU-specific OpenCL kernels are adapted to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and therefore widely used. However, the locality-oriented constructs in GPU-specific OpenCL code are usually inherited without analysis, which can hurt CPU performance: on CPUs, local-memory arrays no longer match the hardware well, and the associated synchronizations are costly. To resolve this, we analyze the memory access patterns using array-access descriptors derived from the GPU-specific kernels. The kernels can then be adapted for CPUs by removing all unwanted local-memory arrays together with the barrier statements that become obsolete. Experiments show that the automated transformation improves OpenCL kernel performance on a Sandy Bridge CPU and on Intel's Many-Integrated-Core coprocessor.
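The effect of the transformation described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than OpenCL and assumes a hypothetical 3-point stencil kernel; the function names, the stencil, and the zero padding at the array edges are illustrative assumptions, not taken from the paper. The first function emulates a coarsened work-group that still stages data in a local-memory tile, so the barrier becomes a split between two loops. The second shows what might remain once the tile and the barrier are removed. The sketch does not implement the array-access descriptor analysis itself.

```python
def stencil_gpu_style(src, group_size):
    """Emulate a coarsened work-group that keeps its local-memory tile.

    Hypothetical example: each output is the sum of an element and its two
    neighbours, with out-of-range neighbours treated as zero.
    """
    n = len(src)
    out = [0] * n
    for group_start in range(0, n, group_size):
        size = min(group_size, n - group_start)
        # Stand-in for a __local array: the group's elements plus one halo
        # cell on each side.
        tile = [0] * (size + 2)
        # Loop 1: the code that ran before the barrier (staging into the tile).
        for lid in range(size):
            tile[lid + 1] = src[group_start + lid]
        if group_start > 0:
            tile[0] = src[group_start - 1]
        if group_start + size < n:
            tile[size + 1] = src[group_start + size]
        # The barrier sat here; after coarsening it is a loop split.
        # Loop 2: the code that ran after the barrier (compute from the tile).
        for lid in range(size):
            out[group_start + lid] = tile[lid] + tile[lid + 1] + tile[lid + 2]
    return out


def stencil_cpu_style(src):
    """Same computation with the local tile and the barrier removed."""
    n = len(src)
    out = [0] * n
    for gid in range(n):
        # Read neighbours directly from the global array; the CPU cache
        # takes over the role the tile played on the GPU.
        left = src[gid - 1] if gid > 0 else 0
        right = src[gid + 1] if gid + 1 < n else 0
        out[gid] = left + src[gid] + right
    return out
```

Both functions return the same result for any input and group size, which is the property an automated transformation of this kind has to preserve. The second version avoids the extra copy into the tile and the loop split that the barrier forces.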


Bibliographic details

Source
International conference on Euro-Par, 2014, pp. 210-221 (12 pages)
Authors
Dafei Huang; Mei Wen; Changqing Xun; Dong Chen; Xing Cai; Yuran Qiao; Nan Wu; Chunyuan Zhang
Original format
PDF
Keywords
OpenCL; Performance portability; Multi-core/many-core CPU; Code transformation and optimization


