Automated Transformation of GPU-Specific OpenCL Kernels Targeting Performance Portability on Multi-Core/Many-Core CPUs


Abstract

When GPU-specific OpenCL kernels are adapted to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and is therefore used extensively. However, the locality optimizations written into GPU-specific OpenCL code are usually inherited without analysis, which can hurt CPU performance. When GPU-specific kernels execute on CPUs, local-memory arrays no longer match the hardware well, and the associated synchronizations are costly. To resolve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from the GPU-specific kernels, which can then be adapted for CPUs by removing all unwanted local-memory arrays together with the obsolete barrier statements. Experiments show that the automated transformation can satisfactorily improve OpenCL kernel performance on a Sandy Bridge CPU and on Intel's Many-Integrated-Core coprocessor.


Bibliographic record

Source: International Euro-Par Conference, 2014, 12 pages
Authors: Dafei Huang; Mei Wen; Changqing Xun; Dong Chen; Xing Cai; Yuran Qiao; Nan Wu; Chunyuan Zhang
Original format: PDF
Chinese Library Classification: TP311.133.2-53
Keywords: OpenCL; Performance portability; Multi-core/many-core CPU; Code transformation and optimization

Similar literature

1. Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations [J]. Mei Wen, Da-fei Huang, Chang-qing Xun. Frontiers of Information Technology & Electronic Engineering, 2015, No. 11.
2. Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations [J]. Mei Wen, Da-fei Huang, Chang-qing Xun. Journal of Zhejiang University (English Edition), Series C: Computers & Electronics, 2015, No. 11.
3. Assessing the Performance and Energy Usage of Multi-CPUs, Multi-Core and Many-Core Systems: The MMP Image Encoder Case Study [J]. Pedro M.M. Pereira, Patricio Domingues, Nuno M. M. Rodrigues. International Journal of Distributed and Parallel Systems, 2016, No. 5.
4. Automated Transformation of GPU-Specific OpenCL Kernels Targeting Performance Portability on Multi-Core/Many-Core CPUs [C]. Dafei Huang, Mei Wen, Changqing Xun. International conference on Euro-Par, 2014.
5. Parallelization framework for scientific application kernels on multi-core/many-core platforms [D]. Peng, Liu. 2011.
6. Application Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs: A Case Study with Microscopy Image Analysis [O]. George Teodoro, Tahsin Kurc, Guilherme Andrade.
7. Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs [O]. Ma, Teng, Bosilca, George, Bouteiller, Aurélien. 2011.
