An application-centric evaluation of OpenCL on multi-core CPUs

Abstract

Although designed as a cross-platform parallel programming model, OpenCL remains mainly used for GPU programming. Nevertheless, a large number of applications have been parallelized, implemented, and eventually optimized in OpenCL. Thus, in this paper, we focus on the potential that these parallel applications have to exploit the performance of multi-core CPUs. Specifically, we analyze a method to systematically reuse and adapt OpenCL code from GPUs to CPUs. We claim that this work is a necessary step for enabling inter-platform performance portability in OpenCL. Our method is based on iterative tuning: given an application, we choose a reasonable OpenMP implementation as a performance reference and we systematically tune the OpenCL code to reach or exceed this threshold. In the process, we identify the factors that significantly impact the performance of the OpenCL code. We apply this method to five different applications, selected from the Rodinia benchmark suite (which provides equivalent OpenMP and OpenCL implementations), and make a series of thorough evaluations with different datasets on three different multi-core platforms. We find that the OpenCL performance on CPUs is affected by typical, hard-coded GPU optimizations (unsuitable for multi-core CPUs), by the fine-grained parallelism of the model, and by the immature OpenCL compilers. Systematically fixing these issues allowed OpenCL to achieve performance equal to or better than OpenMP's, proving it can be a good option for programming multi-core CPUs.
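
The iterative tuning idea in the abstract (take an OpenMP runtime as the reference, then keep adjusting the OpenCL code until it reaches or beats that reference) can be sketched as a simple greedy loop. This is only an illustration under stated assumptions, not the authors' actual procedure: the function name `tune_until_reference`, the candidate step names, and all timing figures are hypothetical, and each candidate is modeled as a function that returns the runtime that would be measured after applying that change.

```python
from typing import Callable, Dict, List, Tuple


def tune_until_reference(
    reference_time: float,
    baseline_time: float,
    candidates: Dict[str, Callable[[float], float]],
) -> Tuple[float, List[str]]:
    """Greedily apply tuning steps until the reference runtime is reached.

    Each candidate maps the current runtime to the runtime observed after
    applying that step (here synthetic; in practice a real measurement).
    """
    current = baseline_time
    applied: List[str] = []
    remaining = dict(candidates)

    # Stop when the reference is matched or no untried steps are left.
    while current > reference_time and remaining:
        best_name, best_time = None, current
        for name, step in remaining.items():
            trial = step(current)
            if trial < best_time:
                best_name, best_time = name, trial
        if best_name is None:
            break  # no remaining step improves the runtime
        applied.append(best_name)
        current = best_time
        del remaining[best_name]

    return current, applied


# Hypothetical candidates and synthetic speedup factors (assumptions only).
candidates = {
    "drop_hardcoded_gpu_optimization": lambda t: t * 0.7,
    "coarsen_work_granularity": lambda t: t * 0.6,
    "unhelpful_change": lambda t: t * 1.05,
}

final_time, applied_steps = tune_until_reference(
    reference_time=1.0,  # stands in for the OpenMP reference runtime
    baseline_time=2.0,   # stands in for the untuned OpenCL runtime on a CPU
    candidates=candidates,
)
print(applied_steps, final_time)
```

With these synthetic numbers, the loop applies the two helpful steps and skips the one that slows the code down, stopping once the runtime falls below the reference.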

Bibliographic record

  • Source
    Parallel Computing | 2013, Issue 12 | pp. 834-850 | 17 pages
  • Author affiliations

    Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands;

    Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands;

    Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands;

    Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands, Informatics Institute, University of Amsterdam, The Netherlands;

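  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords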

    OpenCL; Multi-core CPUs; Performance evaluation; Performance tuning; OpenMP;
