Evaluation of Successive CPUs/APUs/GPUs Based on an OpenCL Finite Difference Stencil

Abstract

The AMD APU (Accelerated Processing Unit) architecture, which combines CPU and GPU cores on the same die, is promising for GPU applications whose performance is bottlenecked by the low PCI Express communication rate. However, the first APU generations still have separate CPU and GPU memory partitions. Currently, the GPUs integrated in APUs are also less powerful than discrete GPUs. In this paper we therefore investigate the relevance of APUs for scientific computing by evaluating and comparing the performance of two successive AMD APUs (family codenames Llano and Trinity), two successive discrete GPUs (chip codenames Cayman and Tahiti), and one hexa-core AMD CPU. For this purpose, we rely on a 3D finite difference stencil that is optimized and tuned in OpenCL. We detail the most interesting optimizations for each architecture and show very good performance in OpenCL: up to 500 Gflops on Tahiti. Finally, our results show that APU integrated GPUs outperform CPUs, and that the integrated GPUs of upcoming APUs may match discrete GPUs for problems with high communication requirements.

Bibliographic details

Source
Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2013, pp. 405-409 (5 pages)
Authors
Calandra Henri; Dolbeau Romain; Fortin Pierre; Lamotte Jean-Luc; Said Issam;
Original format: PDF
Chinese Library Classification: data processing, data processing systems
Keywords
APU; GPU; PCI Express bus; finite difference stencil; high performance scientific computing;
