Conference: International Conference on Information Science, Electronics and Electrical Engineering

Dynamic memory optimization and parallelism management for OpenCL

Abstract

Multiprocessor platforms have become a common route to high performance. OpenCL (Open Computing Language) is one of the programming standards for heterogeneous multiprocessors, and it provides portability across these platforms. Our research focuses on platforms that combine CPUs and GPUs, since GPUs are now in widespread use. On such a platform, two programming issues can significantly affect GPU computing performance: workload distribution and use of the GPU memory hierarchy. To fully exploit the characteristics of GPUs, programmers must be not only proficient in parallel programming but also familiar with the hardware specifications. In this paper, we therefore propose a compilation pass that automatically optimizes OpenCL kernels. The pass transforms a naïve input kernel function by applying kernel function analysis, work-group rearrangement, memory coalescing, and work-item merge. In addition, our framework is implemented in a runtime system, so it can dynamically adjust the optimization parameters according to the hardware specifications. In terms of execution time, the optimized kernels generated by our design may show significant performance improvement over the naïve versions. Although optimizations performed at runtime incur time overhead, in most cases this overhead may be offset by intensive kernel computation or massive input data.
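To make the interplay of work-item merge and memory coalescing concrete, here is a minimal Python sketch. It is not the paper's compilation pass, and every name in it (`strided_indices`, `blocked_indices`, `count_transactions`), along with the group size and segment size, is a hypothetical chosen for illustration. It models a merged work-item that processes `merge_factor` elements and counts how many aligned memory segments a group of neighbouring work-items touches per step under two index mappings.

```python
def strided_indices(global_id, num_items, merge_factor):
    """Hypothetical merged mapping: elements are num_items apart, so in each
    step neighbouring work-items touch neighbouring addresses."""
    return [global_id + step * num_items for step in range(merge_factor)]


def blocked_indices(global_id, num_items, merge_factor):
    """Hypothetical merged mapping: each work-item owns a contiguous block,
    so neighbouring work-items are merge_factor elements apart."""
    start = global_id * merge_factor
    return list(range(start, start + merge_factor))


def count_transactions(mapping, num_items, merge_factor, group=16, segment=16):
    """Count aligned segments touched, summed over groups and steps.

    Assumption (not from the source): the `group` work-items that execute
    together issue one transaction per distinct `segment`-element aligned
    block they touch in a given step.
    """
    total = 0
    for first in range(0, num_items, group):
        ids = range(first, min(first + group, num_items))
        for step in range(merge_factor):
            touched = {mapping(g, num_items, merge_factor)[step] // segment
                       for g in ids}
            total += len(touched)
    return total


def covered(mapping, num_items, merge_factor):
    """All element indices processed by all merged work-items, sorted."""
    return sorted(i for g in range(num_items)
                  for i in mapping(g, num_items, merge_factor))


if __name__ == "__main__":
    n_items, k = 64, 4  # 64 merged work-items, 4 elements each
    print("strided:", count_transactions(strided_indices, n_items, k))
    print("blocked:", count_transactions(blocked_indices, n_items, k))
```

Both mappings process every element exactly once, but under this simplified model the strided mapping needs fewer transactions. That is one reason an optimizer might choose the merge factor and index mapping together instead of independently.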
