Automatic OpenCL work-group size selection for multicore CPUs

Abstract

In this paper, we address the effect of the work-group size on the performance of OpenCL kernels. We propose a profiling-based algorithm that finds a good work-group size, in terms of performance, for the target multicore CPU architecture. Our algorithm reduces misses in the private L1 data cache and achieves load balancing between cores. It exploits the polyhedral model to estimate the working-set size and the number of cache misses for a parameterized work-group size of the OpenCL kernel. Based on the profiling information, it heuristically searches the space of parameterized work-group sizes. Our virtually extended index space increases the probability of finding a better work-group size. We implement our work-group size selection algorithm as a development tool that consists of a code generator and a search library. The code generator extracts the polytope of each memory reference from the kernel code and generates a function that simplifies the polytopes using run-time information and invokes search library routines. The search library calculates the working-set size using the polytopes and finds a proper work-group size. We evaluate our approach using 31 OpenCL kernels on four different multicore CPUs. We compare its accuracy and search time to those of an exhaustive search method. Experimental results show that our tool is, on average, 1566 times faster than the exhaustive search and selects a work-group size whose performance is the same as or comparable to that of the exhaustive search.
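
The two goals named in the abstract, keeping a work-group's working set inside the private L1 data cache and balancing work-groups across cores, can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm: it replaces the polyhedral working-set estimate with a toy linear model, it searches a one-dimensional index space only, and every name in it (`Cpu`, `working_set_bytes`, `select_wg_size`, `bytes_per_item`, `shared_bytes`) is hypothetical. The tie-break that prefers larger work-groups is also an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    """Hypothetical description of the target multicore CPU."""
    cores: int
    l1_bytes: int  # private L1 data cache size per core


def divisors(n):
    """All positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def working_set_bytes(wg_size, bytes_per_item, shared_bytes):
    """Toy working-set model (an assumption, not the polyhedral estimate):
    each work-item touches bytes_per_item of its own data, plus
    shared_bytes that the whole work-group reuses."""
    return wg_size * bytes_per_item + shared_bytes


def select_wg_size(global_size, cpu, bytes_per_item, shared_bytes=0):
    """Pick a 1-D work-group size that divides global_size.

    Preference order (assumed for this sketch):
      1. the estimated working set fits in the private L1 data cache,
      2. the number of work-groups is a multiple of the core count,
      3. the work-group is as large as possible.
    """
    best_key, best_wg = None, None
    for wg in divisors(global_size):
        fits = working_set_bytes(wg, bytes_per_item, shared_bytes) <= cpu.l1_bytes
        balanced = (global_size // wg) % cpu.cores == 0
        key = (fits, balanced, wg)
        if best_key is None or key > best_key:
            best_key, best_wg = key, wg
    return best_wg


if __name__ == "__main__":
    cpu = Cpu(cores=4, l1_bytes=32 * 1024)
    print(select_wg_size(1024, cpu, bytes_per_item=256))
```

With these toy inputs the sketch selects 128: larger divisors of 1024 exceed the 32 KiB cache under the linear model, and 128 still yields 8 work-groups, a multiple of the 4 cores. An exhaustive search, by contrast, would run the kernel once per candidate size and time each run, which is the baseline the abstract compares against.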

Bibliographic details

Source
International Conference on Parallel Architectures and Compilation Techniques | 2013 | pp. 387-397 | 11 pages
Authors
Seo Sangmin; Lee Jun; Jo Gangwon; Lee Jaejin
Original format
PDF
Keywords
OpenCL; automatic selection; multicore CPU; performance portability; profiling; work-group size; working-set
