Optimizing convolution operations on GPUs using adaptive tiling


Abstract

The research domain of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia data. High-performance computing techniques are necessary to satisfy the ever-increasing computational demands of MMCA applications. The introduction of Graphics Processing Units (GPUs) in modern cluster systems presents application developers with a challenge: GPUs are well known to be capable of providing significant performance improvements, but they vastly increase programming complexity. To this end, we have extended a user-transparent parallel programming model for MMCA, named Parallel-Horus, to allow the execution of compute-intensive operations on the GPUs present in the cluster. Convolutions form the most important class of operations in the MMCA domain and are typically responsible for a large fraction of the execution time. Existing optimization approaches, both for CUDA kernels in general and for convolution operations specifically, are too limited in performance and flexibility. In this paper, we present a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs. To the best of our knowledge, our implementation is the most optimized and best performing implementation of 2D convolution in the spatial domain available to date.
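
The abstract names the operation (2D convolution in the spatial domain) but does not describe how adaptive tiling works. As a point of reference only, the sketch below shows what a direct spatial-domain 2D filter computes and what it means to partition the output into tiles. It is a plain CPU illustration in Python, not the paper's method or its CUDA implementation. The function name `convolve2d_tiled` and the fixed `tile_h`/`tile_w` parameters are hypothetical; the kernel is applied without flipping, which coincides with convolution for symmetric kernels; and only "valid" output positions are computed.

```python
def convolve2d_tiled(image, kernel, tile_h=2, tile_w=2):
    """Direct 2D spatial-domain filtering, computed one output tile at a time.

    Illustrative assumption: fixed tile sizes chosen by the caller. An
    adaptive scheme would pick them per problem; that logic is not shown.
    """
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # "Valid" output size: only positions where the kernel fits entirely.
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    out = [[0.0] * out_w for _ in range(out_h)]

    # Outer loops walk over output tiles. On a GPU, a tile would typically
    # be the unit of work assigned to one thread block.
    for ty in range(0, out_h, tile_h):
        for tx in range(0, out_w, tile_w):
            # Inner loops cover the pixels of one tile, clipped at the border.
            for y in range(ty, min(ty + tile_h, out_h)):
                for x in range(tx, min(tx + tile_w, out_w)):
                    acc = 0.0
                    # Weighted sum over the kernel window (no kernel flip).
                    for i in range(k_h):
                        for j in range(k_w):
                            acc += image[y + i][x + j] * kernel[i][j]
                    out[y][x] = acc
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = convolve2d_tiled(image, kernel)
```

The tile size changes only the order in which output pixels are visited, never the result, which is why it can be treated as a pure tuning parameter.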

Bibliographic details

  • Source
    Future Generation Computer Systems | 2014, Issue 1 | pp. 14-26 | 13 pages
  • Author affiliations

    Department of Computer Science, VU University Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands;

    Department of Computer Science, VU University Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands, and Netherlands eScience Center, Science Park 140, 1098 XG Amsterdam, The Netherlands;

    Department of Computer Science, VU University Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands;

    Department of Computer Science, VU University Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands, and Netherlands eScience Center, Science Park 140, 1098 XG Amsterdam, The Netherlands;

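  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords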

    High-performance computing; GPU computing; Parallel applications; GPU clusters; High-level programming models;
