Conference: International Conference on Engineering MIS

Execution-time optimization based on thread and block repartitions on a graphic processing unit


Abstract

With the rapid development of multimedia technologies and network communication, parallel architectures such as the Graphics Processing Unit (GPU) have been introduced into high-performance computing. However, programming a GPU and obtaining the best execution time largely remains an art. In this paper, a search is performed over the number of threads and blocks used to compute a 64×64 Prediction Unit (PU64) in High Efficiency Video Coding (HEVC). The method is implemented with the Compute Unified Device Architecture (CUDA) and is described as a way to optimize GPU execution time. Experimental results show that the best grid topology for running the GPU kernel is 128 blocks and 32 threads. This repartition gives the minimum GPU execution time, with a speed-up of around 50% compared to the CPU execution.
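The kind of thread/block search the abstract describes can be sketched in CUDA as follows. This is a minimal illustration, not the paper's implementation. It assumes that "32 threads" means 32 threads per block, in which case 128 × 32 = 4096 = 64 × 64, one thread per PU64 sample; the abstract does not state this mapping. The kernel `pu64_residual` and the helpers `total_threads` and `time_config` are hypothetical names, and the residual computation is only a stand-in for the real PU64 work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int PU_SIZE = 64;                    // PU64 is a 64x64 block
constexpr int PU_SAMPLES = PU_SIZE * PU_SIZE;  // 4096 samples

// Hypothetical stand-in for the per-sample PU64 work (not the paper's kernel).
__global__ void pu64_residual(const int *orig, const int *pred, int *resid, int n) {
    // Grid-stride loop: stays correct for any (blocks, threads) pair.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        resid[i] = orig[i] - pred[i];
    }
}

// Total number of threads launched by a 1-D grid.
inline int total_threads(int blocks, int threads) { return blocks * threads; }

// Time one launch configuration with CUDA events; returns milliseconds.
float time_config(int blocks, int threads,
                  const int *d_orig, const int *d_pred, int *d_resid) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    pu64_residual<<<blocks, threads>>>(d_orig, d_pred, d_resid, PU_SAMPLES);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    int *d_orig, *d_pred, *d_resid;
    const size_t bytes = PU_SAMPLES * sizeof(int);
    cudaMalloc(&d_orig, bytes);
    cudaMalloc(&d_pred, bytes);
    cudaMalloc(&d_resid, bytes);
    cudaMemset(d_orig, 0, bytes);  // dummy input data
    cudaMemset(d_pred, 0, bytes);

    // Warm-up launch so one-time start-up cost does not skew the first timing.
    time_config(128, 32, d_orig, d_pred, d_resid);

    int best_blocks = 0, best_threads = 0;
    float best_ms = -1.0f;
    // Sweep threads per block in powers of two, one thread per sample.
    for (int threads = 32; threads <= 1024; threads *= 2) {
        int blocks = PU_SAMPLES / threads;
        float ms = time_config(blocks, threads, d_orig, d_pred, d_resid);
        printf("blocks=%4d threads=%4d time=%.4f ms\n", blocks, threads, ms);
        if (best_ms < 0.0f || ms < best_ms) {
            best_ms = ms;
            best_blocks = blocks;
            best_threads = threads;
        }
    }
    printf("fastest: %d blocks x %d threads\n", best_blocks, best_threads);

    cudaFree(d_orig);
    cudaFree(d_pred);
    cudaFree(d_resid);
    return 0;
}
```

Error checking on the CUDA calls is omitted for brevity. A single timed launch per configuration is noisy, so a real measurement would average several runs. The fastest configuration depends on the device and on the actual kernel, so this sketch should not be expected to reproduce the 128 × 32 result reported in the abstract.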
