...
Journal: International Journal of Intelligent Systems and Applications

Accelerating Training of Deep Neural Networks on GPU using CUDA



Abstract

The development of fast and efficient training algorithms for Deep Neural Networks has been a subject of interest in recent years, because the biggest drawback of Deep Neural Networks is their enormous computational cost and the long time required to train their parameters. This has motivated several researchers to focus on recent advances in hardware architectures and in parallel programming models and paradigms for accelerating the training of Deep Neural Networks. We revisited the concepts and mechanisms of typical Deep Neural Network training algorithms, such as the Backpropagation Algorithm and the Boltzmann Machine Algorithm, and observed that matrix multiplication constitutes the major portion of the workload in the training process, because it is carried out a huge number of times during training. With the advent of many-core GPU technologies, matrix multiplication can be performed very efficiently in parallel, so training a Deep Neural Network no longer consumes as much time as it did a few years ago. CUDA is one of the high-performance parallel programming models for exploiting the capabilities of modern many-core GPU systems. In this paper, we propose to modify the Backpropagation Algorithm and the Boltzmann Machine Algorithm with CUDA parallel matrix multiplication and to test them on a many-core GPU system. Finally, we find that the proposed strategies achieve much faster training of Deep Neural Networks than the classic strategies.
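The abstract describes replacing the matrix multiplications inside Backpropagation and Boltzmann Machine training with CUDA parallel kernels but gives no code. Below is a minimal, illustrative sketch of a naive CUDA matrix-multiplication kernel of the kind such a modification would rely on; the matrix names (A, B, C), dimensions, and launch configuration are assumptions for illustration, not the authors' implementation.

```cuda
// Minimal sketch (not the paper's code): naive CUDA matrix multiplication
// C = A * B, with A of size M x K, B of size K x N, C of size M x N.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void matmul(const float *A, const float *B, float *C,
                       int M, int N, int K) {
    // Each thread computes one output element C[row][col].
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

int main() {
    const int M = 256, N = 256, K = 256;            // placeholder layer sizes
    size_t bytesA = M * K * sizeof(float);
    size_t bytesB = K * N * sizeof(float);
    size_t bytesC = M * N * sizeof(float);

    float *hA = (float *)malloc(bytesA);
    float *hB = (float *)malloc(bytesB);
    float *hC = (float *)malloc(bytesC);
    for (int i = 0; i < M * K; ++i) hA[i] = 1.0f;   // dummy activations
    for (int i = 0; i < K * N; ++i) hB[i] = 0.5f;   // dummy weights

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytesA);
    cudaMalloc(&dB, bytesB);
    cudaMalloc(&dC, bytesC);
    cudaMemcpy(dA, hA, bytesA, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytesB, cudaMemcpyHostToDevice);

    // One thread per output element, 16x16 threads per block.
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    matmul<<<grid, block>>>(dA, dB, dC, M, N, K);
    cudaDeviceSynchronize();

    cudaMemcpy(hC, dC, bytesC, cudaMemcpyDeviceToHost);
    printf("C[0][0] = %f\n", hC[0]);                // expect 256 * 1.0 * 0.5 = 128

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

In this naive version each thread computes one output element independently; in practice a tiled shared-memory kernel or a library routine such as cublasSgemm would typically be preferred for the layer-sized matrices that appear in backpropagation.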


