Heterogeneous blocked CPU-GPU accelerate scheme for large scale extreme learning machine

Abstract

Extreme learning machine (ELM) has been studied intensively over the last decade because of its high efficiency, effectiveness, and ease of implementation. Recently, a variant named local receptive fields based ELM (ELM-LRF) has been proposed, which reduces global connections by introducing local receptive fields into the input layer. However, an ELM-LRF model with a large number of hidden neurons spends much of its training time solving a large-scale Moore-Penrose matrix inversion (MPMI) problem, which is computationally expensive and requires a large amount of runtime memory. Moreover, this procedure cannot be accelerated directly on GPU platforms because of the limited memory of GPU devices. In this paper, we propose three efficient approaches to perform ELM-LRF on a GPU platform. First, we propose a novel blocked LU decomposition algorithm that overcomes the global memory size limitation, so that ELM-LRF models of any size can be trained. Second, we present an efficient blocked Cholesky decomposition algorithm that accelerates the blocked LU decomposition algorithm by exploiting the characteristics of the matrices in the ELM-LRF model. Finally, we present a heterogeneous blocked CPU-GPU parallel algorithm that fully exploits the resources of a GPU node to further accelerate the blocked Cholesky decomposition algorithm in the ELM-LRF model. (C) 2017 Elsevier B.V. All rights reserved.
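
To illustrate how the MPMI step reduces to a dense linear solve that can be processed block by block, the following is a minimal sketch, not the authors' GPU algorithm. It assumes the common regularized ELM form β = (HᵀH + I/C)⁻¹ HᵀT, where H is the hidden-layer output matrix and T the target matrix; the ridge constant C, the tile size `nb`, and every function name here are hypothetical. The system is solved with a generic right-looking blocked LU factorization in plain Python: each outer step factors one column panel, computes the matching row panel, and then updates the trailing submatrix, which is the matrix-multiply step a GPU implementation would typically offload.

```python
# Hypothetical sketch: blocked right-looking LU (no pivoting) applied to the
# regularized ELM normal equations (H^T H + I/C) beta = H^T T.
# C, nb and all names are illustrative assumptions, not from the paper.

def matmul_tn(A, B):
    """Return A^T B for dense matrices stored as lists of rows."""
    n, p, q = len(A), len(A[0]), len(B[0])
    return [[sum(A[k][i] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(p)]

def blocked_lu(A, nb):
    """Overwrite square A with unit-lower L and upper U, nb columns at a time.

    No pivoting: acceptable here because H^T H + I/C is symmetric positive
    definite. Each outer step reads one panel plus the trailing submatrix,
    the structure an out-of-core or GPU version could exploit by streaming tiles.
    """
    n = len(A)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1) Unblocked LU of the column panel (diagonal block and rows below).
        for k in range(k0, k1):
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
            for i in range(k + 1, n):
                for j in range(k + 1, k1):
                    A[i][j] -= A[i][k] * A[k][j]
        # 2) Row panel U12 = L11^{-1} A12 (forward substitution, unit diagonal).
        for j in range(k1, n):
            for i in range(k0, k1):
                for k in range(k0, i):
                    A[i][j] -= A[i][k] * A[k][j]
        # 3) Trailing update A22 -= L21 U12 (the GEMM-like step).
        for i in range(k1, n):
            for j in range(k1, n):
                A[i][j] -= sum(A[i][k] * A[k][j] for k in range(k0, k1))
    return A

def lu_solve(LU, b):
    """Solve (L U) x = b given the packed factors from blocked_lu."""
    n = len(LU)
    x = list(b)
    for i in range(n):                      # L y = b, unit lower triangular
        x[i] -= sum(LU[i][k] * x[k] for k in range(i))
    for i in reversed(range(n)):            # U x = y, back substitution
        x[i] = (x[i] - sum(LU[i][k] * x[k] for k in range(i + 1, n))) / LU[i][i]
    return x

def elm_output_weights(H, T, C=1e3, nb=2):
    """Return beta = (H^T H + I/C)^{-1} H^T T, solving one output column at a time."""
    A = matmul_tn(H, H)
    for i in range(len(A)):
        A[i][i] += 1.0 / C                  # assumed ridge term
    HtT = matmul_tn(H, T)
    LU = blocked_lu(A, nb)
    cols = [lu_solve(LU, [row[j] for row in HtT]) for j in range(len(T[0]))]
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(A))]
```

With `nb` equal to the matrix order the routine reduces to ordinary unblocked LU; smaller tiles change only the order of operations, not the resulting factors (up to rounding).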
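
When the coefficient matrix is symmetric positive definite, as the regularized Gram matrix HᵀH + I/C is (an assumed form, since the abstract only refers to "matrix characteristics"), Cholesky factorization A = LLᵀ can replace LU, touching only one triangle and doing roughly half the arithmetic. The following is a minimal plain-Python sketch of a generic right-looking blocked Cholesky, with each block column split into the steps that LAPACK/BLAS name POTRF, TRSM, and SYRK. The tile size `nb` and the function names are hypothetical, and this is not presented as the paper's implementation.

```python
import math

# Hypothetical sketch of right-looking blocked Cholesky; nb and names are
# illustrative assumptions, not taken from the paper.

def blocked_cholesky(A, nb):
    """Return lower-triangular L with A = L L^T for symmetric positive definite A.

    Only the lower triangle of A is read; the input is not modified.
    """
    n = len(A)
    L = [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # POTRF: unblocked Cholesky of the (already updated) diagonal block.
        for j in range(k0, k1):
            L[j][j] = math.sqrt(L[j][j] - sum(L[j][k] ** 2 for k in range(k0, j)))
            for i in range(j + 1, k1):
                L[i][j] = (L[i][j] - sum(L[i][k] * L[j][k] for k in range(k0, j))) / L[j][j]
        # TRSM: solve L21 L11^T = A21 for the panel below the diagonal block.
        for i in range(k1, n):
            for j in range(k0, k1):
                L[i][j] = (L[i][j] - sum(L[i][k] * L[j][k] for k in range(k0, j))) / L[j][j]
        # SYRK: symmetric trailing update A22 -= L21 L21^T, lower triangle only.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                L[i][j] -= sum(L[i][k] * L[j][k] for k in range(k0, k1))
    return L

def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):                      # L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):            # L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

For large matrices the SYRK trailing update dominates the arithmetic, so it is the natural candidate for offloading or for splitting between CPU and GPU; the abstract does not state how the paper's heterogeneous algorithm divides this work.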

Bibliographic details

  • Source
    Neurocomputing | 2017, No. 25 | pp. 153-163 | 11 pages
  • Author affiliations

    National University of Defense Technology, Science and Technology on Parallel and Distributed Processing Laboratory, School of Computer, Changsha, Hunan, People's Republic of China (all five authors)

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    ELM-LRF; GPU; Blocked CPU-GPU accelerate algorithm;
