Heterogeneous blocked CPU-GPU accelerate scheme for large scale extreme learning machine

Li Shijie; Niu Xin; Dou Yong; Lv Qi; Wang Yueqing


Abstract

Extreme learning machine (ELM) has been intensively studied during the last decade due to its high efficiency, effectiveness, and ease of implementation. Recently, a variant of ELM named local receptive fields based ELM (ELM-LRF) has been proposed to reduce the global connections and introduce local receptive fields to the input layer. However, an ELM-LRF model with a large number of hidden neurons spends much of its time solving a large-scale Moore-Penrose Matrix Inversion (MPMI) problem, which has a heavy computational cost and needs much more runtime memory. Moreover, this procedure cannot be directly accelerated on GPU platforms because of the limited memory of GPU devices. In this paper, we propose three efficient approaches to perform ELM-LRF on GPU platforms. First, we propose a novel blocked LU decomposition algorithm, which overcomes the limitation of global memory size so that ELM-LRF models of any size can be trained. Furthermore, an efficient blocked Cholesky decomposition algorithm is presented to accelerate the blocked LU decomposition algorithm according to matrix characteristics in the ELM-LRF model. Finally, we present a heterogeneous blocked CPU-GPU parallel algorithm that fully exploits the resources on a GPU node to further accelerate the blocked Cholesky decomposition algorithm in the ELM-LRF model. (C) 2017 Elsevier B.V. All rights reserved.
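
The blocked Cholesky idea mentioned in the abstract can be illustrated with a minimal CPU-only sketch in NumPy. This is not the authors' GPU algorithm. It assumes the output weights come from the commonly used regularized normal equations (I/C + H^T H) beta = H^T T, whose matrix is symmetric positive definite; that formulation, the names `blocked_cholesky` and `elm_output_weights`, and the parameters `c` and `block` are all assumptions made for illustration. The factorization touches one block column at a time, which is the property that lets a blocked scheme work on matrices larger than device memory.

```python
import numpy as np


def blocked_cholesky(a, block):
    """Return lower-triangular L with L @ L.T == a, one block column at a time."""
    n = a.shape[0]
    l = np.array(a, dtype=float)  # work on a copy, leave the input untouched
    for k in range(0, n, block):
        e = min(k + block, n)
        # Factor the current diagonal block: A11 = L11 @ L11.T
        l[k:e, k:e] = np.linalg.cholesky(l[k:e, k:e])
        if e < n:
            # Panel solve: L21 @ L11.T = A21, i.e. L11 @ L21.T = A21.T
            l[e:, k:e] = np.linalg.solve(l[k:e, k:e], l[e:, k:e].T).T
            # Trailing update: A22 <- A22 - L21 @ L21.T
            l[e:, e:] -= l[e:, k:e] @ l[e:, k:e].T
    # Entries above the diagonal are leftovers of the input; discard them.
    return np.tril(l)


def elm_output_weights(h, t, c=1.0, block=64):
    """Solve (I/c + H^T H) beta = H^T T via blocked Cholesky (assumed formulation)."""
    n_hidden = h.shape[1]
    a = h.T @ h + np.eye(n_hidden) / c  # symmetric positive definite for c > 0
    l = blocked_cholesky(a, block)
    y = np.linalg.solve(l, h.T @ t)     # forward substitution step: L y = H^T T
    return np.linalg.solve(l.T, y)      # backward substitution step: L^T beta = y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.standard_normal((200, 32))  # hypothetical hidden-layer outputs
    t = rng.standard_normal((200, 4))   # hypothetical targets
    beta = elm_output_weights(h, t, c=10.0, block=8)
    print(beta.shape)
```

In a GPU setting, each diagonal factorization, panel solve, and trailing update would be a separate kernel or library call on blocks streamed through device memory; how the paper splits that work between CPU and GPU is not described in the abstract, so the sketch does not attempt it.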


Bibliographic record

Source
Neurocomputing, 2017, issue 25, pp. 153-163 (11 pages)
Authors
Li Shijie; Niu Xin; Dou Yong; Lv Qi; Wang Yueqing
Author affiliations
Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc Lab, Sch Comp, Changsha, Hunan, Peoples R China (the same affiliation is listed for each of the five authors)
Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language of text: English
Keywords
ELM-LRF; GPU; Blocked CPU-GPU accelerate algorithm

Similar literature

1. An accelerating scheme for destructive parsimonious extreme learning machine [J]. Zhao Yong-Ping, Li Bing, Li Ye-Bo. Neurocomputing. 2015, issue nova1
2. Distributed semi-supervised learning algorithm based on extreme learning machine over networks using event-triggered communication scheme [J]. Xie Jin, Liu Sanyang, Dai Hao. Neural Networks: The Official Journal of the International Neural Network Society. 2019
3. Extreme learning machine for estimating blocking probability of bufferless OBS/OPS networks [J]. Ho Chun Leung, Chi Sing Leung, Eric W.M. Wong. Optical Communications and Networking, IEEE/OSA Journal of. 2017, issue 8
4. A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters [C]. Shi Rong, Potluri Sreeram, Hamidouche Khaled. IEEE International Conference on Cluster Computing. 2013
5. Accelerating the discontinuous Galerkin cell-vertex scheme (DG-CVS) solver on CPU-GPU heterogeneous systems [D]. Hu, Xiaoqi. 2017
6. SGB-ELM: An Advanced Stochastic Gradient Boosting-Based Ensemble Scheme for Extreme Learning Machine [O]. Hua Guo, Jikui Wang, Wei Ao. 2018
7. Extreme learning machine: A new learning scheme of feedforward neural networks [O]. Guang-bin Huang, Qin-yu Zhu, Chee-kheong Siew. 2006
