IEEE/ACM International Conference on Computer-Aided Design

Exploiting randomness in sketching for efficient hardware implementation of machine learning applications



Abstract

Energy-efficient processing of large matrices for big-data applications using hardware acceleration is an intense area of research. Sketching large matrices into lower-dimensional representations is an effective strategy. For the first time, this paper develops a highly energy-efficient hardware implementation of a class of sketching methods based on random projections, known as the Johnson-Lindenstrauss (JL) transform. Crucially, we show how the randomness inherent in the projection matrix can be exploited to create highly efficient fixed-point arithmetic realizations of several machine-learning applications. Specifically, the transform's random matrices have two key properties that allow for significant energy gains in hardware implementations. The first is the smoothing property, which allows us to drastically reduce operand bit-width in the computation of the JL transform itself. The second is the randomizing property, which allows bit-width reduction in subsequent machine-learning applications. Further, we identify a random matrix construction method that exploits a special sparsity structure to yield the most hardware-efficient realization, and we implement the highly optimized transform on an FPGA. Experimental results on (1) k-nearest neighbor (KNN) classification and (2) principal component analysis (PCA) show that, at the same bit-width, the proposed flow utilizing random projection achieves up to a 7× improvement in both latency and energy. Furthermore, by exploiting the smoothing and randomizing properties we are able to use a 1-bit instead of a 4-bit multiplier within KNN, which yields an additional 50% and 6% improvement in area and energy, respectively. The proposed I/O streaming strategy, combined with the hardware-efficient JL algorithm we identify, achieves a 50% runtime reduction and a 17% area reduction in the random projection stage compared to a standard design.
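As a minimal illustration of the kind of sparse random projection the abstract describes, the sketch below uses the classic Achlioptas-style JL construction, whose entries are drawn from {+1, 0, -1} with probabilities 1/6, 2/3, 1/6 (scaled by sqrt(3/k)). The paper's own hardware-optimized matrix construction may differ; the dimensions, seed, and helper name here are illustrative assumptions, and the code is a software model, not the FPGA implementation.

```python
import numpy as np

def sparse_jl_matrix(k, d, rng):
    """Draw a k x d sparse JL projection matrix (Achlioptas-style).

    Two-thirds of the entries are zero, and the nonzero entries are
    +/- sqrt(3/k), so a hardware datapath needs only additions and
    subtractions of scaled inputs -- no general multipliers.
    """
    entries = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0 / k) * entries

rng = np.random.default_rng(0)
d, k = 1024, 128              # illustrative original / sketched dimensions
R = sparse_jl_matrix(k, d, rng)

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# The JL property: pairwise Euclidean distances are approximately
# preserved after projecting from d down to k dimensions.
orig_dist = np.linalg.norm(x - y)
proj_dist = np.linalg.norm(R @ x - R @ y)
ratio = proj_dist / orig_dist  # expected to be close to 1
```

Because the nonzero entries are all ±sqrt(3/k), the matrix-vector product reduces to signed accumulation of a random two-thirds-sparse subset of the inputs, which is one way to see why such constructions map well to low-bit-width fixed-point hardware.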
