IEEE/ACM International Conference on Computer-Aided Design

Exploiting randomness in sketching for efficient hardware implementation of machine learning applications



Abstract

Energy-efficient processing of large matrices for big-data applications using hardware acceleration is an intense area of research. Sketching large matrices into lower-dimensional representations is an effective strategy. For the first time, this paper develops a highly energy-efficient hardware implementation of a class of sketching methods based on random projections, known as the Johnson-Lindenstrauss (JL) transform. Crucially, we show how the randomness inherent in the projection matrix can be exploited to create highly efficient fixed-point arithmetic realizations of several machine-learning applications. Specifically, the transform's random matrices have two key properties that allow for significant energy gains in hardware implementations. The first is the smoothing property, which allows us to drastically reduce operand bit-width in the computation of the JL transform itself. The second is the randomizing property, which allows bit-width reduction in subsequent machine-learning applications. Further, we identify a random matrix construction method that exploits a special sparsity structure to yield the most hardware-efficient realization, and we implement the highly optimized transform on an FPGA. Experimental results on (1) k-nearest neighbor (KNN) classification and (2) principal component analysis (PCA) show that, at the same bit-width, the proposed flow utilizing random projection achieves up to a 7× improvement in both latency and energy. Furthermore, by exploiting the smoothing and randomizing properties we are able to use a 1-bit instead of a 4-bit multiplier within KNN, which yields an additional 50% and 6% improvement in area and energy, respectively. The proposed I/O streaming strategy, combined with the hardware-efficient JL algorithm we identify, achieves a 50% runtime reduction and a 17% area reduction in the random projection stage compared to a standard design.
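As a minimal illustration of the kind of sparse random projection the abstract describes, the sketch below uses the classic Achlioptas-style JL construction, whose entries are drawn from {+1, 0, -1} with probabilities 1/6, 2/3, 1/6 (scaled by sqrt(3/k)). The paper's own hardware-optimized matrix construction may differ; the dimensions, seed, and helper name here are illustrative assumptions, and the code is a software model, not the FPGA implementation.

```python
import numpy as np

def sparse_jl_matrix(k, d, rng):
    """Draw a k x d sparse JL projection matrix (Achlioptas-style).

    Two-thirds of the entries are zero, and the nonzero entries are
    +/- sqrt(3/k), so a hardware datapath needs only additions and
    subtractions of scaled inputs -- no general multipliers.
    """
    entries = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0 / k) * entries

rng = np.random.default_rng(0)
d, k = 1024, 128              # illustrative original / sketched dimensions
R = sparse_jl_matrix(k, d, rng)

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# The JL property: pairwise Euclidean distances are approximately
# preserved after projecting from d down to k dimensions.
orig_dist = np.linalg.norm(x - y)
proj_dist = np.linalg.norm(R @ x - R @ y)
ratio = proj_dist / orig_dist  # expected to be close to 1
```

Because the nonzero entries are all ±sqrt(3/k), the matrix-vector product reduces to signed accumulation of a random two-thirds-sparse subset of the inputs, which is one way to see why such constructions map well to low-bit-width fixed-point hardware.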
