Asian Conference on Computer Vision

Hardware-Aware Softmax Approximation for Deep Neural Networks

Abstract

There has been rapid development of custom hardware for accelerating the inference of deep neural networks (DNNs) by explicitly incorporating hardware metrics (e.g., area and energy) as additional constraints alongside application accuracy. Recent efforts have mainly focused on the linear functions (matrix multiplications) in convolutional (Conv) and fully connected (FC) layers, while there is no publicly available study on optimizing the inference of non-linear functions in DNNs under hardware constraints. In this paper, we address the problem of cost-efficient inference for Softmax, a popular non-linear function in DNNs. We introduce a hardware-aware linear approximation framework based on algorithm and hardware co-optimization, with the goal of minimizing cost in terms of area and energy without incurring significant loss in application accuracy. This is achieved by simultaneously reducing the operand bit-width and approximating the cost-intensive operations in Softmax (e.g., exponentiation and division) with cost-effective operations (e.g., addition and bit shifts). We designed and synthesized a hardware unit for our approximation approach to estimate its area and energy consumption. In addition, we introduce a training method that further saves area and energy cost through reduced precision. Compared to a 19-bit baseline, our approach reduces area cost by 13x and energy consumption by 2x at an 11-bit operand width on the VOC2007 dataset with Faster R-CNN.
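The abstract does not give the exact approximation, but the general shift-and-add idea it describes can be sketched in a few lines. The following is a minimal illustration, not the authors' algorithm: it rewrites e^x as 2^(x·log2 e), rounds the exponents to integers so each power of two becomes a bit shift, and replaces the division by the sum with a right shift by floor(log2(sum)). The function name shift_softmax, the fixed-point format, and all constants here are illustrative assumptions.

```python
import numpy as np

def shift_softmax(logits, frac_bits=16):
    """Illustrative shift-and-add softmax (not the paper's exact method)."""
    # e**x == 2**(x * log2(e)): fold the constant into the logits, then
    # round the exponents to integers so each 2**k is a pure bit shift.
    k = np.floor(np.asarray(logits, dtype=np.float64) * np.log2(np.e)).astype(int)
    k = k - k.max()                      # max subtraction: all k <= 0
    one = 1 << frac_bits                 # fixed-point representation of 1.0
    # For k <= 0, 2**k is a right shift of the fixed-point one.
    pow2 = np.array([one >> min(-ki, 31) for ki in k], dtype=np.int64)
    # Replace division by the sum with a right shift by floor(log2(sum)).
    total = int(pow2.sum())
    shift = total.bit_length() - 1
    return pow2 / float(1 << shift)

logits = [2.0, 1.0, 0.1]
print(shift_softmax(logits))                    # shift-based approximation
print(np.exp(logits) / np.sum(np.exp(logits)))  # exact softmax reference
```

Because the normalizer is rounded down to a power of two, the outputs only approximately sum to one; the class ranking is preserved, however, since every entry is divided by the same constant. The paper's co-optimized design additionally reduces the operand bit-width (to 11 bits in the reported Faster R-CNN result), which this float-assisted sketch does not model.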