Journal: ACM Journal on Emerging Technologies in Computing Systems
Hardware-Software Co-design to Accelerate Neural Network Applications



Abstract

Many applications, such as machine learning and data sensing, are statistical in nature and can tolerate some level of inaccuracy in their computation. A variety of designs have been put forward that exploit the statistical nature of machine learning through approximate computing, with approximate multipliers being the main focus due to their heavy use in machine-learning designs. In this article, we propose a novel approximate floating-point multiplier, called CMUL, which significantly reduces energy and improves the performance of multiplication while allowing for a controllable amount of error. Our design approximately models multiplication by replacing the most costly step of the operation with a lower-energy alternative. To tune the level of approximation, CMUL dynamically identifies the inputs that produce the largest approximation error and processes them in precise mode. To use CMUL for deep neural network (DNN) acceleration, we propose a framework that modifies the trained DNN model to make it suitable for approximate hardware. Our framework adjusts the DNN weights to a set of "potential weights" that are suitable for approximate hardware. Then, it compensates for possible quality loss by iteratively retraining the network. Our evaluation with four DNN applications shows that CMUL can achieve a 60.3% energy-efficiency improvement and a 3.2x energy-delay product (EDP) improvement over the baseline GPU, while ensuring less than 0.2% quality loss. These figures are 38.7% and 2.0x higher, respectively, than the energy-efficiency and EDP improvements of CMUL without the proposed framework.
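The abstract's core idea (replace the costly step of a floating-point multiply with a cheaper one, and fall back to precise mode for the operand pairs that would err most) can be illustrated with a well-known software analogue: adding IEEE-754 bit patterns approximates multiplication in the log domain, skipping the mantissa multiplier. This is a minimal sketch of that generic technique, not the paper's actual CMUL circuit; the function names and the error band are illustrative assumptions, and the sketch assumes positive, normalized float32 inputs.

```python
import struct

def f2i(x):
    """float32 -> raw IEEE-754 bit pattern as an unsigned int."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def i2f(n):
    """raw IEEE-754 bit pattern -> float32."""
    return struct.unpack('<f', struct.pack('<I', n & 0xFFFFFFFF))[0]

BIAS_BITS = 127 << 23  # exponent bias aligned to the exponent field

def approx_mul(a, b):
    # Adding the bit patterns adds the exponents and the mantissa
    # fractions, which approximates log-domain addition, i.e.
    # multiplication, without invoking a mantissa multiplier.
    return i2f(f2i(a) + f2i(b) - BIAS_BITS)

def mantissa_frac(x):
    """Mantissa fraction of a float32, in [0, 1)."""
    return (f2i(x) & 0x7FFFFF) / float(1 << 23)

def tunable_mul(a, b, band=0.25):
    # The bit-pattern trick errs most when both mantissa fractions are
    # near 0.5 (worst case roughly 11%). Analogous to CMUL's dynamic
    # input detection, route those operand pairs to the precise
    # multiplier; `band` (an assumed knob) tunes the approximate/precise
    # trade-off and hence the energy/accuracy balance.
    if abs(mantissa_frac(a) - 0.5) < band and abs(mantissa_frac(b) - 0.5) < band:
        return a * b          # precise mode
    return approx_mul(a, b)   # approximate mode
```

For example, `approx_mul(2.0, 4.0)` is exact (both mantissa fractions are zero), while `approx_mul(3.0, 3.0)` returns 8.0 instead of 9.0, the worst-case operand pattern that `tunable_mul` diverts to precise mode.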
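The framework step, adjusting trained DNN weights to a set of hardware-friendly "potential weights" before retraining, can be sketched as a nearest-neighbor snap. The potential-weight set and the retraining schedule below are placeholders; the paper defines its own set based on the approximate hardware's error profile.

```python
import numpy as np

def snap_to_potential(weights, potential):
    """Map each weight to the nearest hardware-friendly potential weight.

    `weights` is any-shaped array of trained weights; `potential` is the
    (assumed) set of values the approximate multiplier handles with low
    error. In the paper's framework this snap would alternate with
    retraining passes to recover the lost accuracy.
    """
    weights = np.asarray(weights, dtype=np.float64)
    potential = np.asarray(potential, dtype=np.float64)
    # Distance from every weight to every potential weight; pick the closest.
    idx = np.abs(weights[..., None] - potential[None, ...]).argmin(axis=-1)
    return potential[idx]
```

A retraining loop would then repeat: snap the weights, fine-tune the network for a few epochs with the snapped values as the starting point, and stop once the quality loss falls under the target (the abstract's 0.2% threshold).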


