Proceedings of the IEEE

Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey

Abstract

Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement in general-purpose processors due to the foreseeable end of Moore's Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling domain, witnessing successful applications in a wide spectrum of artificial intelligence (AI) tasks. The incomparable accuracy of DNNs comes at the cost of heavy memory consumption and high computational complexity, which greatly impedes their deployment in embedded systems. Therefore, the concept of DNN compression was naturally proposed and is now widely used for memory saving and compute acceleration. In the past few years, a tremendous number of compression techniques have sprung up to pursue a satisfactory tradeoff between processing efficiency and application accuracy. Recently, this wave has spread to the design of neural network accelerators in pursuit of extremely high performance. However, the number of related works is enormous, and the reported approaches diverge widely. This chaotic research landscape motivates us to provide a comprehensive survey of recent advances toward the goal of efficient compression and execution of DNNs without significantly compromising accuracy, covering both the high-level algorithms and their applications in hardware design. In this article, we review mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification. We explain their compression principles, evaluation metrics, sensitivity analysis, and joint use. Then, we answer the question of how to leverage these methods in the design of neural network accelerators and present state-of-the-art hardware architectures. In the end, we discuss several open issues such as fair comparison, testing workloads, automatic compression, influence on security, and framework/hardware-level support, and highlight promising topics and possible challenges in this field. This article attempts to enable readers to quickly build up a big picture of neural network compression and acceleration, clearly evaluate the various methods, and confidently get started in the right way.
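
To give a concrete flavor of two of the surveyed technique families, the sketch below illustrates magnitude-based network sparsification (pruning) and symmetric uniform data quantization in NumPy. It is a minimal illustration of the general ideas, not code from the paper; the function names, the 90% sparsity target, and the 8-bit setting are arbitrary choices made for this example.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.9) -> np.ndarray:
    """Network sparsification: zero out the smallest-magnitude weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def uniform_quantize(weights: np.ndarray, num_bits: int = 8):
    """Symmetric uniform quantization: map float weights to num_bits integers.

    Returns the integer codes and the scale needed to dequantize (code * scale).
    """
    qmax = 2 ** (num_bits - 1) - 1                   # e.g., 127 for 8 bits
    scale = np.abs(weights).max() / qmax
    codes = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

w_sparse = magnitude_prune(w, sparsity=0.9)          # ~90% zeros -> memory saving
codes, scale = uniform_quantize(w, num_bits=8)       # int8 codes: 4x smaller than float32

print("sparsity:", (w_sparse == 0).mean())
print("max quantization error:", np.abs(codes.astype(np.float32) * scale - w).max())
```

In practice, the surveyed methods combine such steps (for example, prune, then quantize, then fine-tune) and co-design them with the accelerator architecture, which is the joint use the article discusses.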