Published in: International Conference on Application-specific Systems, Architectures and Processors

How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures



Abstract

The unprecedented performance of deep neural networks (DNNs) has led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. Nevertheless, deploying such AI models across commodity devices faces significant challenges: large computational cost, multiple performance objectives, hardware heterogeneity and a common need for high accuracy, together pose critical problems to the deployment of DNNs across the various embedded and mobile devices in the wild. As such, we have yet to witness the mainstream usage of state-of-the-art deep learning algorithms across consumer devices. In this paper, we provide preliminary answers to this potentially game-changing question by presenting an array of design techniques for efficient AI systems. We start by examining the major roadblocks when targeting both programmable processors and custom accelerators. Then, we present diverse methods for achieving real-time performance following a cross-stack approach. These span model-, system- and hardware-level techniques, and their combination. Our findings provide illustrative examples of AI systems that do not overburden mobile hardware, while also indicating how they can improve inference accuracy. Moreover, we showcase how custom ASIC- and FPGA-based accelerators can be an enabling factor for next-generation AI applications, such as multi-DNN systems. Collectively, these results highlight the critical need for further exploration as to how the various cross-stack solutions can be best combined in order to bring the latest advances in deep learning close to users, in a robust and efficient manner.
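One of the model-level techniques commonly used to fit DNNs onto resource-constrained consumer hardware is post-training quantization, which trades a small amount of accuracy for large savings in memory and compute. The abstract does not specify a particular scheme, so the following is a minimal, self-contained sketch of symmetric per-tensor int8 weight quantization in NumPy; all function names here are illustrative, not from the paper.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8.

    Maps the range [-max|w|, +max|w|] onto [-127, 127], so the
    quantized tensor is 4x smaller than the float32 original.
    """
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

# Quantize a random weight matrix and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())  # bounded by scale / 2
```

Since no value exceeds the calibrated range, the worst-case per-weight error is half a quantization step (`scale / 2`), which is typically small enough that inference accuracy degrades only marginally while memory traffic drops by 4x.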
