IEEE International Conference on Consumer Electronics

Automated Hardware and Neural Network Architecture co-design of FPGA accelerators using multi-objective Neural Architecture Search


Abstract

State-of-the-art Neural Network Architectures (NNAs) are challenging to design and to implement efficiently in hardware. In the past couple of years, this has led to an explosion in the research and development of automatic Neural Architecture Search (NAS) tools. AutoML tools are now used to achieve state-of-the-art NNA designs and attempt to optimize for hardware usage and design. Much of the recent research on the automated design of NNAs has focused on convolutional networks and image recognition, ignoring the fact that a significant part of the workload in data centers consists of general-purpose deep neural networks. In this work, we develop and test a general multilayer perceptron (MLP) flow that can take arbitrary datasets as input and automatically produce optimized NNAs and hardware designs. We test the flow on six benchmarks. Our results show that we exceed the accuracy of currently published MLP results and are competitive with non-MLP-based results. We compare common general-purpose GPU architectures with our scalable FPGA design and show that we achieve higher efficiency and higher throughput (outputs per second) for the majority of datasets. Further insights into the design space for both accurate networks and high-performing hardware show the power of co-design, correlating accuracy versus throughput, network size versus accuracy, and scaling to high-performance devices.
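The multi-objective search the abstract describes can be illustrated with a minimal sketch: random search over MLP hidden-layer widths, where each candidate is scored on an accuracy proxy and on a parameter-count hardware-cost proxy, and non-dominated candidates are kept as the Pareto front. Everything below is an illustrative assumption, not the paper's actual tool: the `estimate_accuracy` placeholder stands in for training and evaluating the network, and parameter count stands in for FPGA resource cost.

```python
import random

def mlp_params(hidden, n_in=64, n_out=10):
    """Parameter count (weights + biases) of an MLP with the given hidden widths."""
    dims = [n_in, *hidden, n_out]
    return sum(a * b + b for a, b in zip(dims, dims[1:]))

def pareto_front(candidates):
    """Candidates are (architecture, accuracy, cost) tuples.
    Keep candidates not dominated by any other: a dominator has
    accuracy >= and cost <= with at least one strict inequality."""
    front = []
    for arch, acc, cost in candidates:
        dominated = any(
            (a2 >= acc and c2 <= cost) and (a2 > acc or c2 < cost)
            for _, a2, c2 in candidates
        )
        if not dominated:
            front.append((arch, acc, cost))
    return front

# Random-search sketch: sample hidden-layer widths and score each candidate.
# In the real flow this score would come from training/evaluating the MLP.
def estimate_accuracy(hidden):
    return 1.0 - 1.0 / (1 + sum(hidden))  # placeholder proxy only

random.seed(0)
cands = []
for _ in range(20):
    hidden = [random.choice([16, 32, 64, 128]) for _ in range(random.randint(1, 3))]
    cands.append((tuple(hidden), estimate_accuracy(hidden), mlp_params(hidden)))
front = pareto_front(cands)  # accuracy-vs-cost trade-off curve
```

In an actual co-design flow the cost axis would be a hardware model of the FPGA design (throughput, resource usage) rather than a raw parameter count, and the search strategy could be evolutionary or Bayesian rather than random, but the Pareto-front selection is the same.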