IEEE Journal on Emerging and Selected Topics in Circuits and Systems

Tightly Coupled Machine Learning Coprocessor Architecture With Analog In-Memory Computing for Instruction-Level Acceleration

Abstract

Low-profile mobile computing platforms often need to execute a variety of machine learning algorithms with limited memory and processing power. To address this challenge, this work presents Coara, an instruction-level processor acceleration architecture that efficiently integrates an approximate analog in-memory computing coprocessor for accelerating general machine learning applications by exploiting an analog register file cache. Instruction-level acceleration offers true programmability beyond the degree of freedom provided by reconfigurable machine learning accelerators, and it also allows the code generation stage of a compiler back-end to control coprocessor execution and data flow, so applications do not need high-level machine learning software frameworks with a large memory footprint. Conventional analog and mixed-signal accelerators suffer from the overhead of frequent data conversion between analog and digital signals. To solve this classical problem, Coara uses an analog register file cache, which interfaces the analog in-memory computing coprocessor with the digital register file of the processor core. As a result, more than 90% of the ADC and DAC data conversion overhead can be eliminated by temporarily storing the result of an analog computation in a switched-capacitor analog memory cell until a data dependency occurs. A cycle-accurate Verilog RTL model of the proposed architecture is evaluated with 45 nm CMOS technology parameters while executing machine learning benchmark computation codes generated by a customized cross-compiler, without using machine learning software frameworks.
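The deferred-conversion idea behind the analog register file cache can be illustrated with a toy model. This is a hypothetical sketch, not the paper's code: the trace format, register names, and the `conversions_needed` function are all invented here to show how converting only at a true data dependency, rather than after every analog operation, reduces ADC activity.

```python
# Toy model of the analog-register-file-cache idea (hypothetical, not
# from the paper): an analog MAC result stays in a switched-capacitor
# cell and is converted by the ADC only when the digital core reads it.

def conversions_needed(trace, analog_cache=True):
    """Count ADC conversions for a trace of ('mac', reg) coprocessor
    writes and ('use', reg) digital reads of a register."""
    pending = set()      # analog registers holding unconverted results
    conversions = 0
    for op, reg in trace:
        if op == 'mac':              # analog in-memory MAC writes reg
            if analog_cache:
                pending.add(reg)     # keep result in the analog domain
            else:
                conversions += 1     # naive: convert every result
        elif op == 'use':            # digital core reads reg
            if analog_cache and reg in pending:
                pending.discard(reg)
                conversions += 1     # convert only at a data dependency
    return conversions

# Ten chained analog MACs accumulating into r0, one final digital read:
trace = [('mac', 'r0')] * 10 + [('use', 'r0')]
baseline = conversions_needed(trace, analog_cache=False)  # 10 conversions
cached = conversions_needed(trace, analog_cache=True)     # 1 conversion
```

Under this (invented) workload of long chained accumulations, the cache performs one conversion instead of ten, the kind of regime in which the abstract's reported elimination of over 90% of conversion overhead becomes plausible.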
