IEEE Journal on Emerging and Selected Topics in Circuits and Systems

Tightly Coupled Machine Learning Coprocessor Architecture With Analog In-Memory Computing for Instruction-Level Acceleration

Abstract

Low-profile mobile computing platforms often need to execute a variety of machine learning algorithms with limited memory and processing power. To address this challenge, this work presents Coara, an instruction-level processor acceleration architecture that efficiently integrates an approximate analog in-memory computing coprocessor for accelerating general machine learning applications by exploiting an analog register file cache. Instruction-level acceleration offers true programmability beyond the degree of freedom provided by reconfigurable machine learning accelerators, and it also allows the code generation stage of a compiler back-end to control coprocessor execution and data flow, so applications do not need high-level machine learning software frameworks with a large memory footprint. Conventional analog and mixed-signal accelerators suffer from the overhead of frequent data conversion between analog and digital signals. To solve this classical problem, Coara uses an analog register file cache, which interfaces the analog in-memory computing coprocessor with the digital register file of the processor core. As a result, more than 90% of the ADC and DAC data conversion overhead can be eliminated by temporarily storing the result of an analog computation in a switched-capacitor analog memory cell until a data dependency occurs. A cycle-accurate Verilog RTL model of the proposed architecture is evaluated with 45 nm CMOS technology parameters while executing machine learning benchmark computation codes generated by a customized cross-compiler, without using machine learning software frameworks.
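The deferred-conversion idea behind the analog register file cache can be illustrated with a toy model. This is a hypothetical sketch, not the paper's code: the trace format, register names, and the `conversions_needed` function are all invented here to show how converting only at a true data dependency, rather than after every analog operation, reduces ADC activity.

```python
# Toy model of the analog-register-file-cache idea (hypothetical, not
# from the paper): an analog MAC result stays in a switched-capacitor
# cell and is converted by the ADC only when the digital core reads it.

def conversions_needed(trace, analog_cache=True):
    """Count ADC conversions for a trace of ('mac', reg) coprocessor
    writes and ('use', reg) digital reads of a register."""
    pending = set()      # analog registers holding unconverted results
    conversions = 0
    for op, reg in trace:
        if op == 'mac':              # analog in-memory MAC writes reg
            if analog_cache:
                pending.add(reg)     # keep result in the analog domain
            else:
                conversions += 1     # naive: convert every result
        elif op == 'use':            # digital core reads reg
            if analog_cache and reg in pending:
                pending.discard(reg)
                conversions += 1     # convert only at a data dependency
    return conversions

# Ten chained analog MACs accumulating into r0, one final digital read:
trace = [('mac', 'r0')] * 10 + [('use', 'r0')]
baseline = conversions_needed(trace, analog_cache=False)  # 10 conversions
cached = conversions_needed(trace, analog_cache=True)     # 1 conversion
```

Under this (invented) workload of long chained accumulations, the cache performs one conversion instead of ten, the kind of regime in which the abstract's reported elimination of over 90% of conversion overhead becomes plausible.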
