A 0.42V high bandwidth synthesizable parallel access smart memory fabric for computer vision


Abstract

We present the design of a scalable 2-to-12-port multiport compiler with simultaneous read-port access and closely packed graphics integration capability, designed specifically for low-power, high-bandwidth, low-latency stream vector processors and machine learning applications. A novel pipelined decoder and bitline repeater insertion help to achieve a fast cycle time. Memory words can be accessed in different ways: serial, parallel, or mixed. A wide supply range from 0.4 V to 1.1 V is supported without any complex write or read assist circuit. The design is non-self-timed and fully testable, while timing and power views are generated through a static timing analysis (STA) approach. The layout is based on automatic place and route of standard cells in the periphery and a full-custom, standard-cell-compatible, high-density memory core. The full-custom core is tightly bound to the common graphics processing operations to enable low-latency (< 1 μs), high-bandwidth operation at low voltage. The hybrid approach reduces the turnaround time to just a few man-weeks. The area penalty of a 2W2R 64 Kbit instance is up to 10% in comparison to a logic-rule-based full-custom high-speed 1W1R compiler, while doubling the throughput. Compared to a complete RTL-based synthesis approach, area is just 5% for 64 Kbit. A 2W2R 32×128 testchip instance in a sub-20nm FinFET process runs at up to 3 GHz on CAD at a 1.1 V supply and -40 °C, while the measured speed of the same instance on silicon is 86 MHz (at 0.42 V) for simultaneous access from both ports, and the energy consumed is just 5 pJ/cycle in the typical process corner. The architecture is scalable up to 64KB for more parallel architectures (64 cores), as demanded in ultra-high-definition real-time computational photography [1].
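As a rough sanity check on the silicon figures quoted in the abstract, the following minimal sketch derives the average power and the bit capacity of the testchip instance. It assumes that the 5 pJ/cycle figure applies at the measured 86 MHz, 0.42 V operating point and that every cycle costs that energy; the function names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope figures derived from numbers quoted in the abstract.
# Assumption: the 5 pJ/cycle energy applies at the 86 MHz / 0.42 V point.

ENERGY_PER_CYCLE_J = 5e-12   # 5 pJ/cycle, typical process corner
MEASURED_FREQ_HZ = 86e6      # 86 MHz measured on silicon at 0.42 V
ROWS, COLS = 32, 128         # 2W2R 32x128 testchip instance


def average_power_w(energy_per_cycle_j: float, freq_hz: float) -> float:
    """Average power if every clock cycle consumes the quoted energy."""
    return energy_per_cycle_j * freq_hz


def capacity_bits(rows: int, cols: int) -> int:
    """Total number of bit cells in a rows x cols instance."""
    return rows * cols


power_w = average_power_w(ENERGY_PER_CYCLE_J, MEASURED_FREQ_HZ)
bits = capacity_bits(ROWS, COLS)

print(f"average power: {power_w * 1e3:.2f} mW")       # 0.43 mW
print(f"capacity: {bits} bits ({bits // 1024} Kbit)")  # 4096 bits (4 Kbit)
```

Under these assumptions the 32×128 instance holds 4 Kbit and draws roughly 0.43 mW at the low-voltage operating point.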


Bibliographic record

Source
IEEE International Symposium on Circuits and Systems | 2017 | 721p | 4 pages
Authors
Prashant Dubey; Kritika Aditya; Ankur Srivastava; Amit Khanuja; Jamil Kawa; Thu Nguyen;
Original format: PDF
Chinese Library Classification: Electrical engineering
Keywords
Cell based design; Data-Path; Register File; Energy Efficiency; Internet of Things; Neuromorphic Computing; Chip multi-processor; vector processors; machine learning;


