Cyclist: Accelerating hardware development

Abstract

The end of Dennard scaling has led to an increase in demand for energy-efficient custom hardware accelerators, but current hardware design is slow and laborious, partly because each iteration of the compile-run-debug cycle can take hours or even days with existing simulation and emulation platforms. Cyclist is a new emulation platform designed specifically to shorten the total compile-run-debug cycle. The Cyclist toolflow converts a Chisel RTL design to a parallel dataflow graph, which is then mapped to the Cyclist hardware architecture, consisting of a tiled array of custom parallel emulation engines. Cyclist provides cycle-accurate/bit-accurate RTL emulation at speeds approaching FPGA emulation, but with compile time closer to software simulation. Cyclist provides full visibility and debuggability of the hardware design, including moving forwards and backwards in simulation time while searching for trigger events. The snapshot facility used for debugging is also used to provide a "pay-as-you-go" mapping strategy. This strategy allows emulation to begin execution with a low-effort placement, while higher-quality placements are optimized in the background and swapped into a running emulation. The Cyclist ASIC design requires 0.069 mm² per tile and runs at 2 GHz in a 45 nm CMOS process. Our evaluation demonstrates that Cyclist outperforms FPGA emulation, VCS, and C++ simulation on combined compile and run time for up to a billion cycles for a set of real-world hardware benchmarks.
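The abstract states that Cyclist's snapshot facility lets a user move forwards and backwards in simulation time while searching for trigger events. One common way to get this kind of time travel in a deterministic, cycle-accurate simulator is periodic checkpointing plus replay. Any earlier cycle is reached by restoring the nearest saved state and re-simulating forward.

The Python sketch below illustrates that general technique under this assumption. It is not Cyclist's hardware mechanism. Everything in it is hypothetical and introduced only for illustration:

- the toy two-register design;
- the `step` function;
- the `SnapshotSim` class with its `goto`, `run_until`, and `find_prev` methods;
- the `interval` parameter.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy design (not a Cyclist benchmark): (8-bit counter, 16-bit LFSR).
State = Tuple[int, int]
Trigger = Callable[[State], bool]


def step(state: State) -> State:
    """Evaluate one clock edge of the toy design, masking to register widths."""
    counter, lfsr = state
    # 16-bit Fibonacci LFSR with taps 16/14/13/11, in shift-right form.
    bit = (lfsr ^ (lfsr >> 2) ^ (lfsr >> 3) ^ (lfsr >> 5)) & 1
    return ((counter + 1) & 0xFF, (lfsr >> 1) | (bit << 15))


class SnapshotSim:
    """Hypothetical checkpoint-and-replay simulator: a full-state snapshot is
    kept every `interval` cycles, so any earlier cycle is reached by restoring
    the nearest snapshot and re-simulating forward deterministically."""

    def __init__(self, init: State, interval: int = 64) -> None:
        self.interval = interval
        self.cycle = 0
        self.state = init
        self.snapshots: Dict[int, State] = {0: init}

    def _advance(self) -> None:
        self.state = step(self.state)
        self.cycle += 1
        if self.cycle % self.interval == 0:
            self.snapshots[self.cycle] = self.state

    def goto(self, target: int) -> State:
        """Move to cycle `target`, forwards or backwards in simulated time."""
        if target < self.cycle:
            # Nearest snapshot at or before target. It exists because every
            # multiple of `interval` up to the furthest cycle reached was saved.
            base = (target // self.interval) * self.interval
            self.cycle, self.state = base, self.snapshots[base]
        while self.cycle < target:
            self._advance()
        return self.state

    def run_until(self, trigger: Trigger, limit: int) -> Optional[int]:
        """Search forwards: first cycle after the current one where trigger holds."""
        while self.cycle < limit:
            self._advance()
            if trigger(self.state):
                return self.cycle
        return None

    def find_prev(self, trigger: Trigger) -> Optional[int]:
        """Search backwards: latest cycle before the current one where trigger
        holds, replaying one snapshot window at a time, newest window first.
        On a miss the simulator stays at the current cycle."""
        end = self.cycle  # exclusive upper bound of the window being scanned
        base = ((end - 1) // self.interval) * self.interval if end > 0 else -1
        while base >= 0:
            state, c, last = self.snapshots[base], base, None
            while c < end:
                if trigger(state):
                    last = c
                state, c = step(state), c + 1
            if last is not None:
                self.goto(last)
                return last
            end, base = base, base - self.interval
        return None


if __name__ == "__main__":
    sim = SnapshotSim((0, 0xACE1), interval=64)
    print(sim.run_until(lambda s: s[0] == 0, limit=1000))  # 256: counter wraps
    sim.goto(300)
    print(sim.find_prev(lambda s: s[0] == 0))  # 256: jumps back to the wrap
```

In this sketch, `interval` trades snapshot storage against replay distance. A backward jump re-simulates at most `interval - 1` cycles. A backward trigger search replays whole windows until it finds a match.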
