Architectural explorations for high-performance field-programmable gate arrays.

Abstract

Cell-based design technology has dominated ASIC implementation over the past quarter century by offering an economically compelling combination of low manufacturing cost and acceptable design and prototyping costs. With the advent of sub-100nm CMOS technologies, the design and prototyping costs of cell-based implementation have become prohibitive for most ASICs, making FPGAs increasingly popular. Current FPGAs, however, cannot meet the performance requirements of many ASICs because of their high programming overhead. Designing high-performance FPGA architectures is therefore becoming increasingly important.

This thesis presents several architecture studies aimed at improving FPGA performance. We first investigate the performance benefits of a monolithically stacked 3D-FPGA, in which the programming overhead of an FPGA is stacked on top of a standard CMOS layer containing the logic blocks and interconnects. A Virtex-II style 2D-FPGA fabric is used as a baseline architecture to quantify the relative improvements in logic density, delay, and power consumption achieved by such a 3D-FPGA. It is assumed that only the switch transistors and configuration memory cells can be moved to the top layers and that the 3D-FPGA employs the same logic block and programmable interconnect architecture as the baseline 2D-FPGA. Assuming a configuration memory cell with at most 0.7 times the area of an SRAM cell, and switch transistors with the same characteristics as the nMOS devices in the CMOS layer, it is shown that a monolithically stacked 3D-FPGA can achieve 3.2 times higher logic density, 1.7 times lower critical path delay, and 1.7 times lower total dynamic power consumption than the baseline 2D-FPGA fabricated in the same 65nm technology node.

Based on lessons learned from this study, we embarked on two architectural studies to further improve the performance of 2D-FPGAs.

The first study presents a new low-power routing fabric and shows that an FPGA using this fabric can achieve 1.54 times lower dynamic power consumption and 1.31 times lower average net delay, with only an 8% reduction in logic density, compared with a baseline island-style FPGA implemented in the same 65nm CMOS technology. These improvements in power and delay are achieved by (i) using only short interconnect segments to reduce routed net lengths, and (ii) reducing the interconnect segment loading caused by programming overhead, relative to the baseline FPGA, without compromising routability. The new routing fabric is well suited to monolithically stacked 3D-IC implementation. It is shown that a 3D-FPGA using this fabric can achieve a 3.3 times improvement in logic density, a 2.46 times improvement in delay, and a 2.87 times improvement in dynamic power consumption over the same baseline 2D-FPGA.

The second study presents a design tool for routing channel segmentation in island-style FPGAs. Given the FPGA architecture parameters and a set of benchmark designs, the tool optimizes routing channel segmentation using the average interconnect power-delay product, estimated from placed and routed designs, as the performance metric. A simulated-annealing procedure is used: in each iteration the segmentation is incrementally changed, the benchmark designs are mapped using VPR, and the performance metric is computed to decide whether to accept or reject the new segmentation. Run time is significantly reduced by using incremental routing in each iteration and by parallelizing the metric evaluation. Experimental results using the MCNC benchmark designs demonstrate average reductions of 22% in delay and 15% in power relative to a baseline segmentation. The results also show that average segment length should decrease with technology scaling. Finally, we demonstrate how this tool, TORCH, can be used to optimize other aspects of programmable routing in an FPGA.
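The simulated-annealing loop described in the abstract can be illustrated with a minimal sketch. This is not the thesis's tool: the segment lengths, the move operator, the cooling schedule, and in particular the cost function `toy_power_delay_product` are all assumptions made for illustration. In the thesis the cost comes from placing and routing benchmark designs with VPR; here a toy formula stands in so that the example runs on its own.

```python
import math
import random

# Hypothetical segment lengths (in logic-block spans); not taken from the thesis.
SEGMENT_LENGTHS = (1, 2, 4, 8)


def toy_power_delay_product(tracks):
    """Stand-in cost. The thesis measures this from placed-and-routed benchmarks;
    this toy model simply prefers an average segment length near 3."""
    total = sum(tracks)
    avg_len = sum(n * l for n, l in zip(tracks, SEGMENT_LENGTHS)) / total
    return (avg_len - 3.0) ** 2 + 1.0


def perturb(tracks, rng):
    """Incremental change: move one track from one segment length to another,
    keeping the total channel width fixed."""
    new = list(tracks)
    src = rng.choice([i for i, n in enumerate(new) if n > 0])
    dst = rng.choice([i for i in range(len(new)) if i != src])
    new[src] -= 1
    new[dst] += 1
    return new


def anneal(initial, cost_fn, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over track counts per segment length."""
    rng = random.Random(seed)
    current, current_cost = list(initial), cost_fn(initial)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = perturb(current, rng)
        candidate_cost = cost_fn(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


if __name__ == "__main__":
    start = [20, 0, 0, 0]  # all 20 tracks use length-1 segments
    segmentation, cost = anneal(start, toy_power_delay_product)
    print(segmentation, cost)
```

Replacing `toy_power_delay_product` with a function that invokes a place-and-route flow and returns the measured average interconnect power-delay product would bring the sketch closer to the procedure the abstract describes; the incremental routing and parallel metric evaluation mentioned there are not modeled here.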

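Bibliographic details

Author: Lin, Mingjie
Author affiliation: Stanford University
Degree-granting institution: Stanford University
Subject: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2008
Pages: 111 p.
Original format: PDF
Language: English
Chinese Library Classification: Radio electronics, telecommunications technology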

Similar literature

1. Exploration of preprocessing architectures for field-programmable gate array-based thermal-visual smart camera [J]. Imran Muhammad, Rinner Bernhard, Zand Sajjad Zandi. Journal of Electronic Imaging. 2016, No. 4
2. Nonvolatile Power-Gating Field-Programmable Gate Array Using Nonvolatile Static Random Access Memory and Nonvolatile Flip-Flops Based on Pseudo-Spin-Transistor Architecture with Spin-Transfer-Torque Magnetic Tunnel Junctions [J]. Shuuichirou Yamamoto, Yusuke Shuto, Satoshi Sugahara. Japanese Journal of Applied Physics. 2012, No. 11, Issue 2
3. Reliability Analysis of Field-Programmable Gate-Array-Based Space Computer Architectures [J]. Hogan Justin A., Weber Raymond J., LaMeres Brock J. Journal of Aerospace Computing, Information, and Communication. 2017, No. 4
4. High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection [C]. Martin Karp, Artur Podobas, Niclas Jansson. IEEE International Parallel and Distributed Processing Symposium. 2021
5. Field-programmable gate array implementation of a scalable integral image architecture based on systolic arrays [D]. De la Cruz, Juan A. 2011
6. Mixed-precision weights network for field-programmable gate array [O]. Ninnart Fuengfusin, Hakaru Tamukoh, Chi-Hua Chen. 2021
7. Real Time 3-D Graphics Processing Hardware Design using Field-Programmable Gate Arrays [O]. Warner James Ryan. 2009
