Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Generation of distributed logic-memory architectures through high-level synthesis


Abstract

With the increasing cost of on-chip global communication, high-performance designs for data-intensive applications require architectures that distribute hardware resources (computing logic, memories, interconnect, etc.) throughout the chip, while restricting computations and communications to geographic proximities. In this paper, we present a methodology for high-level synthesis (HLS) of distributed logic-memory architectures, i.e., architectures that have logic and memory distributed across several partitions in a chip. Conventional HLS tools are capable of extracting parallelism from a behavior for architectures that assume a monolithic controller/datapath communicating with a memory or memory hierarchy. This paper provides techniques to extend the synthesis frontier to more general architectures that can extract both coarse and fine-grained parallelism from data accesses and computations in a synergistic manner. Our methodology selects many possible ways of organizing data and computations, carefully examines the tradeoffs (i.e., communication overheads, synchronization costs, area overheads) in choosing one solution over another, and utilizes conventional HLS techniques for intermediate steps. We have evaluated the proposed framework on several benchmarks by generating register-transfer level (RTL) implementations using an existing commercial HLS tool with and without our enhancements, and by subjecting the resulting RTL circuits to logic synthesis and layout. The results show that circuits designed as distributed logic-memory architectures using our framework achieve significant (up to 5.3×, average of 3.5×) performance improvements over well-optimized conventional designs with small area overheads (up to 19.3%, 15.1% on average). At the same time, the reduction in the energy-delay product is by an average of 5.9× (up to 11.0×).
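The trade-off the abstract describes (more partitions expose more parallelism but add communication, synchronization, and area overheads) can be illustrated with a toy cost model. The sketch below is not the paper's algorithm: the linear overhead model, every numeric default, and the names `Candidate`, `evaluate`, and `best_candidate` are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical way of splitting a design into partitions."""
    partitions: int
    cycles: float  # estimated execution time in cycles
    area: float    # estimated area in arbitrary units


def evaluate(partitions,
             work_cycles=1_000_000,           # assumed total computation
             comm_cycles_per_partition=4_000,  # assumed communication cost
             sync_cycles_per_partition=1_000,  # assumed synchronization cost
             base_area=100.0,                  # assumed monolithic area
             area_per_partition=3.0):          # assumed area cost per extra partition
    """Estimate cycles and area under a simple linear overhead model."""
    extra = partitions - 1
    compute = work_cycles / partitions  # ideal parallel speedup
    overhead = (comm_cycles_per_partition + sync_cycles_per_partition) * extra
    area = base_area + area_per_partition * extra
    return Candidate(partitions, compute + overhead, area)


def best_candidate(max_partitions, max_area_overhead=0.20):
    """Pick the fastest candidate whose area stays within the allowed overhead."""
    baseline = evaluate(1)
    area_limit = baseline.area * (1.0 + max_area_overhead)
    feasible = [c for c in (evaluate(p) for p in range(1, max_partitions + 1))
                if c.area <= area_limit]
    return min(feasible, key=lambda c: c.cycles)


if __name__ == "__main__":
    baseline = evaluate(1)
    best = best_candidate(16)
    print(f"partitions={best.partitions}, "
          f"speedup={baseline.cycles / best.cycles:.2f}x, "
          f"area overhead={100 * (best.area / baseline.area - 1):.1f}%")
```

In this toy model the area budget, rather than the communication overhead, ends up limiting the partition count; relaxing `max_area_overhead` lets the search continue until overhead cycles outweigh the gain from additional parallelism.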
