...
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Generation of Heterogeneous Distributed Architectures for Memory-Intensive Applications Through High-Level Synthesis

Abstract

Memory-intensive applications present unique challenges to an application-specific integrated circuit (ASIC) designer in terms of the choice of memory organization, memory size requirements, bandwidth, access latencies, etc. The high potential of single-chip distributed logic-memory architectures in addressing many of these issues has been recognized in general-purpose computing and, more recently, in ASIC design. The high-level synthesis (HLS) techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns. Synthesis should, therefore, be capable of determining a partitioned architecture in which array data and computations may have to be distributed heterogeneously to achieve the best performance speed-up. We use a combination of clustering and min-cut style partitioning techniques, driven by simulation profiling, to yield distributed architectures while considering factors such as data access locality, balanced workloads, and inter-partition communication. Our experiments with several benchmark applications show that the proposed techniques yield two-way partitioned architectures that achieve up to 2.1× (average 1.9×) performance speed-up over conventional HLS solutions and up to 1.5× (average 1.4×) speed-up over the best feasible homogeneous partitioning solution. At the same time, the energy-delay product is reduced by up to 2.7× (average 2.0×) relative to conventional single-memory designs. A larger number of partitions makes further system performance improvement achievable at the cost of chip area.
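The abstract names min-cut style partitioning driven by simulation profiling, weighing data access locality, workload balance, and inter-partition communication. The sketch below illustrates only that idea in simplified form, under stated assumptions: graph nodes stand for arrays or computation clusters, node weights are hypothetical profiled workloads, edge weights are hypothetical profiled communication counts between nodes, and the names `two_way_partition`, `cut_cost`, `imbalance`, and `alpha` (the weight on workload imbalance) are invented for illustration. It uses a plain greedy pairwise-swap refinement in the spirit of Kernighan-Lin, not the paper's actual clustering and partitioning algorithm.

```python
from itertools import product


def cut_cost(edges, side):
    """Total weight of edges whose endpoints are in different partitions."""
    return sum(w for (u, v), w in edges.items() if side[u] != side[v])


def imbalance(work, side):
    """Absolute difference in total workload between partition 0 and partition 1."""
    load = [0, 0]
    for node, w in work.items():
        load[side[node]] += w
    return abs(load[0] - load[1])


def two_way_partition(work, edges, alpha=1.0, max_passes=10):
    """Greedy pairwise-swap bisection (simplified, Kernighan-Lin flavoured).

    Hypothetical cost = inter-partition communication + alpha * workload imbalance.
    A swap is kept only if it strictly lowers that cost.
    """
    def cost(side):
        return cut_cost(edges, side) + alpha * imbalance(work, side)

    nodes = sorted(work)
    # Initial assignment: alternate nodes in order of decreasing workload.
    order = sorted(nodes, key=lambda n: -work[n])
    side = {n: i % 2 for i, n in enumerate(order)}
    best = cost(side)

    for _ in range(max_passes):
        improved = False
        for a, b in product(nodes, nodes):
            if side[a] == 0 and side[b] == 1:
                side[a], side[b] = 1, 0  # tentatively swap a and b
                c = cost(side)
                if c < best:
                    best, improved = c, True  # keep the improving swap
                else:
                    side[a], side[b] = 0, 1  # undo the swap
        if not improved:
            break  # local minimum: no single swap lowers the cost
    return side, best


if __name__ == "__main__":
    # Hypothetical profile: four arrays with equal workload.
    work = {"A": 10, "B": 10, "C": 10, "D": 10}
    edges = {("A", "B"): 100, ("C", "D"): 100, ("B", "C"): 5, ("A", "D"): 1}
    side, best = two_way_partition(work, edges)
    print(side, best)
```

With this made-up profile, the heavily communicating pairs (A, B) and (C, D) end up in the same partition, leaving only the light B-C and A-D edges on the cut (cost 6) while the workload stays balanced. A real synthesis flow would also cluster nodes beforehand and account for memory size and access latency, which this sketch omits.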