Loop fusion for clustered VLIW architectures

Abstract

Embedded systems require maximum performance from a processor within significant constraints in power consumption and chip cost. Using software pipelining, high-performance digital signal processors can often exploit considerable instruction-level parallelism (ILP), and thus significantly improve performance. However, software pipelining, in some instances, hinders the goals of low power consumption and low chip cost. Specifically, the registers required by a software pipelined loop may exceed the size of the physical register set.

The register pressure problem incurred by software pipelining makes it difficult to build a high-performance embedded processor with a single, multi-ported register bank with enough registers to support high levels of ILP while maintaining clock speed and limiting power consumption. The large number of ports required to support a single register bank severely hampers access time. The port requirement for a register bank can be reduced via hardware by partitioning the register bank into multiple banks connected to disjoint subsets of functional units, called clusters. Since a functional unit is not directly connected to all register banks, wasted energy and resources can result due to delays incurred when accessing "non-local" registers.

The overhead due to partitioning of the register set can be ameliorated by using high-level compiler loop optimization techniques such as unrolling, unroll-and-jam and fusion. High-level loop optimizations spread data-independent parallelism across clusters that may not require "non-local" register accesses and can provide work to hide the latency of any such register accesses that are needed.

In this paper, we examine the effects of loop fusion on DSP loops run on four simulated, clustered VLIW architectures and the Texas Instruments TMS320C64x. Our experiments show a harmonic mean speedup of 1.3 to 2.
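As a minimal sketch of the loop fusion transformation the abstract refers to, the C fragment below merges two loops over the same index range into one. The function names, array names, and constants are hypothetical and are not taken from the paper. The `restrict` qualifiers encode the assumption that the arrays do not overlap, which is what makes the fusion legal in this example.

```c
#include <stddef.h>

/* Before fusion: two separate loops over the same index range.
   Each loop body exposes only one independent operation per iteration. */
void scale_and_offset_unfused(const int *restrict a, const int *restrict b,
                              int *restrict x, int *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = a[i] * 3;
    for (size_t i = 0; i < n; i++)
        y[i] = b[i] + 7;
}

/* After fusion: one loop whose body holds two data-independent statements.
   Neither statement reads a value the other produces, so a scheduler for a
   clustered machine could, in principle, place each statement on a different
   cluster without any non-local register access. */
void scale_and_offset_fused(const int *restrict a, const int *restrict b,
                            int *restrict x, int *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        x[i] = a[i] * 3;  /* uses only a[] and x[] */
        y[i] = b[i] + 7;  /* uses only b[] and y[] */
    }
}
```

Both functions compute the same results; the fused form only changes how much independent work is available within a single loop body. Whether fusion actually helps depends on the target's cluster layout and register pressure, which this sketch does not model.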

Bibliographic details

Source
Proceedings of the joint conference on Languages, compilers and tools for embedded systems | 2002 | pp. 112-119 | 8 pages

Conference location: Berlin (DE)

Authors
Yi Qian; Steve Carr; Philip Sweany

Author affiliations

Michigan Technological University, Houghton, MI;

Texas Instruments, Dallas, TX;

Original format: PDF

Language of text: English

Chinese Library Classification: computing technology, computer technology

Keywords
loop fusion

