Loop fusion for clustered VLIW architectures

Abstract

Embedded systems require maximum performance from a processor within significant constraints in power consumption and chip cost. Using software pipelining, high-performance digital signal processors can often exploit considerable instruction-level parallelism (ILP), and thus significantly improve performance. However, software pipelining, in some instances, hinders the goals of low power consumption and low chip cost. Specifically, the registers required by a software pipelined loop may exceed the size of the physical register set.

The register pressure problem incurred by software pipelining makes it difficult to build a high-performance embedded processor with a single, multi-ported register bank with enough registers to support high levels of ILP while maintaining clock speed and limiting power consumption. The large number of ports required to support a single register bank severely hampers access time. The port requirement for a register bank can be reduced via hardware by partitioning the register bank into multiple banks connected to disjoint subsets of functional units, called clusters. Since a functional unit is not directly connected to all register banks, wasted energy and resources can result due to delays incurred when accessing "non-local" registers.

The overhead due to partitioning of the register set can be ameliorated by using high-level compiler loop optimization techniques such as unrolling, unroll-and-jam and fusion. High-level loop optimizations spread data-independent parallelism across clusters that may not require "non-local" register accesses and can provide work to hide the latency of any such register accesses that are needed.

In this paper, we examine the effects of loop fusion on DSP loops run on four simulated, clustered VLIW architectures and the Texas Instruments TMS320C64x. Our experiments show a harmonic mean speedup of 1.3 to 2.
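To make the transformation concrete, the following is a minimal sketch of loop fusion in C. It is not taken from the paper: the function names, the constants 3 and 7, and the array roles are hypothetical and chosen only for illustration. The two loops in the unfused version touch disjoint data, so after fusion the two statements in the loop body are data-independent, which is the kind of parallelism the abstract describes a compiler spreading across clusters without "non-local" register accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Before fusion: two separate loops over the same index range.
 * `restrict` states the assumption that the arrays do not overlap,
 * which is what makes fusing the loops legal here. */
void scale_and_offset_unfused(const int *restrict a, const int *restrict b,
                              int *restrict c, int *restrict d, size_t n)
{
    assert(n == 0 || (a && b && c && d));
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * 3;
    for (size_t i = 0; i < n; i++)
        d[i] = b[i] + 7;
}

/* After fusion: one loop with two data-independent statements.
 * On a clustered machine, a scheduler could place each statement's
 * operands in a different register bank (an illustrative assumption,
 * not a claim about any particular compiler). */
void scale_and_offset_fused(const int *restrict a, const int *restrict b,
                            int *restrict c, int *restrict d, size_t n)
{
    assert(n == 0 || (a && b && c && d));
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] * 3;  /* uses only a and c */
        d[i] = b[i] + 7;  /* uses only b and d */
    }
}
```

Both functions compute the same results. The fused form also pays the loop control overhead once rather than twice.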


Bibliographic record

Source
Joint conference on Languages, compilers and tools for embedded systems | 2002 | 8 pages
Authors
Yi Qian; Steve Carr; Philip Sweany
Original format: PDF
Chinese Library Classification: Computer software
Keywords
loop fusion


Similar literature

1. Improving Performance of Loops on DIAM-based VLIW Architectures [J]. Jinyong Lee, Jongwon Lee, Yunheung Paek, ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2014, No. 5
2. UFS: a global trade-off strategy for loop unrolling for VLIW architectures [J]. K. Heydemann, F. Bodin, P. M. W. Knijnenburg, Concurrency and Computation. 2006, No. 11
3. Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture [J]. Elana Granston, Eric Stotzer, Joe Zbiciak, ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2001, No. 8
4. Loop fusion for clustered VLIW architectures [C]. Yi Qian, Steve Carr, Philip Sweany, Proceedings of the joint conference on Languages, compilers and tools for embedded systems. 2002
5. Loop transformations for clustered VLIW architectures [D]. Qian, Yi. 2002
6. Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures [O]. Haijing Tang, Xu Yang, Siye Wang, 2013
7. Loop Fusion for Clustered VLIW Architectures [O]. Yi Qian Science, Yi Qian 2002
