
Efficiency in ILP processing by using orthogonality



Abstract

For the next generations of Processor-Arrays-on-Chip (e.g., coarse-grained reconfigurable or programmable arrays) comprising hundreds to thousands of processing elements, it is very important to keep the on-chip configuration/instruction memories as small as possible. Hence, compilers must take into account the scarceness of the available instruction memory and generate code that is as compact as possible [1]. However, Very Long Instruction Word (VLIW) processors have the well-known problem that compilers typically produce lengthy code. Much unnecessary code is produced due to unused Functional Units (FUs) or operations that repeat for single FUs in instruction sequences. Techniques like software pipelining can be used to improve the utilization of the FUs, yet with the risk of code explosion [2] due to the overlapped scheduling of multiple loop iterations or other control flow statements. This is where our proposed Orthogonal Instruction Processing (OIP) architecture (see Fig. 1) shows its benefits in reducing the code size of compute-intensive loop programs. The idea is, in contrast to the lightweight VLIW processors used in arrays such as Tightly Coupled Processor Arrays (TCPAs) [4], to equip each FU with its own instruction memory, branch unit, and program counter, while still letting the FUs share the register files as well as the input and output signals. This enables a processor to execute a loop program orthogonally: each FU executes its own sub-program while exchanging data over the register files. The branch unit and its instruction format only have to be changed slightly by adding a counter to each instruction that determines how often the instruction is repeated before the specified branch is taken. This enables repeating instructions without repeating them in the code. Such processors have to be programmed carefully, e.g., to avoid data dependency problems while optimizing throughput. For solving this resource-constrained modulo scheduling problem, we use techniques based on mixed integer linear programming [5], [3].
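To illustrate the per-FU repeat counter described above, the following is a minimal sketch (not the authors' implementation; the instruction fields, operation names, and program are hypothetical) of how one FU can re-issue an instruction several times before taking a branch, so that a repeated operation occupies only one instruction slot in memory:

```python
# Hypothetical per-FU instruction format with a repeat counter, sketching
# the code-size benefit claimed for OIP: repeating an operation does not
# require duplicating it in the instruction memory.

from dataclasses import dataclass

@dataclass
class Instr:
    op: str          # operation executed by this FU, e.g. "mac"
    repeat: int      # how often the instruction is re-issued
    branch_to: int   # program-counter target taken after the repeats

def run_fu(program, steps):
    """Simulate one FU's sub-program; each FU has its own program counter."""
    pc, remaining, trace = 0, program[0].repeat, []
    for _ in range(steps):
        instr = program[pc]
        trace.append(instr.op)
        remaining -= 1
        if remaining == 0:          # counter expired: take the branch
            pc = instr.branch_to
            remaining = program[pc].repeat
    return trace

# One FU repeating a multiply-accumulate four times and then storing,
# encoded in 2 instructions instead of 5.
fu_program = [
    Instr(op="mac",   repeat=4, branch_to=1),
    Instr(op="store", repeat=1, branch_to=0),
]
print(run_fu(fu_program, steps=10))
# -> ['mac', 'mac', 'mac', 'mac', 'store', 'mac', 'mac', 'mac', 'mac', 'store']
```

In an actual OIP processor, each FU would run such a sub-program concurrently with the others, and the compiler's modulo scheduler would have to align these sub-programs so that shared register-file accesses respect the loop's data dependencies.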
