
Efficiency in ILP processing by using orthogonality



Abstract

For the next generations of Processor-Arrays-on-Chip (e.g., coarse-grained reconfigurable or programmable arrays) comprising hundreds to thousands of processing elements, it is very important to keep the on-chip configuration/instruction memories as small as possible. Hence, compilers must take into account the scarceness of the available instruction memory and generate code that is as compact as possible [1]. However, Very Long Instruction Word (VLIW) processors have the well-known problem that compilers typically produce lengthy code. Much unnecessary code is produced due to unused Functional Units (FUs) or operations that repeat for single FUs in instruction sequences. Techniques like software pipelining can be used to improve the utilization of the FUs, yet with the risk of code explosion [2] due to the overlapped scheduling of multiple loop iterations or other control flow statements. This is where our proposed Orthogonal Instruction Processing (OIP) architecture (see Fig. 1) shows its benefits in reducing the code size of compute-intensive loop programs. The idea is, in contrast to the lightweight VLIW processors used in arrays such as Tightly Coupled Processor Arrays (TCPAs) [4], to equip each FU with its own instruction memory, branch unit, and program counter, while still letting the FUs share the register files as well as the input and output signals. This enables a processor to execute a loop program orthogonally: each FU executes its own sub-program while exchanging data over the register files. The branch unit and its instruction format only have to be changed slightly by adding a counter to each instruction that determines how often the instruction is repeated before the specified branch is taken. This enables repeating instructions without repeating them in the code. Such processors have to be programmed carefully, e.g., to avoid data dependency problems while optimizing throughput. For solving this resource-constrained modulo scheduling problem, we use techniques based on mixed integer linear programming [5], [3].
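To illustrate the per-FU repeat counter described above, the following is a minimal sketch (not the authors' implementation; the instruction fields, operation names, and program are hypothetical) of how one FU can re-issue an instruction several times before taking a branch, so that a repeated operation occupies only one instruction slot in memory:

```python
# Hypothetical per-FU instruction format with a repeat counter, sketching
# the code-size benefit claimed for OIP: repeating an operation does not
# require duplicating it in the instruction memory.

from dataclasses import dataclass

@dataclass
class Instr:
    op: str          # operation executed by this FU, e.g. "mac"
    repeat: int      # how often the instruction is re-issued
    branch_to: int   # program-counter target taken after the repeats

def run_fu(program, steps):
    """Simulate one FU's sub-program; each FU has its own program counter."""
    pc, remaining, trace = 0, program[0].repeat, []
    for _ in range(steps):
        instr = program[pc]
        trace.append(instr.op)
        remaining -= 1
        if remaining == 0:          # counter expired: take the branch
            pc = instr.branch_to
            remaining = program[pc].repeat
    return trace

# One FU repeating a multiply-accumulate four times and then storing,
# encoded in 2 instructions instead of 5.
fu_program = [
    Instr(op="mac",   repeat=4, branch_to=1),
    Instr(op="store", repeat=1, branch_to=0),
]
print(run_fu(fu_program, steps=10))
# -> ['mac', 'mac', 'mac', 'mac', 'store', 'mac', 'mac', 'mac', 'mac', 'store']
```

In an actual OIP processor, each FU would run such a sub-program concurrently with the others, and the compiler's modulo scheduler would have to align these sub-programs so that shared register-file accesses respect the loop's data dependencies.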
