A common approach to enhancing processor performance is to increase the number of function units that operate concurrently; we observe this trend in all recent superscalar and VLIW (very long instruction word) processors. VLIWs extend more easily to high-performance configurations because they lack much of the hardware a superscalar needs for dependence checking and resource allocation, relying instead on a compiler to perform these tasks. In this paper we proceed along this line and go one step further in replacing hardware with software complexity: we propose a new architecture in which the scheduling and allocation of data transports are performed at compile time rather than at run time. This reduces hardware complexity and enables several new compile-time optimizations. The paper illustrates the required compilation steps, explains the concept and characteristics of the proposed architecture, and presents measurements which support our belief that this architecture is very attractive, especially for high-performance embedded applications.
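To make the central idea concrete, the following is a minimal, hedged sketch (not taken from the paper) of transport-triggered execution: the instruction stream encodes data transports, and an operation fires as a side effect of a move into a function unit's trigger register. The class name `AdderFU` and the operand/trigger/result register convention are illustrative assumptions modeled on typical transport-triggered designs.

```python
# Illustrative model of a transport-triggered function unit.
# AdderFU and the O/T/R register convention are assumptions for this
# sketch, not definitions from the paper.

class AdderFU:
    """An adder whose operation starts when its trigger register is written."""

    def __init__(self):
        self.operand = 0   # O register: plain operand latch
        self.result = 0    # R register: holds the result after triggering

    def write_operand(self, value):
        # A move to O only transports data; nothing executes yet.
        self.operand = value

    def write_trigger(self, value):
        # A move to T transports data AND triggers the operation:
        # the program encodes transports, not operations.
        self.result = self.operand + value


def run_transports(fu, program):
    """Execute a compile-time-scheduled list of (destination, value) moves."""
    for dst, value in program:
        if dst == "O":
            fu.write_operand(value)
        elif dst == "T":
            fu.write_trigger(value)
    return fu.result


adder = AdderFU()
# Two moves computing 3 + 4: the first transport fills O,
# the second triggers the addition via T.
print(run_transports(adder, [("O", 3), ("T", 4)]))  # prints 7
```

In this model the compiler, not the hardware, decides which transports occur in which cycle, which is exactly the hardware-to-software shift the abstract describes.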