Exploring data transfer and storage issues is crucial to efficiently map data-intensive applications (e.g., multimedia) onto programmable processors. Code transformations are used to minimise the main memory bus load and thereby improve both power consumption and system performance. However, this typically incurs a considerable arithmetic overhead in the addressing and local control. For instance, memory-optimising in-place and data-layout transformations add costly modulo and integer division operations to the initial addressing code. In this paper, we show how this cycle overhead can be almost completely removed. This is done with a systematic methodology that combines an algebraic transformation exploration approach for the (non)linear arithmetic with an efficient transformation technique that reduces the piece-wise linear indexing to linear pointer arithmetic. The approach is illustrated on a real-life medical application, using a variety of programmable processor architectures. Overall gains in cycle count of a factor of 5 to 25 are obtained compared with conventional compilers.
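As a rough illustration of the kind of overhead involved, the C sketch below contrasts a loop whose addressing uses a modulo operation (as an in-place mapping onto a circular buffer typically produces) with an equivalent loop in which the piece-wise linear index `i % w` is replaced by a pointer that advances linearly and wraps with a single comparison. This is a generic, textbook strength-reduction example under assumed names (`sum_modulo`, `sum_pointer`, `buf`, `w`, `n` are all hypothetical), not the methodology of this paper, which additionally explores algebraic transformations of the (non)linear arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "before" form: every access folds the linear index i
   into a circular buffer of w elements with a costly modulo. */
long sum_modulo(const int *buf, size_t w, size_t n)
{
    assert(w > 0);
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += buf[i % w];
    return acc;
}

/* Hypothetical "after" form: the piece-wise linear index i % w is
   tracked by a pointer that moves one element per iteration and wraps
   back to the start, so the loop body has no modulo or division. */
long sum_pointer(const int *buf, size_t w, size_t n)
{
    assert(w > 0);
    long acc = 0;
    const int *p = buf;
    const int *end = buf + w;
    for (size_t i = 0; i < n; i++) {
        acc += *p;
        if (++p == end)   /* compare-and-reset replaces i % w */
            p = buf;
    }
    return acc;
}
```

Both functions visit the same elements in the same order, so they return identical results; the second trades a modulo per access for an increment and a comparison, which is usually far cheaper on programmable processors without fast hardware division.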