Current research focuses mainly on exploiting thread-level parallelism (TLP) to increase performance. Another avenue for achieving performance scalability, however, is specialization. In this paper we propose application-specific intra-vector instructions for two-dimensional signal processing kernels. Such kernels usually incur significant data rearrangement overhead in order to use the SIMD capabilities. With the intra-vector instructions, this overhead can be avoided. We have implemented intra-vector instructions in the Cell SPU core and measured speedups of up to 2.06, with an average of 1.45.