A reconfigurable processor architecture based on a unique square recursive order decomposition of partial product matrices is described. The architecture can be easily reconfigured by exploiting the data locality produced by the square recursive ordering, which lets it compute inner products of input arrays in four or more configurations. Each input array may contain sixty-four 8-bit items, sixteen 16-bit items, four 32-bit items, or one 64-bit item, with items in either 2's-complement or unsigned form. The processor can be pipelined to output one inner product per machine cycle, with each inner-product evaluation completing in two to four cycles. This makes it particularly attractive for high-speed, efficient matrix multiplication. The processor consists mainly of an array of simple 8×8 or 4×4 multipliers and at least one adder array.
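The four input configurations all need the same number of 8×8 partial products. Sixty-four 8-bit item pairs need 64 × 1, sixteen 16-bit pairs need 16 × 4, four 32-bit pairs need 4 × 16, and one 64-bit pair needs 1 × 64. The Python sketch below illustrates this arithmetic. It models the "square recursive order" as a recursive quadrant (Z/Morton) ordering of a 64-bit × 64-bit partial product matrix made of 8×8 blocks. That modeling choice is an assumption, not the patented ordering. In this order, every run of consecutive slots forms an aligned square sub-block, so one item product's partial products stay together. The helpers `z_order`, `mul8`, and `inner_product` are hypothetical names. The sketch handles unsigned items only and does not model pipelining or the adder array's structure.

```python
"""Sketch: reconfigurable inner product built from 8x8 sub-multiplies.

Assumptions (not from the source): the 'square recursive order' is modeled
as a recursive quadrant (Z/Morton) order, the array is modeled as 64 slots
of 8x8 unsigned multipliers, and 2's-complement items are not handled.
"""
from typing import Iterator, List, Tuple


def z_order(k: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, col) positions of a k x k grid, k a power of two,
    visiting quadrants recursively: top-left, top-right, bottom-left,
    bottom-right."""
    if k == 1:
        yield (0, 0)
        return
    h = k // 2
    for qi, qj in ((0, 0), (0, 1), (1, 0), (1, 1)):
        for i, j in z_order(h):
            yield (qi * h + i, qj * h + j)


def mul8(a: int, b: int) -> int:
    """Model of one simple 8x8 unsigned multiplier (hypothetical helper)."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def inner_product(xs: List[int], ys: List[int], width: int) -> int:
    """Unsigned inner product of xs and ys, whose items are `width` bits
    (8, 16, 32 or 64), computed from 64 8x8 partial products."""
    if width not in (8, 16, 32, 64):
        raise ValueError("width must be 8, 16, 32 or 64")
    m = width // 8                 # 8-bit digits per item
    n_items = (8 // m) ** 2        # 64, 16, 4 or 1 items per array
    if len(xs) != n_items or len(ys) != n_items:
        raise ValueError(f"expected {n_items} items per array")
    block = m * m                  # 8x8 slots used by one item product
    total = 0                      # plain sum standing in for the adder array
    for slot, (i, j) in enumerate(z_order(8)):
        # Locality: each run of `block` consecutive slots in this order is an
        # aligned m x m sub-square, i.e. all partial products of one item pair.
        t = slot // block
        di, dj = i % m, j % m      # digit positions within item t
        a = (xs[t] >> (8 * di)) & 0xFF
        b = (ys[t] >> (8 * dj)) & 0xFF
        total += mul8(a, b) << (8 * (di + dj))
    return total


if __name__ == "__main__":
    for w in (8, 16, 32, 64):
        n = (64 // w) ** 2
        xs = [(2 ** w - 1 - 12345 * k) % 2 ** w for k in range(n)]
        ys = [(6789 * k + 1) % 2 ** w for k in range(n)]
        assert inner_product(xs, ys, w) == sum(x * y for x, y in zip(xs, ys))
        print(f"{w}-bit mode: {n} items, matches direct computation")
```

Running the file checks all four modes against a direct Python sum. Switching modes changes only how slots are grouped (`slot // block`), not the multipliers themselves. This mirrors, under the stated assumptions, how data locality could make reconfiguration cheap. Signed (2's-complement) items would need sign-aware treatment of the most significant digits, which this sketch omits.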