An efficient parallel algorithm is proposed for template matching of an $N \times N$ input image with an $M \times M$ template on a single-instruction multiple-data (SIMD) mesh-connected computer with $P$ processors. The input image is mapped onto the processor array in cyclic mode, so that each processor stores $N^2/P$ pixels. The template values are circulated among the processors instead of being broadcast or stored in processor memory, and the intermediate results are never moved. Both the computation and the communication time complexity of the algorithm are $O(M^2 N^2/P)$ for all $P$ in the range $M^2 \le P \le N^2$.
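The data layout and the per-processor work bound can be illustrated with a small sequential simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes a square mesh of $P = q^2$ processors with $q$ dividing $N$, a cyclic mapping that sends pixel $(i, j)$ to processor $(i \bmod q, j \bmod q)$, template matching defined as cross-correlation with zero padding past the image border, and template "circulation" modelled only as every processor using the same template value at each SIMD step. The names `cyclic_owner`, `distribute`, `simd_template_match` and `direct_match` are hypothetical, and the actual mesh routing of template values and neighbouring image data is not modelled.

```python
from itertools import product


def cyclic_owner(i, j, q):
    """Processor (row, col) on a q x q mesh that owns pixel (i, j) (assumed cyclic mapping)."""
    return (i % q, j % q)


def distribute(image, q):
    """Cyclically map an N x N image onto a q x q mesh; each PE gets N^2 / q^2 pixels."""
    n = len(image)
    local = {pe: [] for pe in product(range(q), repeat=2)}
    for i, j in product(range(n), repeat=2):
        local[cyclic_owner(i, j, q)].append((i, j))
    return local


def simd_template_match(image, template, q):
    """Simulate lockstep template matching; returns (result, per-PE operation counts)."""
    n, m = len(image), len(template)
    local = distribute(image, q)
    # Each PE keeps the accumulators for the pixels it owns; they never move.
    acc = {pe: {pix: 0 for pix in pixels} for pe, pixels in local.items()}
    ops = {pe: 0 for pe in local}
    # One template value per SIMD step, used by every PE at the same time.
    for u, v in product(range(m), repeat=2):
        t = template[u][v]
        for pe, pixels in local.items():
            for i, j in pixels:
                # Zero padding (assumption): out-of-range image values contribute 0.
                # The sketch reads image[i + u][j + v] directly instead of routing it.
                if i + u < n and j + v < n:
                    acc[pe][(i, j)] += t * image[i + u][j + v]
                ops[pe] += 1  # every PE executes the step in lockstep
    result = [[0] * n for _ in range(n)]
    for pe_acc in acc.values():
        for (i, j), s in pe_acc.items():
            result[i][j] = s
    return result, ops


def direct_match(image, template):
    """Sequential reference: zero-padded cross-correlation."""
    n, m = len(image), len(template)
    return [[sum(template[u][v] * image[i + u][j + v]
                 for u in range(m) for v in range(m)
                 if i + u < n and j + v < n)
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    N, M, q = 9, 3, 3  # P = 9 satisfies M^2 <= P <= N^2
    image = [[(3 * i + j) % 7 for j in range(N)] for i in range(N)]
    template = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    result, ops = simd_template_match(image, template, q)
    assert result == direct_match(image, template)
    # Each PE performs M^2 * N^2 / P multiply-accumulate steps: 9 * 81 / 9 = 81.
    print(set(ops.values()))
```

In this simulation each processor owns $N^2/P$ pixels and performs one multiply-accumulate per pixel per template value, so its work is $M^2 \cdot N^2/P$, matching the stated computation bound; the communication side of the bound is not checked here.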