Consider an /spl radic/n/spl times//spl radic/n processor mesh where, in addition to the local links, each row and column is enhanced by a COMMON CRCW bus. Assume that each processor stores an element of a commutative semigroup, and only kn entries (in arbitrary positions) are nonzero. We wish to compute the sum of all entries. For this problem we easily obtain a lower time bound of /spl Omega/(k/sup 1/4 /) if k/spl les/n/sup 2/3/. Our main result is an O(k/sup 1/4 /log/sup 2/k) time algorithm. It requires a composition of several data movement and compaction techniques which seem to be of general use for solving problems with sparse inputs scattered on the mesh, as it is typical e.g. for primal sketches in digital image processing.
展开▼