Techniques for task processing using a highly parallel processing architecture with a shallow pipeline are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the array of compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide, variable length, microcode control words generated by the compiler. Relevant portions of the control word are stored within a cache associated with the array of compute elements. The control words are decompressed. The decompressing occurs cycle-by-cycle out of the cache over multiple cycles. A compiled task is executed on the array of compute elements, based on the decompressing. Simultaneous execution of two or more potential compiled task outcomes is provided.
展开▼