This paper introduces a new mechanism for exposing large-grain parallelism. The scheme performs lazy task creation: all tasks are provisionally inlined, and parallelism is extracted from the inlined information later, on demand. Unlike other mechanisms, however, further task demand is satisfied by the next evaluation stream rather than by retrospectively reversing the inlining decision of the current stream. The scheme is called lazy decomposition because the decomposition itself is throttled, not just the extraction of a task. Lazy decomposition clearly separates the serial section from the parallel section in the evaluation tree of a given function, which allows the serial section to adopt a sequential algorithm. In divide-and-conquer applications, adopting sequential algorithms yields a significant performance improvement.
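
As a rough illustration of throttled decomposition, the following Python sketch splits a divide-and-conquer sort only while there is task demand, and otherwise hands the whole subproblem to a sequential algorithm. It is a simplified sketch under stated assumptions, not the mechanism described in this paper: the names `lazy_sort`, `WORKERS`, and `_idle` are hypothetical, a semaphore counting idle workers stands in for "task demand", and the handoff of further demand to the next evaluation stream is not modeled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

WORKERS = 4  # hypothetical pool size, chosen for illustration
_pool = ThreadPoolExecutor(max_workers=WORKERS)
# Assumption: one token per idle worker models outstanding task demand.
_idle = threading.Semaphore(WORKERS)


def _merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def _run_and_release(xs):
    """Run a spawned subproblem, then return its worker token."""
    try:
        return lazy_sort(xs)
    finally:
        _idle.release()


def lazy_sort(xs):
    if len(xs) < 2:
        return list(xs)
    # Decomposition point: decompose only if there is demand for a task.
    if not _idle.acquire(blocking=False):
        # Serial section: no demand, so skip decomposition entirely
        # and use a sequential algorithm on the whole subproblem.
        return sorted(xs)
    # Parallel section: split, give one half to a worker, keep the other.
    mid = len(xs) // 2
    left = _pool.submit(_run_and_release, xs[:mid])
    right = lazy_sort(xs[mid:])
    return _merge(left.result(), right)
```

Because a token is taken before each submission and returned only when the spawned task finishes, no more than `WORKERS` tasks are outstanding at once, so a task waiting on its child cannot starve the pool. The point of the sketch is the serial branch: once demand is exhausted, the subproblem is no longer decomposed at all, which is what lets it use a sequential algorithm.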