Heterogeneous architectures have been receiving a lot of attention recently. In these systems, a multicore processor optimized for fast single-thread performance is accompanied by a large number of simpler but more power-efficient cores optimized for parallel workloads, such as NVIDIA's GPUs or Intel's Many Integrated Core (MIC). Although NVIDIA's GPUs include built-in support for parallelism control, the MIC relies on classical software thread creation and scheduling performed by the operating system (OS). While efficient thread creation is desirable in such many-core environments, current OS APIs can create only one thread at a time. In this paper, we propose a new system call for parallel thread creation on many-core coprocessors and show that it can perform up to 6.9 times better than the sequential version when executed on Intel's MIC software development platform.
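To make the one-thread-at-a-time limitation concrete, the following is a minimal sketch of the conventional sequential pattern using POSIX threads. It is not the system call proposed in this paper; it only illustrates the kind of baseline the abstract refers to, under the assumption that threads are created through `pthread_create`. The names `worker` and `spawn_sequential` are hypothetical and chosen for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: each thread marks its own slot as done. */
static void *worker(void *arg) {
    int *slot = arg;
    *slot = 1;
    return NULL;
}

/* Sequential baseline (illustrative): creates n threads with one
 * pthread_create() call per thread, then joins every thread that
 * was started. Returns the number of threads successfully created. */
static int spawn_sequential(size_t n, pthread_t *tids, int *slots) {
    size_t created = 0;

    for (size_t i = 0; i < n; i++) {
        slots[i] = 0;
        /* One library call, and typically one kernel entry, per thread. */
        if (pthread_create(&tids[i], NULL, worker, &slots[i]) != 0)
            break;
        created++;
    }

    for (size_t i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    return (int)created;
}
```

With this pattern, the creation cost is paid once per thread in the loop, so it grows with the thread count. That per-thread overhead is what a parallel thread creation interface would aim to reduce.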