首页> 外文会议>Proceedings of the twenty-third annual symposium on parallelism in algorithms and architectures >Work-stealing for Mixed-mode Parallelism by Deterministic Team-building
【24h】

Work-stealing for Mixed-mode Parallelism by Deterministic Team-building

机译:确定性团队建设来窃取混合模式并行性

获取原文
获取原文并翻译 | 示例

摘要

We show how to extend classical work-stealing to deal with tightly coupled data parallel tasks that can require any number of threads r ≥ 1 for their execution, and term this extension work-stealing with deterministic team-building. As threads become idle they attempt to join a team of threads designated for a task requiring r > 1 threads for its execution, alternatively to steal a task, requiring no central coordination. Team building and stealing are done according to a deterministic hierarchy and involve at most a logarithmic number of possibly randomized steal attempts. Threads attempting to join the team for a task requiring a large number of threads may help smaller teams while waiting for the large team to form. Once a team has been formed the threads can in close coordination execute the data parallel task. Implementation can be done with standard lock-free data structures, and takes only a single extra compare-and-swap (CAS) operation per thread to build a team. In the degenerate case where all tasks require only a single thread, the implementation coincides with a locality aware work-stealing implementation. Using a prototype C++ implementation of our extended work-stealing algorithm, a mixed-mode parallel Quicksort algorithm with a delta parallel partitioning step has been implemented. We compare our (improved) implementation of this algorithm on top of our extended work-stealing scheduler to a standard task-parallel implementation with this scheduler, and with Intel Cilk Plus and Threading Building Blocks. In addition, we also compare to the optimized parallel MCSTL Quicksort. Results are shown for a 32-core Intel Nehalem EX system and a 16-core Sun T2+ system supporting up to 128 hardware threads. The mixed-mode parallel algorithm performs consistently better than the fork-join implementation, often significantly.
机译:我们将展示如何扩展经典的工作窃取以处理紧密耦合的数据并行任务,这些任务可能需要任意数量的线程r≥1来执行,并用确定性的团队建设来进行这种扩展工作窃取。当线程变得空闲时,它们尝试加入为任务指定的线程组,该任务需要r> 1个线程来执行,或者窃取任务,而无需中央协调。团队建设和抢断是根据确定性层次进行的,最多涉及对数个可能的随机抢断尝试。尝试加入团队以完成需要大量线程的任务的线程可能会在等待大型团队形成时帮助较小的团队。一旦组建了团队,线程就可以紧密协调地执行数据并行任务。可以使用标准的无锁数据结构来实现,并且每个线程仅需执行一次额外的比较和交换(CAS)操作即可建立一个团队。在所有任务仅需要一个线程的简并情况下,该实现与本地感知的工作窃取实现是一致的。使用我们扩展的工作窃取算法的原型C ++实现,已实现了具有增量并行分区步骤的混合模式并行Quicksort算法。我们将在扩展的工作窃取调度程序之上的该算法的(改进的)实现与与此调度程序以及英特尔Cilk Plus和线程构建模块的标准任务并行实现进行比较。此外,我们还将比较与优化的并行MCSTL Quicksort。显示了支持多达128个硬件线程的32核Intel Nehalem EX系统和16核Sun T2 +系统的结果。混合模式并行算法的性能始终要比fork-join实现的性能更好,通常是明显的。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号