首页> 外文会议>Annual symposium on parallelism in algorithms and architectures >Work-stealing for Mixed-mode Parallelism by Deterministic Team-building
【24h】

Work-stealing for Mixed-mode Parallelism by Deterministic Team-building

机译:确定性团队建设的混合模式并行窃取

获取原文

摘要

We show how to extend classical work-stealing to deal with tightly coupled data parallel tasks that can require any number of threads r ≥ 1 for their execution, and term this extension work-stealing with deterministic team-building. As threads become idle they attempt to join a team of threads designated for a task requiring r > 1 threads for its execution, alternatively to steal a task, requiring no central coordination. Team building and stealing are done according to a deterministic hierarchy and involve at most a logarithmic number of possibly randomized steal attempts. Threads attempting to join the team for a task requiring a large number of threads may help smaller teams while waiting for the large team to form. Once a team has been formed the threads can in close coordination execute the data parallel task. Implementation can be done with standard lock-free data structures, and takes only a single extra compare-and-swap (CAS) operation per thread to build a team. In the degenerate case where all tasks require only a single thread, the implementation coincides with a locality aware work-stealing implementation. Using a prototype C++ implementation of our extended work-stealing algorithm, a mixed-mode parallel Quicksort algorithm with a delta parallel partitioning step has been implemented. We compare our (improved) implementation of this algorithm on top of our extended work-stealing scheduler to a standard task-parallel implementation with this scheduler, and with Intel Cilk Plus and Threading Building Blocks. In addition, we also compare to the optimized parallel MCSTL Quicksort. Results are shown for a 32-core Intel Nehalem EX system and a 16-core Sun T2+ system supporting up to 128 hardware threads. The mixed-mode parallel algorithm performs consistently better than the fork-join implementation, often significantly.
机译:我们展示了如何扩展经典的工作窃取,以处理紧密耦合的数据并行任务,这些任务可能需要任何数量的线程r≥1,以及使用确定性团队建设的术语术语术语。由于线程变得空闲,他们试图加入指定为需要R> 1线程的任务的线程团队,以便窃取任务,要求没有中央协调。队伍建设和窃取是根据一个确定性层次结构完成的,并涉及最多的对数数量的随机窃取尝试。尝试加入团队的线程用于需要大量线程的任务可能在等待大型团队形成的同时有助于较小的团队。一旦组成了一个团队,线程可以在密切协调中执行数据并行任务。可以使用标准锁定数据结构完成实现,并且仅占每个线程的单个额外比较和交换(CAS)操作来构建团队。在所有任务只需要单个线程的退化情况下,实现与局域意识的工作窃取实现一致。使用我们扩展的工作窃取算法的原型C ++实现,已经实现了具有Delta并行分区步骤的混合模式并行Quicksort算法。我们将我们的(改进的)实施该算法在我们的扩展工作窃取调度程序之上进行了标准任务并行实现,并使用Intel Cilk Plus和Threading构件块。此外,我们还与优化的并行MCSTL Quicksort进行了比较。结果显示为32芯英特尔Nehalem EX系统和一个高达128个硬件线的16核太阳T2 +系统。混合模式并行算法始终如一地执行始终如一地优于叉协议实现。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号