Conference: IEEE International Parallel and Distributed Processing Symposium

General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models


Abstract

Task-based programming models such as OpenMP, Intel TBB, and OmpSs let programmers express dependences among tasks, which drive task execution at runtime. Managing these dependences introduces noticeable overhead for fine-grained tasks, diminishing the potential speedup or even causing performance losses. To overcome this drawback, we present Picos++, a general-purpose hardware accelerator that manages inter-task dependences efficiently in both time and energy. Our design also includes novel support for nested tasks. To this end, we present a new hardware/software co-design that addresses a known problem: nested tasks with dependences can deadlock the system because hardware task-dependence managers have a limited amount of resources. In this paper we describe a detailed implementation of this design and evaluate a parallel task-based programming model using Picos++ on an embedded Linux system with two ARM Cortex-A9 cores and an FPGA. We study the scalability and energy consumption of the real implemented system and compare them against a software runtime. Even in a system limited to 2 threads, Picos++ achieves more than 1.8x speedup and 40% energy savings in the most demanding parallelizations of real benchmarks. With more threads, a hardware task-dependence manager should be able to achieve much higher speedups and greater energy savings.
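To make the notion of runtime dependence management concrete, the following is a minimal Python sketch of the bookkeeping a software runtime might perform when tasks declare the data they read and write. It is an illustration only and does not describe the Picos++ design or any OpenMP, Intel TBB, or OmpSs implementation; the class `DependenceTracker`, its methods `submit` and `finish`, and the string task and address names are all hypothetical.

```python
from collections import defaultdict


class DependenceTracker:
    """Toy software task-dependence manager (hypothetical, not Picos++)."""

    def __init__(self):
        self.last_writer = {}               # address -> last task that writes it
        self.readers = defaultdict(list)    # address -> readers since that write
        self.pending = {}                   # task -> count of unfinished predecessors
        self.successors = defaultdict(set)  # task -> tasks waiting on it
        self.finished = set()

    def submit(self, task, ins=(), outs=()):
        """Register a task; return True if it can run immediately."""
        preds = set()
        for addr in ins:                    # read-after-write
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])
        for addr in outs:                   # write-after-write, write-after-read
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])
            preds.update(self.readers[addr])
        preds.discard(task)
        preds -= self.finished              # finished tasks no longer block anyone
        for p in preds:
            self.successors[p].add(task)
        self.pending[task] = len(preds)
        for addr in ins:
            self.readers[addr].append(task)
        for addr in outs:
            self.last_writer[addr] = task
            self.readers[addr] = []
        return self.pending[task] == 0

    def finish(self, task):
        """Mark a task as done; return the tasks that became ready."""
        self.finished.add(task)
        ready = []
        for s in sorted(self.successors.pop(task, ())):
            self.pending[s] -= 1
            if self.pending[s] == 0:
                ready.append(s)
        return ready


tracker = DependenceTracker()
tracker.submit("A", outs=["x"])               # no predecessors, ready at once
tracker.submit("B", ins=["x"], outs=["y"])    # waits for A
tracker.submit("C", ins=["x"])                # waits for A
tracker.submit("D", ins=["y"], outs=["x"])    # waits for A, B and C
print(tracker.finish("A"))                    # B and C become ready
```

Every `submit` and `finish` call walks per-address tables, so this bookkeeping cost is paid once per task regardless of how little work the task does. That is the kind of per-task overhead that grows in relative terms as tasks become finer-grained.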
