Journal of Supercomputing

Architectural support for task scheduling: hardware scheduling for dataflow on NUMA systems



Abstract

To harness the compute resources of many-core systems with tens to hundreds of cores, applications have to expose parallelism to the hardware. Researchers are aggressively looking for program execution models that make it easier to expose parallelism and use the available resources. One common approach is to decompose a program into parallel 'tasks' and allow an underlying system layer to schedule these tasks to different threads. Software-only schedulers can implement various scheduling policies and algorithms that match the characteristics of different applications and programming models. Unfortunately, on large-scale multi-core systems, software schedulers suffer significant overheads as they synchronize and communicate task information over deep cache hierarchies. To reduce these overheads, hardware-only schedulers like Carbon have been proposed that perform task queuing and scheduling in hardware. This paper presents a hardware scheduling approach in which the structure that task-based programming models impose on programs is incorporated into the scheduler, making it aware of each task's data requirements. This prior knowledge of a task's data requirements allows for better task placement by the scheduler, which results in a reduction in overall cache misses and memory traffic, improving the program's performance and power efficiency. Simulations of this technique for a range of synthetic benchmarks and components of real applications have shown a reduction in the number of cache misses by up to 72% and 95% for the L1 and L2 caches, respectively, and up to 30% improvement in overall execution time compared with FIFO scheduling. This results not only in faster execution and up to 50% less data transfer, reducing load on the interconnect, but also in lower power consumption.
