
Mapping a Dataflow Programming Model onto Heterogeneous Architectures


Abstract

This thesis describes and evaluates how extending Intel's Concurrent Collections (CnC) programming model can address the problem of hybrid programming with high performance and low energy consumption, while retaining the ease of use of data-flow programming.

The CnC model is a declarative, dynamic, light-weight, task-based parallel programming model, and it is implicitly deterministic because it enforces the single-assignment rule; these properties ensure that problems are modelled in an intuitive way. CnC offers a separation of concerns by allowing algorithms to be expressed as a two-stage process: first, a problem is decomposed into components and the interactions between components are specified; second, an implementation is provided for each component. By facilitating the separation between a domain expert, who can provide an accurate problem specification at a high level, and a tuning expert, who can tune the individual components for better performance, we ensure that tuning and future development, such as replacing a subcomponent with a more efficient algorithm, become straightforward.

A recent trend in mainstream desktop systems is the use of graphics processing units (GPUs) to obtain order-of-magnitude performance improvements relative to general-purpose CPUs. In addition, the use of FPGAs has seen a significant increase for applications that can take advantage of such dedicated hardware. Computing is evolving from using many-core CPUs to "co-processing" on the CPU, GPU and FPGA; however, hybrid programming models that support the interaction between multiple heterogeneous components are not widely accessible to mainstream programmers and domain experts who have a real need for such resources.

We propose a C-based implementation of the CnC model for enabling parallelism across heterogeneous processor components in a flexible way, with high resource utilization and high programmability. We use the task-parallel HabaneroC language (HC) as the platform for implementing CnC-HabaneroC (CnC-HC); HC is also used to implement the computation steps in CnC-HC and to interact with GPU or FPGA steps, and it offers the desired flexibility and extensibility for interacting with any other C-based language.

First, we extend the CnC model with tag functions and ranges to enable automatic code generation of high-level operations for inter-task communication. This improves programmability and also makes the code more analysable, opening the door for future optimizations.

Secondly, we introduce a way to specify steps that are data-parallel, and thus fit to execute on the GPU, together with the notion of task affinity, a tuning annotation in the specification language. Affinity is used by the runtime during scheduling and can be fine-tuned based on application needs to achieve better (faster, lower-power, etc.) results.

Thirdly, we introduce and develop a novel, data-driven runtime for the CnC model, using HabaneroC (HC) as the base language. In addition, we create an implementation of the previous runtime approach and conduct a study to compare the performance of the two approaches.
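To make the single-assignment rule and the role of tag functions concrete, the following is a minimal, self-contained C sketch. It is not the CnC-HC API; the names item_put, item_get and the stencil-style tag mapping are illustrative assumptions. Each tag in an item collection may be written exactly once, and the declared tag-to-item mapping is the kind of information a translator could use to generate inter-task communication code automatically.

    /* Minimal sketch of CnC-style single-assignment item collections with
     * integer tags.  This is NOT the CnC-HC API; item_put/item_get and the
     * tag function below are illustrative assumptions only. */
    #include <assert.h>
    #include <stdio.h>

    #define N 8

    typedef struct {
        double value[N];
        int    written[N];   /* single-assignment guard, one flag per tag */
    } item_collection;

    /* put: each tag may be written at most once (the single-assignment rule). */
    static void item_put(item_collection *c, int tag, double v) {
        assert(tag >= 0 && tag < N);
        assert(!c->written[tag] && "single-assignment rule violated");
        c->value[tag] = v;
        c->written[tag] = 1;
    }

    /* get: in a data-driven runtime a missing item would suspend the consuming
     * step; in this sketch the item must already be present. */
    static double item_get(const item_collection *c, int tag) {
        assert(tag >= 0 && tag < N);
        assert(c->written[tag] && "item not yet produced");
        return c->value[tag];
    }

    /* Example tag function: the step with tag i consumes items [i] and [i+1]
     * and produces item [i].  Declaring this mapping in the specification is
     * what enables automatic generation of the gets and puts. */
    static void stencil_step(item_collection *in, item_collection *out, int i) {
        double avg = 0.5 * (item_get(in, i) + item_get(in, i + 1));
        item_put(out, i, avg);
    }

    int main(void) {
        item_collection in = {0}, out = {0};
        for (int t = 0; t < N; t++)
            item_put(&in, t, (double)t);
        for (int i = 0; i < N - 1; i++)      /* prescribed step instances */
            stencil_step(&in, &out, i);
        for (int i = 0; i < N - 1; i++)
            printf("out[%d] = %.1f\n", i, item_get(&out, i));
        return 0;
    }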
Next, we expand the HabaneroC dynamic work-stealing runtime to allow cross-device stealing based on task affinity. Cross-device dynamic work-stealing is used to achieve load balancing across heterogeneous platforms for improved performance.

Finally, we implement and use a series of benchmarks for testing the model in different scenarios, and show that our proposed approach can yield significant performance benefits and low power usage when using hybrid execution.
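Similarly, the affinity annotation and cross-device stealing can be pictured with a short sequential C sketch. It is not the HabaneroC runtime; the device enumeration, the affinity field and the steal-when-idle policy below are illustrative assumptions. Each device prefers tasks annotated with its own affinity and only takes work from another device's ready queue when it would otherwise sit idle.

    /* Conceptual sketch of affinity-guided, cross-device work stealing.
     * A sequential simulation, not the HabaneroC runtime: the device names,
     * the affinity field and the steal policy are illustrative assumptions. */
    #include <stdio.h>

    typedef enum { DEV_CPU = 0, DEV_GPU = 1, NUM_DEV = 2 } device_t;

    typedef struct {
        int      id;
        device_t affinity;   /* tuning annotation from the specification */
    } task_t;

    typedef struct {
        task_t tasks[16];
        int    count;
    } queue_t;

    static queue_t queues[NUM_DEV];   /* one ready queue per device */

    static void push(device_t d, task_t t) {
        queues[d].tasks[queues[d].count++] = t;
    }

    static int pop(device_t d, task_t *out) {
        if (queues[d].count == 0) return 0;
        *out = queues[d].tasks[--queues[d].count];
        return 1;
    }

    /* A device runs tasks with matching affinity first; when its own queue is
     * empty it steals from another device's queue to stay busy. */
    static int next_task(device_t self, task_t *out) {
        if (pop(self, out)) return 1;                  /* local work first  */
        for (int d = 0; d < NUM_DEV; d++)              /* then cross-device */
            if (d != (int)self && pop((device_t)d, out)) return 1;
        return 0;                                      /* nothing left      */
    }

    int main(void) {
        for (int i = 0; i < 6; i++) {
            task_t t = { i, (i % 3 == 0) ? DEV_GPU : DEV_CPU };
            push(t.affinity, t);                       /* enqueue by affinity */
        }
        /* Simulate: the GPU drains its own queue, then steals CPU-affinity tasks. */
        task_t t;
        while (next_task(DEV_GPU, &t))
            printf("GPU executes task %d (affinity=%s)\n",
                   t.id, t.affinity == DEV_GPU ? "GPU" : "CPU");
        return 0;
    }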

Bibliographic record

  • Author

    Sbirlea, Alina

  • Affiliation
  • Year: 2012
  • Total pages
  • Format: PDF
  • Language: eng
  • Classification

