Mapping a Dataflow Programming Model onto Heterogeneous Architectures.

Abstract

This thesis describes and evaluates how extending Intel's Concurrent Collections (CnC) programming model can address the problem of hybrid programming with high performance and low energy consumption, while retaining the ease of use of dataflow programming.

The CnC model is a declarative, dynamic, lightweight, task-based parallel programming model that is implicitly deterministic because it enforces the single assignment rule. These properties allow problems to be modelled in an intuitive way. CnC offers a separation of concerns by expressing algorithms in two stages: first, decomposing a problem into components and specifying how those components interact with each other; second, providing an implementation for each component. By separating the domain expert, who can provide an accurate high-level problem specification, from the tuning expert, who can tune individual components for better performance, we ensure that tuning and future development, such as replacing a subcomponent with a more efficient algorithm, become straightforward.

A recent trend in mainstream desktop systems is the use of graphics processing units (GPUs) to obtain order-of-magnitude performance improvements over general-purpose CPUs. In addition, the use of FPGAs has increased significantly for applications that can take advantage of such dedicated hardware. Computing is thus evolving from many-core CPUs toward "co-processing" on the CPU, GPU and FPGA. However, hybrid programming models that support interaction among multiple heterogeneous components are not widely accessible to the mainstream programmers and domain experts who have a real need for such resources.

We propose a C-based implementation of the CnC model that enables parallelism across heterogeneous processor components in a flexible way, with high resource utilization and high programmability.
We use the task-parallel HabaneroC language (HC) as the platform for implementing CnC-HabaneroC (CnC-HC). HC is also used to implement the computation steps in CnC-HC and to interact with GPU or FPGA steps, and it offers the flexibility and extensibility needed to interoperate with any other C-based language.

First, we extend the CnC model with tag functions and ranges to enable automatic code generation of high-level operations for inter-task communication. This improves programmability and also makes the code more analysable, opening the door to future optimizations. Second, we introduce a way to specify data-parallel steps, which are suited to execution on the GPU, together with the notion of task affinity, a tuning annotation in the specification language. The runtime uses affinity during scheduling, and it can be fine-tuned to application needs to achieve better (faster, lower-power, etc.) results. Third, we introduce and develop a novel data-driven runtime for the CnC model, using HabaneroC (HC) as the base language. We also implement the previous runtime approach and conduct a study comparing the performance of the two. Next, we extend the HabaneroC dynamic work-stealing runtime to allow cross-device stealing based on task affinity; cross-device dynamic work stealing balances load across heterogeneous platforms for improved performance. Finally, we implement a series of benchmarks that test the model in different scenarios and show that the proposed approach can yield significant performance benefits and low power usage with hybrid execution.

Bibliographic record

  • Author: Sbirlea, Alina Gabriela.
  • Affiliation: Rice University.
  • Degree-granting institution: Rice University.
  • Discipline: Computer Science.
  • Degree: M.S.
  • Year: 2012
  • Pages: 108
  • Original format: PDF
  • Language: English