International Conference on Parallel Processing

Declarative Tuning for Locality in Parallel Programs



Abstract

Optimized placement of data and computation for locality is critical for improving performance and reducing energy consumption on modern computing systems. However, for most programming models, modifying data and computation placements typically requires rewriting large portions of the application, thereby posing a huge performance portability challenge in today's rapidly evolving architecture landscape. In this paper we present TunedCnC, a novel, declarative and flexible CnC tuning framework for controlling the spatial and temporal placement of data and computation by specifying hierarchical affinity groups and distribution functions. TunedCnC emphasizes a separation of concerns: the domain expert specifies a parallel application by defining data and control dependences, while the tuning expert specifies how the application should be executed on a given architecture, defining when and where data and computation are placed. The application remains unchanged when tuned for a different platform or towards different performance goals. We evaluate the utility of TunedCnC on several applications, and demonstrate that varying the tuning specification can have a significant impact on an application's performance. Our evaluation is performed using an implementation of the Concurrent Collections (CnC) declarative parallel programming model, but our results should be applicable to tuning of other data-flow task-parallel programming models as well.
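To make the separation of concerns concrete, the following is a minimal, self-contained C++ sketch of the idea only; it is not the actual TunedCnC or Intel CnC API, and the names Application, Tuner, place_of, and group_of are illustrative assumptions. The application is defined once as a set of step instances, while placement is controlled entirely by a separately supplied distribution function and grouping function.

```cpp
// Hypothetical sketch: Tuner, AffinityGroup-style grouping, and place_of are
// illustrative inventions, not the TunedCnC or Intel CnC API.
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A step instance in the application's data-flow graph, identified by a tag.
struct Step {
    std::string collection;  // which step collection it belongs to
    int tag;                 // instance identifier (e.g., a tile index)
};

// Domain-expert side: the application is just its step instances (dependences omitted).
struct Application {
    std::vector<Step> steps;
};

// Tuning-expert side: a separate, declarative mapping of steps to places.
// "Place" stands in for a node, socket, or worker thread.
struct Tuner {
    // Distribution function: decides *where* a step instance runs.
    std::function<int(const Step&)> place_of;
    // Grouping function: steps that should be co-located and scheduled together
    // (a stand-in for the paper's hierarchical affinity groups).
    std::function<int(const Step&)> group_of;
};

// A toy "runtime" that only shows how placement is driven by the tuner,
// without touching the application definition.
void run(const Application& app, const Tuner& tuner, int num_places) {
    std::map<int, std::vector<Step>> per_place;
    for (const Step& s : app.steps)
        per_place[tuner.place_of(s) % num_places].push_back(s);

    for (const auto& [place, steps] : per_place) {
        std::printf("place %d:\n", place);
        for (const Step& s : steps)
            std::printf("  %s(%d) in group %d\n",
                        s.collection.c_str(), s.tag, tuner.group_of(s));
    }
}

int main() {
    // The application: 8 instances of a "compute" step. It never changes.
    Application app;
    for (int t = 0; t < 8; ++t) app.steps.push_back({"compute", t});

    // Tuning #1: block distribution, groups of 4 consecutive tags.
    Tuner block{[](const Step& s) { return s.tag / 4; },
                [](const Step& s) { return s.tag / 4; }};

    // Tuning #2: cyclic distribution; swapped in without editing the application.
    Tuner cyclic{[](const Step& s) { return s.tag % 2; },
                 [](const Step& s) { return s.tag / 2; }};

    std::puts("== block tuning ==");
    run(app, block, 2);
    std::puts("== cyclic tuning ==");
    run(app, cyclic, 2);
}
```

Swapping the block tuner for the cyclic one changes where every step instance runs, while the application definition stays untouched; that decoupling is the portability property the abstract describes.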
