Scheduling Irregular Parallel Computations on Hierarchical Caches

机译：在分层缓存上调度不规则并行计算

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

For nested-parallel computations with low depth (span, critical path length) analyzing the work, depth, and sequential cache complexity suffices to attain reasonably strong bounds on the parallel runtime and cache complexity on machine models with either shared or private caches. These bounds, however, do not extend to general hierarchical caches, due to limitations in (i) the cache-oblivious (CO) model used to analyze cache complexity and (ii) the schedulers used to map computation tasks to processors. This paper presents the parallel cache-oblivious (PCO) model, a relatively simple modification to the CO model that can be used to account for costs on a broad range of cache hierarchies. The first change is to avoid capturing artificial data sharing among parallel threads, and the second is to account for parallelism-memory imbalances within tasks. Despite the more restrictive nature of PCO compared to CO, many algorithms have the same asymptotic cache complexity bounds. The paper then describes a new scheduler for hierarchical caches, which extends recent work on "space-bounded schedulers" to allow for computations with arbitrary work imbalance among parallel subtasks. This scheduler attains provably good cache performance and runtime on parallel machine models with hierarchical caches, for nested-parallel computations analyzed using the PCO model. We show that under reasonable assumptions our scheduler is "work efficient" in the sense that the cost of the cache misses are evenly balanced across the processors-i.e., the runtime can be determined within a constant factor by taking the total cost of the cache misses analyzed for a computation and dividing it by the number of processors. In contrast, to further support our model, we show that no scheduler can achieve such bounds (optimizing for both cache misses and runtime) if work, depth, and sequential cache complexity are the only parameters used to analyze a computation.

机译：对于低深度（跨度，关键路径长度）的嵌套并行计算，分析工作，深度和顺序缓存的复杂度足以在并行运行时和共享或专用缓存的机器模型上达到相当强的界限。但是，由于（i）用于分析高速缓存复杂性的超高速缓存（CO）模型和（ii）用于将计算任务映射到处理器的调度程序的局限性，这些限制并未扩展到一般的分层高速缓存。本文介绍了并行缓存忽略（PCO）模型，它是对CO模型的一种相对简单的修改，可以用来考虑各种缓存层次结构上的成本。第一个变化是避免捕获并行线程之间的人工数据共享，第二个变化是考虑任务中的并行内存不平衡。尽管与CO相比，PCO的限制性更强，但许多算法具有相同的渐近缓存复杂度界限。然后，本文描述了一种用于分层缓存的新调度程序，该调度程序扩展了有关“空间受限调度程序”的最新工作，以允许在并行子任务之间具有任意工作不平衡的计算。对于使用PCO模型分析的嵌套并行计算，此调度程序在具有分层缓存的并行计算机模型上具有可证明的良好缓存性能和运行时性能。我们表明，在合理的假设下，我们的调度程序是“工作高效的”，即高速缓存未命中的成本在各个处理器之间是均匀平衡的，即，可以通过获取高速缓存未命中的总成本来确定运行时长。分析计算，然后将其除以处理器数量。相反，为了进一步支持我们的模型，我们证明如果工作，深度和顺序缓存复杂度是用于分析计算的唯一参数，则没有调度程序可以达到这样的界限（针对缓存未命中和运行时进行优化）。

著录项

来源
《Proceedings of the twenty-third annual symposium on parallelism in algorithms and architectures》|2011年|355-366|共12页
会议地点 San Jose CA(US);San Jose CA(US)
作者
Guy E. Blelloch; Jeremy T. Fineman; Phillip B. Gibbons; Harsha Vardhan Simhadri;
展开▼
作者单位

Carnegie Mellon University;

Carnegie Mellon University;

Intel Labs Pittsburgh;

Carnegie Mellon University;

展开▼
会议组织
原文格式 PDF
正文语种 eng
中图分类
关键词
parallel hierarchical memory; cost models; schedulers; analysis of parallel algorithms; cache complexity;

机译：并行分层存储器；成本模型；调度程序；并行算法分析；缓存复杂度;

相似文献

外文文献
中文文献
专利

1. Design and Implementation of a Cache Hierarchy-Aware Task Scheduling for Parallel Loops on Multicore Architecture [J] . Nader Khammassi, Jean-Christophe Le Lann Computer Science & Information Technology . 2014,第2期

机译：多核体系结构上并行循环的缓存层次结构感知任务调度的设计和实现
2. Influence of Inter-Agent Communication Cost to Diffusion Scheduling in Irregular Parallel Computations [J] . Marek Grochowski, Ewa Tuska, Piotr Uhruski Inteligencia Artificial : Ibero-American Journal of Artificial Intelligence . 2005,第28期

机译：代理间通信成本对不规则并行计算中扩散调度的影响
3. A schedule cache for data parallel unstructured computations [J] . Thomas Brandes, Cecile Germain-Renaud Parallel Computing . 2000,第13a14期

机译：用于数据并行非结构化计算的调度缓存
4. Scheduling Irregular Parallel Computations on Hierarchical Caches [C] . Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Annual symposium on parallelism in algorithms and architectures . 2011

机译：调度在分层缓存上的不规则并行计算
5. Predicting Performance of Parallel Analytics and Irregular Computations [D] . ?Zhu, Gangyi 2019

机译：预测并行分析和不规则计算的性能
6. G-Computation and Hierarchical Models for Estimating Multiple Causal Effects From Observational Disease Registries With Irregular Visits [O] . Zach Shahn, Ying Li, Zhaonan Sun, 2019

机译：G计算和分层模型用于从不定期访问的观察性疾病登记处估计多种因果效应
7. Scheduling Irregular Parallel Computations on Hierarchical Caches [O] . Guy E. Blelloch, Jeremy T. Fineman, Harsha Vardhan Simhadri, 2011

机译：在分层缓存上调度不规则并行计算

Scheduling Irregular Parallel Computations on Hierarchical Caches

摘要

著录项

相似文献

相关主题

期刊订阅