Are Static Schedules so Bad? A Case Study on Cholesky Factorization

Abstract

Our goal is to provide an analysis and comparison of static and dynamic strategies for task graph scheduling on platforms consisting of heterogeneous and unrelated resources, such as GPUs and CPUs. Static scheduling strategies, which have been used for years, suffer from several weaknesses. First, it is well known that the underlying optimization problems are NP-complete, which limits the ability to find optimal solutions to small cases. Second, parallelism inside processing nodes makes it difficult to precisely predict the performance of both communications and computations, due to shared resources and co-scheduling effects. Recently, to cope with these limitations, many dynamic task-graph-based runtime schedulers (StarPU, StarSs, QUARK, PaRSEC) have been proposed. Dynamic schedulers base their allocation and scheduling decisions on the one hand on dynamic information, such as the set of available tasks, the location of data and the state of the resources, and on the other hand on static information, such as task priorities computed from the whole task graph. Our analysis is deep, but we concentrate on a single kernel, namely the Cholesky factorization of dense matrices on platforms consisting of GPUs and CPUs. This application encompasses many important characteristics in our context. Indeed, it involves 4 different kernels (POTRF, TRSM, SYRK and GEMM) whose acceleration ratios on GPUs differ strongly (from 2.3 for POTRF to 29 for GEMM), and it consists of a phase where the number of available tasks is large, where the careful use of resources is critical, and a phase with few tasks available, where the choice of the task to be executed is crucial. In this paper, we analyze the performance of static and dynamic strategies and we propose a set of intermediate strategies, by adding more static (resp. dynamic) features into dynamic (resp. static) strategies. Our conclusions are somewhat unexpected, in the sense that we prove that static-based strategies are very efficient, even in a context where performance estimations are not very good.
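
To make the four-kernel structure mentioned in the abstract concrete, the following is a minimal sketch that enumerates the tasks of a tiled Cholesky factorization. It assumes the usual right-looking tiled formulation on the lower triangle, which the abstract does not spell out; the function names (`tiled_cholesky_tasks`, `kernel_counts`, `tasks_per_step`) are hypothetical and chosen for illustration only. The number of tasks created per outer step is only a rough proxy for the number of available tasks, since real availability depends on the dependencies between tasks.

```python
from collections import Counter


def tiled_cholesky_tasks(n_tiles):
    """Yield (kernel, step, tile) for an n_tiles x n_tiles tiled Cholesky.

    Assumption: right-looking variant working on the lower triangle.
    """
    for k in range(n_tiles):
        # Factorize the diagonal tile of step k.
        yield ("POTRF", k, (k, k))
        # Triangular solves on the tiles below the diagonal tile.
        for i in range(k + 1, n_tiles):
            yield ("TRSM", k, (i, k))
        # Update the trailing submatrix.
        for i in range(k + 1, n_tiles):
            yield ("SYRK", k, (i, i))      # diagonal tiles
            for j in range(k + 1, i):
                yield ("GEMM", k, (i, j))  # off-diagonal tiles


def kernel_counts(n_tiles):
    """Count how many tasks of each kernel type are generated."""
    return Counter(kernel for kernel, _, _ in tiled_cholesky_tasks(n_tiles))


def tasks_per_step(n_tiles):
    """Number of tasks created at each outer step k (a crude proxy only)."""
    steps = Counter(step for _, step, _ in tiled_cholesky_tasks(n_tiles))
    return [steps[k] for k in range(n_tiles)]


if __name__ == "__main__":
    print(kernel_counts(8))
    print(tasks_per_step(8))
```

Under these assumptions, GEMM tasks dominate as the tile count grows, while the per-step task count shrinks towards the end of the factorization, which is consistent with the two phases the abstract describes.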

Bibliographic record

Source
IEEE International Parallel and Distributed Processing Symposium | 2016 | 576-1152p | 10 pages
Authors
Emmanuel Agullo; Olivier Beaumont; Lionel Eyraud-Dubois; Suraj Kumar
Original format: PDF
Chinese Library Classification: TP311.13-53
Keywords
Runtime Systems; Scheduling; Accelerators; Cholesky; Heterogeneous Systems; Unrelated Machines

Date indexed: 2022-08-21 04:32:42

Similar literature

1. Rice condition numbers of the QR and Cholesky factorizations [J]. Li Xinxiu, Nie Xiaobing. Journal of Southeast University (English Edition). 2004, No. 1
2. Distributed SBP Cholesky Factorization Algorithms with Near-Optimal Scheduling [J]. Fred G. Gustavson, Lars Karlsson, Bo Kågström. ACM Transactions on Mathematical Software. 2010, No. 2
3. Task scheduling using a block dependency DAG for block-oriented sparse Cholesky factorization [J]. Heejo Lee, Jong Kim, Sung Je Hong. Parallel Computing. 2003, No. 1
4. Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs [J]. Yamazaki Ichitaro, Tomov Stanimire, Dongarra Jack. SIAM Journal on Scientific Computing. 2015, No. 3
5. Are Static Schedules so Bad? A Case Study on Cholesky Factorization [C]. Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois. IEEE International Parallel and Distributed Processing Symposium. 2016
6. Matrix factorizations, triadic matrices, and modified Cholesky factorizations for optimization [D]. Fang, Haw-ren. 2006
7. Applying Dynamic Priority Scheduling Scheme to Static Systems of Pinwheel Task Model in Power-Aware Scheduling [O]. Ye-In Seol, Young-Kuk Kim
8. Optimization of a Statically Partitioned Hypermatrix Sparse Cholesky Factorization [O]. José R. Herrero, Juan J. Navarro. 2008
9. Computational Models and Task Scheduling for Parallel Sparse Cholesky Factorization [R]. Liu, J. W. 1986
