Published in: IEEE International Symposium on High Performance Computer Architecture (HPCA)

Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management



Abstract

Managing the thread-level parallelism (TLP) of GPGPU applications by limiting it to a certain degree is known to be effective in improving overall performance. However, we find that such prior techniques can lead to sub-optimal system throughput and fairness when two or more applications are co-scheduled on the same GPU. This is because they attempt to maximize the performance of individual applications in isolation, ultimately allowing each application to take a disproportionate share of the shared resources, which leads to high contention in the shared cache and main memory. To address this problem, we propose new application-aware TLP management techniques for a multi-application execution environment, such that all co-scheduled applications make good and judicious use of all the shared resources. For measuring such use, we propose an application-level utility metric, called effective bandwidth, which accounts for two runtime metrics: attained DRAM bandwidth and cache miss rates. We find that maximizing the total effective bandwidth, and doing so in a balanced fashion across all co-located applications, can significantly improve system throughput and fairness. Instead of exhaustively searching across all the different combinations of TLP configurations that achieve these goals, we find that a significant amount of overhead can be reduced by taking advantage of the trends, which we call patterns, in the way an application's effective bandwidth changes with different TLP combinations. Our proposed pattern-based TLP management mechanisms improve system throughput and fairness by 20% and 2x, respectively, over a baseline where each application executes with the TLP configuration that provides the best performance when it executes alone.
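The search the abstract describes can be pictured with a minimal sketch. The profile numbers, the `score` combining total effective bandwidth with a min/max balance discount, and the per-application TLP levels are all illustrative assumptions for exposition; the paper's actual metric definition and pattern-based pruning are not reproduced here, and this brute-force enumeration is exactly the exhaustive search the paper's pattern mechanism is designed to avoid.

```python
from itertools import product

# Hypothetical per-application profile: effective bandwidth (an abstract
# utility combining attained DRAM bandwidth and cache miss rate) observed
# at each candidate TLP level. Numbers are made up for illustration.
profiles = {
    "appA": {8: 40.0, 16: 55.0, 24: 50.0},
    "appB": {8: 30.0, 16: 45.0, 24: 60.0},
}

def score(config):
    """Total effective bandwidth, discounted when apps are imbalanced.

    config maps application name -> chosen TLP level. The balance factor
    (min/max ratio, 1.0 when perfectly balanced) is one simple way to
    encode the "maximize in a balanced fashion" goal; it is an assumption,
    not the paper's formulation.
    """
    bws = [profiles[app][tlp] for app, tlp in config.items()]
    return sum(bws) * (min(bws) / max(bws))

def best_config():
    """Exhaustively enumerate all TLP combinations and keep the best."""
    apps = list(profiles)
    best_cfg, best_score = None, float("-inf")
    for combo in product(*(profiles[a] for a in apps)):
        cfg = dict(zip(apps, combo))
        s = score(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg

print(best_config())  # with these toy numbers: {'appA': 16, 'appB': 24}
```

Note that the winning combination throttles appA below its solo-best TLP so that appB can use more bandwidth, which is the core idea the abstract contrasts with the isolation-optimal baseline.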
