Published in: IEEE International Symposium on High Performance Computer Architecture (HPCA)

Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management



Abstract

Managing the thread-level parallelism (TLP) of GPGPU applications by limiting it to a certain degree is known to be effective in improving overall performance. However, we find that such prior techniques can lead to sub-optimal system throughput and fairness when two or more applications are co-scheduled on the same GPU. This is because they attempt to maximize the performance of individual applications in isolation, ultimately allowing each application to take a disproportionate share of the shared resources, which leads to high contention in the shared cache and main memory. To address this problem, we propose new application-aware TLP management techniques for a multi-application execution environment, such that all co-scheduled applications make good and judicious use of all the shared resources. For measuring such use, we propose an application-level utility metric, called effective bandwidth, which accounts for two runtime metrics: attained DRAM bandwidth and cache miss rates. We find that maximizing the total effective bandwidth, and doing so in a balanced fashion across all co-located applications, can significantly improve system throughput and fairness. Instead of exhaustively searching across all the different combinations of TLP configurations that achieve these goals, we find that a significant amount of overhead can be reduced by taking advantage of the trends, which we call patterns, in the way an application's effective bandwidth changes with different TLP combinations. Our proposed pattern-based TLP management mechanisms improve system throughput and fairness by 20% and 2x, respectively, over a baseline where each application executes with the TLP configuration that provides the best performance when it executes alone.
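The search the abstract describes can be pictured with a minimal sketch. The profile numbers, the `score` combining total effective bandwidth with a min/max balance discount, and the per-application TLP levels are all illustrative assumptions for exposition; the paper's actual metric definition and pattern-based pruning are not reproduced here, and this brute-force enumeration is exactly the exhaustive search the paper's pattern mechanism is designed to avoid.

```python
from itertools import product

# Hypothetical per-application profile: effective bandwidth (an abstract
# utility combining attained DRAM bandwidth and cache miss rate) observed
# at each candidate TLP level. Numbers are made up for illustration.
profiles = {
    "appA": {8: 40.0, 16: 55.0, 24: 50.0},
    "appB": {8: 30.0, 16: 45.0, 24: 60.0},
}

def score(config):
    """Total effective bandwidth, discounted when apps are imbalanced.

    config maps application name -> chosen TLP level. The balance factor
    (min/max ratio, 1.0 when perfectly balanced) is one simple way to
    encode the "maximize in a balanced fashion" goal; it is an assumption,
    not the paper's formulation.
    """
    bws = [profiles[app][tlp] for app, tlp in config.items()]
    return sum(bws) * (min(bws) / max(bws))

def best_config():
    """Exhaustively enumerate all TLP combinations and keep the best."""
    apps = list(profiles)
    best_cfg, best_score = None, float("-inf")
    for combo in product(*(profiles[a] for a in apps)):
        cfg = dict(zip(apps, combo))
        s = score(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg

print(best_config())  # with these toy numbers: {'appA': 16, 'appB': 24}
```

Note that the winning combination throttles appA below its solo-best TLP so that appB can use more bandwidth, which is the core idea the abstract contrasts with the isolation-optimal baseline.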
