Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture

SD3: A Scalable Approach to Dynamic Data-Dependence Profiling

Abstract

As multicore processors are deployed in mainstream computing, the need for software tools that help parallelize programs is increasing dramatically. Data-dependence profiling is an important technique for exploiting parallelism in programs. More specifically, manual or automatic parallelization can use the outcomes of data-dependence profiling to decide where to parallelize a program. However, state-of-the-art data-dependence profiling techniques do not scale, because they suffer from two major issues when profiling large, long-running applications: (1) runtime overhead and (2) memory overhead. Existing data-dependence profilers are either unable to profile large-scale applications or report only very limited information. In this paper, we propose a scalable approach to data-dependence profiling that addresses both runtime and memory overhead in a single framework. Our technique, called SD3, reduces the runtime overhead by parallelizing the dependence profiling step itself. To reduce the memory overhead, we compress memory accesses that exhibit stride patterns and compute data dependences directly in the compressed format. We demonstrate that, when profiling SPEC 2006, SD3 reduces the runtime overhead by factors of 4.1X and 9.7X on eight and 32 cores, respectively. For the memory overhead, we successfully profile SPEC 2006 with the reference input, while previous approaches fail even with the train input. In some cases, we observe more than a 20X improvement in memory consumption and a 16X speedup in profiling time when 32 cores are used.
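The memory-overhead idea (compress accesses that follow a stride pattern, then test for dependences without decompressing them) can be illustrated with a small sketch. This is not SD3's implementation. The `Stride` record, the greedy `compress()` pass, and the `strides_overlap()` test are hypothetical names introduced here. The sketch assumes addresses are plain integers and strides are positive, and it requires Python 3.8+ for `pow(x, -1, m)`. The overlap test uses the standard generalized Chinese remainder argument: two strided sets share an address exactly when the congruences x ≡ base₁ (mod stride₁) and x ≡ base₂ (mod stride₂) have a solution that lies inside both address ranges.

```python
from dataclasses import dataclass
from math import gcd
from typing import Iterable, List


@dataclass(frozen=True)
class Stride:
    """Hypothetical compressed record for the addresses
    base, base + stride, ..., base + (count - 1) * stride (stride > 0)."""
    base: int
    stride: int
    count: int

    @property
    def last(self) -> int:
        return self.base + (self.count - 1) * self.stride


def compress(addresses: Iterable[int]) -> List[Stride]:
    """Greedily fold runs of increasing, equally spaced addresses into Stride
    records. A lone address becomes Stride(a, 1, 1); its stride is unused."""
    out: List[Stride] = []
    for a in addresses:
        if out:
            s = out[-1]
            if s.count == 1 and a > s.base:
                # The second address of a run fixes the stride.
                out[-1] = Stride(s.base, a - s.base, 2)
                continue
            if s.count >= 2 and a == s.last + s.stride:
                # The address continues the current pattern: bump the count.
                out[-1] = Stride(s.base, s.stride, s.count + 1)
                continue
        out.append(Stride(a, 1, 1))
    return out


def strides_overlap(p: Stride, q: Stride) -> bool:
    """Return True if p and q share at least one address, deciding this from
    the compressed form alone (no expansion into individual addresses)."""
    lo, hi = max(p.base, q.base), min(p.last, q.last)
    if lo > hi:
        return False
    # A common address x must satisfy x = p.base (mod p.stride) and
    # x = q.base (mod q.stride); the system is solvable iff g divides the
    # difference of the bases.
    g = gcd(p.stride, q.stride)
    diff = q.base - p.base
    if diff % g != 0:
        return False
    # Solve p.stride * k = diff (mod q.stride) for k.
    m = q.stride // g
    k = 0 if m == 1 else (diff // g) * pow(p.stride // g, -1, m) % m
    x0 = p.base + p.stride * k          # one solution of both congruences
    period = p.stride // g * q.stride   # lcm(p.stride, q.stride)
    x = lo + (x0 - lo) % period         # smallest solution >= lo
    return x <= hi
```

For example, the 1000 addresses `range(0x1000, 0x1000 + 4 * 1000, 4)` compress to the single record `Stride(4096, 4, 1000)`. Under this scheme a store pattern `Stride(0, 4, 10)` and a load pattern `Stride(6, 6, 5)` are reported as conflicting (they share address 12), while `Stride(0, 4, 10)` and `Stride(2, 4, 10)` are not. The cost of each test is independent of `count`, which is the point of keeping accesses compressed.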