Exploring speculative parallelism in SPEC2006

Abstract

The computer industry adopted multi-threaded and multi-core architectures as clock rate increases stalled in the early 2000s. It was hoped that these architectures would sustain the continuous improvement of single-program performance. However, traditional parallelizing compilers often fail to effectively parallelize general-purpose applications, which typically have complex control flow and excessive pointer usage. Recently, hardware techniques such as Transactional Memory (TM) and Thread-Level Speculation (TLS) have been proposed to simplify the task of parallelization by using speculative threads. The potential of speculative parallelism in general-purpose applications such as SPEC CPU 2000 has been well studied and shown to be moderately successful. Preliminary work examining the potential parallelism in SPEC 2006 deployed parallel threads with a restrictive TLS execution model and limited compiler support, and thus showed only limited performance potential. In this paper, we first analyze the cross-iteration dependence behavior of the SPEC 2006 benchmarks and show that more parallelism potential is available in SPEC 2006 than in SPEC 2000. We further use a state-of-the-art profile-driven TLS compiler to identify loops that can be speculatively parallelized. Overall, we found that with optimal loop selection we can potentially achieve an average speedup of 60% on four cores over what could be achieved by a traditional parallelizing compiler such as Intel's ICC compiler. We also found that an additional 11% improvement can potentially be obtained on selected benchmarks using 8 cores when we extend TLS to multiple loop levels as opposed to restricting it to a single loop level.