Journal of Circuits, Systems, and Computers

CODE TRANSFORMATIONS FOR ENHANCING THE PERFORMANCE OF SPECULATIVELY PARALLEL THREADS

Abstract

As technology advances, microprocessors that integrate multiple cores on a single chip are becoming increasingly common. Using these processors to improve the performance of a single program remains a challenge. For general-purpose applications, efficient parallel execution is especially difficult to achieve because of complex control flow and ambiguous data dependences. Thread-level speculation and transactional memory provide two hardware mechanisms that can optimistically parallelize potentially dependent threads. However, a compiler that performs detailed performance trade-off analysis is essential for generating efficient parallel programs for such hardware. This compiler must be able to take into consideration the cost of intra-thread as well as inter-thread value communication. At the same time, the prevalence of complex, input-dependent control flow and data dependence patterns in general-purpose applications makes it impossible for a single technique to optimize all program patterns. In this paper, we propose three optimization techniques to improve thread performance: (ⅰ) scheduling instructions and generating recovery code to reduce the critical forwarding path introduced by synchronizing memory-resident values; (ⅱ) identifying reduction variables and transforming the code to minimize serialized execution; and (ⅲ) dynamically merging consecutive iterations of a loop to avoid stalls due to unbalanced workload. Detailed evaluation of the proposed mechanisms shows that each optimization technique improves a subset of the SPEC2000 benchmarks, but none improves all of them. On average, the proposed optimizations improve performance by 7% for the set of SPEC2000 benchmarks that have already been optimized for register-resident value communication.
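
As an illustration of technique (ⅱ), the following is a minimal sketch of a reduction-variable transformation, not the paper's actual compiler pass. It assumes a simple sum reduction and models speculative threads by assigning iteration `i` to a private slot `i % num_threads`. The names `serial_sum`, `transformed_sum`, and `num_threads` are hypothetical and chosen only for this example.

```python
def serial_sum(values):
    """Original loop: every iteration reads and writes `total`,
    so each iteration depends on the one before it."""
    total = 0
    for v in values:
        total += v
    return total


def transformed_sum(values, num_threads=4):
    """Transformed loop: each modeled thread accumulates into its own
    private partial sum, and the partial sums are combined after the loop.
    The cross-iteration dependence on a single variable is removed."""
    partial = [0] * num_threads
    for i, v in enumerate(values):
        # Assumption for illustration: iteration i runs on thread i % num_threads.
        partial[i % num_threads] += v
    # Final combine step, executed once after all iterations finish.
    return sum(partial)


if __name__ == "__main__":
    data = list(range(1, 101))
    print(serial_sum(data), transformed_sum(data))
```

The sketch uses integers so that both versions give identical results. With floating-point values, reordering the additions in this way can change rounding, which is one reason a compiler has to identify reduction variables carefully before applying such a transformation.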