Journal of Supercomputing

Exploring the performance limits of simultaneous multithreading for memory intensive applications



Abstract

Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that running a diverse mix of applications simultaneously can yield significant performance gains under SMT. However, the speedup of a single application that is parallelized into multiple threads is often sensitive to its inherent instruction-level parallelism (ILP), as well as to the efficiency of the synchronization and communication mechanisms between its separate, but possibly dependent, threads. Moreover, because these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory-intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By learning how such streams interact when executed simultaneously on the processor, and by quantifying their presence within each application's threads, we try to interpret the performance observed for each application when it is parallelized according to these techniques. To support this evaluation, we also present results gathered from the performance monitoring hardware of the processor.
