Software Simultaneous Multithreading Through Compilation

Abstract

With Dennard scaling having broken down long ago, computer architecture design has moved toward wider rather than deeper organizations. There are three ways to design a wider architecture: (1) putting more cores on the die to exploit thread-level parallelism (TLP); (2) putting more execution ports in the pipeline to exploit instruction-level parallelism (ILP); and (3) making vector registers wider to exploit data-level parallelism (DLP). To speed up a wide spectrum of applications, modern CPUs usually have all three characteristics at once. However, not all applications can make effective use of them simultaneously. Using any one of them efficiently remains a challenging problem in the optimizing-compiler research community, even though these problems are not new. Processor architects designed simultaneous multithreading (SMT) to alleviate the problem.

Simultaneous multithreading is an essential technique for improving pipeline resource utilization and the overall power efficiency of chips, especially when the processor is wide or built around an in-order pipeline. A wide-issue superscalar processor has two kinds of wasted issue slots: vertical waste, where all issue slots in a cycle are empty, and horizontal waste, where the issue slots in a cycle are partially empty [74]. Unlike its two counterparts, fine-grained multithreading and coarse-grained multithreading, simultaneous multithreading can fill both vertical and horizontal waste, which improves overall efficiency. From the user application's point of view, there are two ways to improve speed or throughput: thread-level parallelism (TLP) and instruction-level parallelism (ILP). Simultaneous multithreading can exploit both TLP and ILP in the same cycle, whereas fine-grained or coarse-grained multithreading can exploit only one of them in a single cycle.

Despite these benefits, semiconductor chip makers have adopted SMT slowly. AMD's most recent Zen processor is its first CPU product featuring SMT. The only other well-known chip makers that offer SMT-enabled processors are Intel and IBM. The reason is that SMT is very complex to implement: many pipeline stages and the memory system need additional hardware logic for an efficient SMT implementation. For embedded chips, SMT is not even an affordable choice.

To harvest the benefits of SMT without incurring significant hardware costs, we propose a compiler-based SMT implementation framework called CSSMT that achieves performance comparable to hardware-based SMT. With the help of advanced profiling techniques enabled by the precise PMU counters in modern CPUs, CSSMT can identify applications that could potentially benefit from SMT and guide our LLVM-based compiler to merge the hot spots of the respective threads so that they co-run in the same pipeline. CSSMT is orthogonal to the effect of hardware SMT and can bail out when merging is not profitable according to its cost model, which is derived from profiling data.
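The distinction between vertical and horizontal waste in the abstract can be made concrete with a small accounting sketch. The function name `classify_waste`, the per-cycle trace, and the issue width of 4 are illustrative assumptions, not taken from the thesis; the code only counts empty issue slots according to the two definitions given above.

```python
from typing import List, Tuple


def classify_waste(issued_per_cycle: List[int], width: int) -> Tuple[int, int]:
    """Return (vertical_waste, horizontal_waste), both measured in issue slots.

    issued_per_cycle[i] is the number of instructions issued in cycle i
    (a hypothetical trace); width is the processor's issue width.
    """
    vertical = 0
    horizontal = 0
    for issued in issued_per_cycle:
        if not 0 <= issued <= width:
            raise ValueError("issued count must lie between 0 and the issue width")
        if issued == 0:
            # Every slot in this cycle is empty: vertical waste.
            vertical += width
        else:
            # Only some slots are empty: horizontal waste.
            horizontal += width - issued
    return vertical, horizontal


# Hypothetical six-cycle trace on a 4-wide machine.
trace = [4, 0, 2, 0, 1, 3]
vertical, horizontal = classify_waste(trace, width=4)
print(f"vertical waste: {vertical} slots, horizontal waste: {horizontal} slots")
```

In this made-up trace, 10 of the 24 available slots are used. Under the definitions above, a second thread could in principle supply instructions for both kinds of empty slot, which is the benefit the abstract attributes to SMT.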

Bibliographic record

  • Author

    Chen, Yuanfang

  • Author affiliation

    University of Delaware

  • Degree-granting institution University of Delaware
  • Subject Computer engineering; Computer science
  • Degree Ph.D.
  • Year 2018
  • Pages 105 p.
  • Total pages 105
  • Original format PDF
  • Language of text eng