Operating Systems Review

Thread Clustering: Sharing-Aware Scheduling on SMP-CMP-SMT Multiprocessors

Abstract

The major chip manufacturers have all introduced chip multiprocessing (CMP) and simultaneous multithreading (SMT) technology into their processing units. As a result, even low-end computing systems and game consoles have become shared memory multiprocessors with L1 and L2 cache sharing within a chip. Mid- and large-scale systems will have multiple processing chips and hence consist of an SMP-CMP-SMT configuration with non-uniform data sharing overheads. Current operating system schedulers are not aware of these new cache organizations, and as a result, distribute threads across processors in a way that causes many unnecessary, long-latency cross-chip cache accesses. In this paper we describe the design and implementation of a scheme to schedule threads based on sharing patterns detected online using features of standard performance monitoring units (PMUs) available in today's processing units. The primary advantage of using the PMU infrastructure is that it is fine-grained (down to the cache line) and has relatively low overhead. We have implemented our scheme in Linux running on an 8-way Power5 SMP-CMP-SMT multiprocessor. For commercial multithreaded server workloads (VolanoMark, SPECjbb, and RUBiS), we are able to demonstrate reductions in cross-chip cache accesses of up to 70%. These reductions lead to application-reported performance improvements of up to 7%.
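The core of the scheme is to detect which threads share data and then co-locate each sharing group on one chip, so that their cache-to-cache transfers stay on-chip instead of crossing chips. Below is a minimal, illustrative sketch of that idea in C for Linux, using hard-coded per-thread sharing signatures in place of the online PMU samples the paper relies on; the helper names (similarity, cluster_threads, pin_to_chip), the signature format, and the constants are hypothetical and are not the paper's API or implementation.

/*
 * Illustrative sketch (not the paper's code): group threads by a per-thread
 * "sharing signature" -- a bit vector of hot shared-data regions -- and pin
 * each group onto one chip with sched_setaffinity(2).  The signatures below
 * are placeholders; the paper derives them online from PMU samples of
 * remote-cache accesses.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

#define NTHREADS      6
#define CPUS_PER_CHIP 4   /* e.g. 2 cores x 2 SMT contexts per chip */
#define NCHIPS        2   /* e.g. an 8-way, two-chip machine */

struct thread_info {
    pid_t    tid;        /* kernel thread id; 0 here means "print only, do not pin" */
    uint64_t signature;  /* bit i set => thread touched shared region i */
    int      cluster;    /* assigned cluster index */
};

/* Similarity: number of shared-region bits two signatures have in common
 * (GCC/Clang builtin popcount). */
static int similarity(uint64_t a, uint64_t b)
{
    return __builtin_popcountll(a & b);
}

/* Greedy clustering: join the first existing cluster whose combined
 * signature overlaps enough, otherwise start a new cluster. */
static int cluster_threads(struct thread_info *t, int n, uint64_t *reps, int threshold)
{
    int nclusters = 0;
    for (int i = 0; i < n; i++) {
        int best = -1;
        for (int c = 0; c < nclusters; c++)
            if (similarity(t[i].signature, reps[c]) >= threshold) { best = c; break; }
        if (best < 0) { best = nclusters++; reps[best] = 0; }
        reps[best] |= t[i].signature;   /* grow the cluster's combined signature */
        t[i].cluster = best;
    }
    return nclusters;
}

/* Pin a thread to the hardware contexts of the chip hosting its cluster. */
static void pin_to_chip(pid_t tid, int chip)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = chip * CPUS_PER_CHIP; cpu < (chip + 1) * CPUS_PER_CHIP; cpu++)
        CPU_SET(cpu, &set);
    if (tid != 0 && sched_setaffinity(tid, sizeof(set), &set) != 0)
        perror("sched_setaffinity");
}

int main(void)
{
    /* Placeholder signatures: threads 0-2 share regions 0-3,
     * threads 3-5 share regions 8-11. */
    struct thread_info t[NTHREADS] = {
        {0, 0x000F, -1}, {0, 0x000E, -1}, {0, 0x000D, -1},
        {0, 0x0F00, -1}, {0, 0x0E00, -1}, {0, 0x0D00, -1},
    };
    uint64_t reps[NTHREADS];

    int nclusters = cluster_threads(t, NTHREADS, reps, 2 /* min shared regions */);
    for (int i = 0; i < NTHREADS; i++) {
        int chip = t[i].cluster % NCHIPS;   /* wrap clusters onto available chips */
        printf("thread %d -> cluster %d (chip %d)\n", i, t[i].cluster, chip);
        pin_to_chip(t[i].tid, chip);
    }
    printf("%d clusters formed\n", nclusters);
    return 0;
}

In the actual system described by the abstract, the sharing information comes from fine-grained (cache-line level) PMU samples collected at low overhead, and thread placement is performed inside the Linux scheduler rather than through user-level affinity calls as sketched here.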
