Efficient Handling of Lock Hand-off in DSM Multiprocessors with Buffering Coherence Controllers

Benjamín Sahelices; Agustín de Dios; Pablo Ibáñez; Víctor Viñals-Yúfera; José María Llabería


Abstract

Synchronization in parallel programs is a major performance bottleneck in multiprocessor systems. Shared data is protected by locks, and much time is spent on the contention that arises at lock hand-off. In order to be serialized, requests to the same cache line can either be bounced (NACKed) or buffered in the coherence controller. In this paper, we focus mainly on systems whose coherence controllers buffer requests. In a lock hand-off, a burst of requests to the same line arrives at the coherence controller. During lock hand-off, only the requests from the winning processor contribute to the progress of the computation, since that processor is the only one that will advance the work. This key observation leads us to propose a hardware mechanism we call request bypassing, which allows requests from the winning processor to bypass the requests buffered in the coherence controller that holds the lock line. We present an inexpensive implementation of request bypassing that reduces the time spent on all the execution phases of a critical section (acquiring the lock, accessing shared data, and releasing the lock) and which, as a consequence, speeds up the whole parallel computation. This mechanism requires neither compiler nor programmer support, and no changes to the ISA or the coherence protocol. By simulating a 32-processor system, we show that request bypassing does not degrade but rather improves performance in three applications with low synchronization rates, while in the remaining four, which have a large amount of synchronization activity, we see reductions in execution time and in lock stall time ranging from 14% to 39% and from 52% to 71%, respectively. We compare request bypassing with a previously proposed technique called read combining and with a system that bounces requests, observing a significantly lower execution time with the bypassing scheme. Finally, we analyze the sensitivity of our results to some key hardware and software parameters.
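
The ordering idea behind request bypassing can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the hardware design from the paper: the function name `service_order`, the `(processor_id, kind)` request tuples, and the way the winning processor is passed in as `owner` are all hypothetical, and every request is assumed to be buffered before any is served.

```python
def service_order(requests, owner, bypass):
    """Order in which a toy controller serves buffered requests to one line.

    requests: list of (processor_id, kind) tuples in arrival order.
    owner:    id of the winning processor (assumed to be known here).
    bypass:   if True, the owner's requests jump ahead of buffered ones.
    """
    queue = []
    for req in requests:
        if bypass and req[0] == owner:
            # Place the request behind earlier owner requests (to keep
            # their relative order) but ahead of every other request.
            pos = sum(1 for r in queue if r[0] == owner)
            queue.insert(pos, req)
        else:
            # Plain buffering: first come, first served.
            queue.append(req)
    return queue


# A burst of reads from contending processors arrives before the
# release issued by the winning processor (id 0).
burst = [(1, "read"), (2, "read"), (0, "release"), (3, "read")]
fifo_order = service_order(burst, owner=0, bypass=False)
bypass_order = service_order(burst, owner=0, bypass=True)
```

In this model the plain buffering controller serves the release third, behind two contending reads, whereas with bypassing it is served first, which is the effect the abstract attributes to the mechanism.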

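Bibliographic details

Source
Journal of Computer Science and Technology (English edition) | 2012, Issue 1 | pp. 75-91 | 17 pages
Authors
Benjamín Sahelices; Agustín de Dios; Pablo Ibáñez; Víctor Viñals-Yúfera; José María Llabería
Author affiliations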

Computer Science Department and HiPEAC European Network of Excellence, University of Valladolid, Valladolid, Spain;

Computer Science and Systems Engineering Department, I3A Research Institute and HiPEAC European Network of Excellence, University of Zaragoza, Zaragoza, Spain;

Computer Architecture Department and HiPEAC European Network of Excellence, Polytechnic University of Cataluña, Barcelona, Spain

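Original format: PDF
Text language: chi
CLC classification: Arithmetic units and controllers (CPU); Memory
Keywords
multiprocessor systems; coherence protocol; execution time; performance bottleneck; shared data; parallel programs; controller; buffer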
