As core counts increase, lock acquisition and release become even more critical because they lie on the critical path of shared-memory applications. In this paper, we show that many applications exhibit regular, repeating lock-sharing patterns. Based on this observation, we introduce SpecLock, an efficient hardware mechanism that speculates on the lock acquisition pattern between cores. Upon the release of a lock, the cache line containing the lock is speculatively forwarded to the predicted next consumer of the lock. This forwarding is performed via a specialized prefetch request and requires no modification to the coherence protocol. Further, the lock is not speculatively acquired; only the cache line containing the lock variable is placed in the private cache of the predicted consumer. Speculative forwarding serves to hide the remote core's lock acquisition latency. SpecLock is distributed, and all predictions are made locally at each core. We show that SpecLock correctly captures 87% of predictable lock patterns and improves performance by an average of 10% with 64 cores. SpecLock incurs negligible overhead, with a 75% area reduction compared to past work. Compared to two state-of-the-art methods, SpecLock provides speedups of 8% and 4%, respectively.
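The core idea, predicting a lock's next consumer from previously observed hand-off patterns, can be illustrated with a minimal software model. This is an illustrative sketch only, not the paper's hardware design: the single-entry successor table keyed by (lock, releasing core) and the class and function names are assumptions made for the example.

```python
# Illustrative software model of successor prediction for lock hand-offs.
# (A sketch of the general idea; SpecLock's actual predictor organization
# is not specified here.)

class SuccessorPredictor:
    """Hypothetical one-level table: for each (lock, releasing core) pair,
    remember which core acquired the lock next last time."""
    def __init__(self):
        self.table = {}  # (lock_id, releaser_core) -> predicted next core

    def predict(self, lock, releaser):
        return self.table.get((lock, releaser))

    def train(self, lock, releaser, actual_next):
        self.table[(lock, releaser)] = actual_next

def run(trace):
    """trace: ordered list of (lock_id, core_id) acquisitions.
    Returns (correct predictions, total predictions made)."""
    pred = SuccessorPredictor()
    last_holder = {}  # lock_id -> core that last held the lock
    hits = total = 0
    for lock, core in trace:
        if lock in last_holder:
            releaser = last_holder[lock]
            guess = pred.predict(lock, releaser)
            if guess is not None:
                total += 1
                hits += (guess == core)   # would forwarding have helped?
            pred.train(lock, releaser, core)
        last_holder[lock] = core
    return hits, total

# A repeating hand-off pattern 0 -> 1 -> 2 -> 0 -> ... is predicted
# perfectly once each transition has been observed once.
trace = [(0, c) for c in [0, 1, 2] * 10]
hits, total = run(trace)
```

On this regular trace every prediction after the warm-up round is correct, mirroring the observation that repeating sharing patterns are highly predictable; irregular hand-offs would simply produce misses, costing only a wasted prefetch rather than an incorrect lock acquisition.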