Sequential consistency is the widely accepted correctness criterion for execution in shared-memory multiprocessors. A typical implementation of sequential consistency delays each memory access until the previous access by the same process has completed, which is detrimental to performance. Prefetching is an effective way to overlap the execution of memory accesses. This paper studies hardware-controlled prefetching in a directory-based cache-coherent system and proposes a new prefetching scheme that improves on the normal scheme. A cycle-by-cycle, trace-driven simulation model is built to evaluate these prefetching schemes. Simulation results show that prefetching is effective in improving performance and that the proposed scheme improves performance further.