In a simultaneous multithreaded system, a core's pipeline resources are sometimes partitioned and otherwise shared among the active threads. One shared resource is the write buffer, which acts as an intermediary between a store instruction's retirement from the pipeline and the writing of the store value to cache. The write buffer takes a completed store instruction from the load/store queue and eventually writes its value to the level-one data cache. Under a write-allocate cache policy, a buffered store must remain in the write buffer until its cache block is present in the level-one data cache. This latency may vary from as little as a single clock cycle (in the case of a level-one cache hit) to several hundred clock cycles (in the case of a cache miss). This paper shows that cache misses routinely dominate the write buffer's resources and prevent cache hits from being written to memory, thereby degrading the performance of simultaneous multithreaded systems. The paper proposes a technique that reduces this denial of resources to cache hits by limiting the number of cache misses that may concurrently reside in the write buffer, and shows that system performance can be improved by using this technique.