Off-chip replacement (capacity and conflict) and coherent read misses in a distributed shared memory system cause execution to stall for hundreds of cycles. These off-chip replacement and coherent read misses recur, forming sequences of two or more misses called streams. Prior streaming techniques ignored the reordering of misses and not-recently-accessed streams while streaming data. In this paper, we present a stream prefetcher design that deals with both problems. Our design uses stream waiting rooms to store not-recently-accessed streams, which helps remove more off-chip misses. In trace-based simulations, our stream prefetcher design removes 8% to 66% (on average 40%) of replacement misses and 17% to 63% (on average 39%) of coherent read misses. In cycle-accurate full-system simulation, our design gives speedups from 1.00 to 1.17 on Princeton Application Repository for Shared-Memory Computers (PARSEC) workloads running on a distributed shared memory system, with the exception of the dedup and swaptions workloads.