As silicon resources become increasingly abundant, core counts grow rapidly in successive chip-multiprocessors (CMP) generations. Parallel workloads represent an important segment for current and future CMPs mainly when many-core processors are considered. Unlike multiprogrammed workloads, the accessed blocks in these workloads can be classified in two categories: private, accessed only by one core, and shared, accessed by several cores. This paper takes advantage of this classification to access only a subset of the ways on each L1 cache access, thus reducing dynamic power consumption.
展开▼