Multis, shared-memory multiprocessors implemented with single buses and snooping cache protocols, are inherently limited to a small number of processors, and as systems grow beyond a single bus, the bandwidth requirements of broadcast operations limit scalability. Hardware support that provides cache coherence without broadcast can become very expensive. An approach is presented that maintains coherence using approximate information held in special-purpose caches, called pruning-caches, and that provides robust performance over a wide range of workloads. The pruning-cache approach is compared with the more conventional inclusion cache, which provides multilevel inclusion (MLI) in the cache hierarchy. It is shown that pruning-caches are more cost-effective and more robust. Using both analysis and simulation, it is also shown that the k-ary n-cube topology provides scalable, bottleneck-free communication for uniform, point-to-point traffic.
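To make the k-ary n-cube claim concrete, the following minimal sketch computes hop distances under uniform point-to-point traffic. It assumes the usual definition of the topology, which the abstract does not spell out: k^n nodes, each labeled by n radix-k digits, with wraparound links in every dimension. The function names `torus_distance` and `mean_distance` are illustrative and do not come from the paper.

```python
from itertools import product


def torus_distance(a, b, k):
    """Minimum hop count between nodes a and b in a k-ary n-cube.

    Assumes wraparound links, so each dimension behaves like a ring of k nodes.
    """
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))


def mean_distance(k, n):
    """Average hop count from one node to every node (itself included).

    The topology is node-symmetric, so measuring from the origin is enough.
    """
    origin = (0,) * n
    nodes = list(product(range(k), repeat=n))
    return sum(torus_distance(origin, v, k) for v in nodes) / len(nodes)


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    for k, n in [(4, 2), (8, 3)]:
        print(f"k={k}, n={n}: {k ** n} nodes, mean distance {mean_distance(k, n)}")
```

Under these assumptions and for even k, the mean distance works out to n·k/4, so it grows slowly relative to the node count k^n. This covers path length only and does not model contention, which the paper addresses through analysis and simulation.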