Scalable shared-memory multiprocessors designed as cache-only memory architectures (Coma) allow automatic replication and migration of data in main memory. This enhances programmability by potentially eliminating the need for data distribution strategies and page migration schemes. A variant of Coma called Simple Coma has been proposed as a lower-cost alternative to hardware-intensive systems like Flat Coma. However, we find that Simple Coma is considerably slower than Flat Coma. The main reason is the high page mapping, unmapping, and transfer overhead caused by memory fragmentation in Simple Coma. We propose a solution to the memory fragmentation problem that we call Multiplexed Simple Coma. The idea is to allow multiple virtual pages to map into the same physical page at the same time, thereby compressing the page working set of the application. Multiplexed Simple Coma requires very little support beyond Simple Coma and reduces its execution time by about 40%. We find that Multiplexed Simple Coma can be implemented easily with off-the-shelf processors. In addition, there is no need to be selective when choosing which virtual pages share the same physical page. Overall, although Multiplexed Simple Coma is still slower than Flat Coma, it is cheaper to implement and therefore represents a good cost-performance design point.
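
To make the page-multiplexing idea concrete, the following is a minimal toy sketch, not the mechanism described in the paper. It assumes a hypothetical `MultiplexedPageTable` in which each physical page frame can host up to `degree` virtual pages at once, and it assigns virtual pages to frames first-fit, without any selection policy. The class name, its parameters, and the first-fit rule are all illustrative assumptions. The sketch does not model cache-line conflicts between virtual pages that share a frame, nor any hardware support.

```python
class MultiplexedPageTable:
    """Toy model: several virtual pages may map to one physical frame.

    All names here are hypothetical; degree=1 mimics one virtual page
    per physical page, larger values mimic multiplexing.
    """

    def __init__(self, num_frames: int, degree: int) -> None:
        if num_frames < 1 or degree < 1:
            raise ValueError("num_frames and degree must be positive")
        self.degree = degree
        self.frame_of = {}  # virtual page number -> frame index
        self.residents = [[] for _ in range(num_frames)]

    def map_page(self, vpage: int) -> int:
        """Map a virtual page to the first frame with a free slot."""
        if vpage in self.frame_of:
            return self.frame_of[vpage]
        for frame, pages in enumerate(self.residents):
            if len(pages) < self.degree:
                pages.append(vpage)
                self.frame_of[vpage] = frame
                return frame
        # In a real system this is where a page would have to be unmapped.
        raise MemoryError("no frame has a free slot")

    def frames_used(self) -> int:
        """Number of frames hosting at least one virtual page."""
        return sum(1 for pages in self.residents if pages)


if __name__ == "__main__":
    table = MultiplexedPageTable(num_frames=4, degree=2)
    placements = [table.map_page(v) for v in range(6)]
    print(placements, table.frames_used())
```

In this toy model, raising `degree` lets the same number of frames hold a larger page working set before any page has to be unmapped, which is the effect the abstract attributes to multiplexing.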