Over the last decade, processor speed has increased dramatically, whereas the speed of the memory subsystem has improved at only a modest rate. Because of the resulting increase in cache miss latency (measured in processor cycles), processors stall on cache misses for a significant portion of their execution time. Multithreaded processors have been proposed in the literature to reduce the processor stall time due to cache misses. Although multithreading improves processor utilization, it may also increase cache miss rates, because multiple threads in a multithreaded processor share the same cache, which effectively reduces the cache size available to each individual thread. Increased processor utilization and the increased cache miss rate both demand higher memory bandwidth. This paper presents a novel compiler optimization method that improves data locality for each of the threads and enhances data sharing among the threads. The method is based on loop transformation theory and optimizes both spatial and temporal data locality. The created threads exhibit a high level of intra-thread and inter-thread data locality, which effectively reduces both the data cache miss rates and the total execution time of numerically intensive computations running on a multithreaded processor.
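As a rough illustration of why loop order matters for spatial locality, the sketch below applies loop interchange, one well-known loop transformation, to the traversal of a row-major array and counts misses in a toy cache. It is not the method presented in the paper: the function names (`count_misses`, `column_order`, `row_order`), the fully associative LRU cache model, and the parameters (8 elements per line, 4 lines, a 32 x 32 array) are all assumptions chosen for the example.

```python
from collections import OrderedDict


def count_misses(addresses, line_size=8, num_lines=4):
    """Count misses in a hypothetical fully associative LRU cache.

    line_size is the number of array elements per cache line and
    num_lines is the number of lines the cache can hold.
    """
    cache = OrderedDict()  # keys are cache line numbers, ordered by recency
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses


def column_order(n):
    """Element addresses when the inner loop walks down a column
    of a row-major n x n array (stride n between accesses)."""
    return [i * n + j for j in range(n) for i in range(n)]


def row_order(n):
    """Element addresses after interchanging the two loops, so the
    inner loop walks along a row (stride 1 between accesses)."""
    return [i * n + j for i in range(n) for j in range(n)]


if __name__ == "__main__":
    n = 32
    print("column-major traversal misses:", count_misses(column_order(n)))
    print("row-major traversal misses:   ", count_misses(row_order(n)))
```

Under these assumed parameters, the column-order traversal misses on all 1024 accesses, while the interchanged loop misses only 128 times, once per cache line. When several threads share one cache, each thread effectively has fewer lines available, so traversal order of this kind becomes more important.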