Applications with linked data structures (LDS) usually suffer heavy last-level cache misses because linked data lacks spatial locality. By analyzing the relationship between the computation workload and the memory-access workload in hot loops, this paper investigates how the ratio between the two affects the performance of threaded prefetching. Exploiting the characteristics of chip multiprocessors (CMP), this paper implements a staggered helper-thread prefetching technique for applications with LDS. The experimental results confirm that the ratio of computation to memory access has a significant effect on staggered threaded prefetching, and show that staggered threaded prefetching has a clear performance advantage over conventional threaded prefetching techniques.