Multi-core processors are widely used in high-performance computing, yet parallelizing conventional sequential programs and reducing the running time of loop nests remain challenging. We first present a dependence analysis of nested loops in the polyhedral model and use it to tile the loops, which makes it possible to transform sequential code automatically into a coarse-grained parallel program. Taking the structure of a multi-core array architecture into account, we then introduce a genetic algorithm that produces a communication-optimized sequence of tiled tasks, and we build a task scheduling model on that basis. Simulation of LU decomposition shows that, compared with traditional scheduling algorithms, our approach generates more effective parallel code, with better data locality and better load balance across cores.
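To make the idea of tiling a loop nest concrete, the following is a minimal sketch in Python and not the authors' polyhedral code generator. It assumes a dense matrix stored as a list of lists, LU decomposition without pivoting, and a hypothetical tile size parameter `tile`; the function names `lu_plain` and `lu_tiled` are illustrative only. For a fixed elimination step `k`, the trailing-submatrix updates carry no dependences on one another, so each tile is an independent coarse-grained task that a scheduler could assign to a core.

```python
import copy


def lu_plain(a):
    """Reference in-place LU (Doolittle, no pivoting) on a list-of-lists matrix."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]              # multiplier, stored in L's position
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a


def lu_tiled(a, tile=2):
    """Same computation, with the trailing update split into tile x tile blocks."""
    n = len(a)
    for k in range(n):
        # Step 1: compute all multipliers of column k before any update.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
        # Step 2: update the trailing submatrix tile by tile.
        # For this k, tiles do not depend on each other, so each (ii, jj)
        # pair could be handed to a different core.
        for ii in range(k + 1, n, tile):
            for jj in range(k + 1, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        a[i][j] -= a[i][k] * a[k][j]
    return a


if __name__ == "__main__":
    # Diagonally dominant example matrix, so no pivot is zero (an assumption
    # of this sketch, since pivoting is omitted).
    m = [[4.0, 1.0, 2.0, 0.5],
         [1.0, 5.0, 1.0, 2.0],
         [2.0, 1.0, 6.0, 1.0],
         [0.5, 2.0, 1.0, 7.0]]
    same = lu_tiled(copy.deepcopy(m), tile=2) == lu_plain(copy.deepcopy(m))
    print("tiled result matches reference:", same)
```

Because every element receives the same sequence of updates in both versions, the tiled result equals the reference exactly; what changes is the traversal order, which is what affects data locality and exposes the tiles as schedulable tasks.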