In the data parallel programming style the user usually specifies the data parallelism explicitly so that the compiler can generate efficient code without enhanced analysis techniques. In some situations it is not possible to specify the parallelism explicitly or this might be not very convenient. This is especially true for loop nests with data dependences between the data of distributed dimensions. In the case of uniform loop nests there are scheduling, mapping and partitioning techniques available. Some different strategies have been considered and evaluated with existing High Performance Fortran compilation systems. This paper gives some results about the performance and the benefits of the different techniques and optimizations. The results are intended to direct the future development of data parallel compilers.
展开▼