A variety of historically-proven computer languages have recently been extended to support parallel computation in a data-parallel framework. The performance capabilities of modern microprocessors have made the "cluster-of-workstations" model of parallel computing more attractive, by permitting organizations to network together workstations to solve problems in concert, without the need to buy specialized and expensive supercomputers or mainframes. For the most part, research on these extended languages has focused on compile-time analyses which detect data dependencies and use user-provided hints to distribute data and encode the necessary communication operations between nodes in a multiprocessor system. These analyses have shown their value when the necessary hints are provided, but require more information at compile-time than may be available in large-scale real-world programs. This dissertation focuses on elements important to an efficient and portable implementation of runtime support for data-parallel languages, to the near absence of any reliance on compile-time information. We consider issues ranging from data distribution and global/local address conversion, through a communication framework intended to support modern networked computers, and optimizations for a variety of communications patterns common to data-parallel programs. The discussion is grounded in a complete implementation of a data-parallel language, C*, on stock workstations connected with standard network hardware. The performance of the resulting system is evaluated on a set of eight benchmark programs by comparing it to optimized sequential solutions to the same problems, and to the reference implementation of C* on the Connection Machine CM5 supercomputer. Our implementation, denoted pC* for "portable C*", generally performs within a factor of four of the optimized sequential algorithms. In addition, the optimizations developed in this dissertation permit a cluster of twelve workstations connected with Ethernet to outperform a sixty-four node CM5 in absolute performance on three of the eight benchmarks. Though we specifically address the issues of runtime support for C*, the material in this dissertation applies equally well to a variety of other parallel systems, especially the data-parallel features of Fortran 90 and High Performance Fortran.
展开▼