【24h】

Highly scalable parallel sorting

机译:高度可扩展的并行排序

获取原文
获取原文并翻译 | 示例

摘要

Sorting is a commonly used process with a wide breadth of applications in the high performance computing field. Early research in parallel processing has provided us with comprehensive analysis and theory for parallel sorting algorithms. However, modern supercomputers have advanced rapidly in size and changed significantly in architecture, forcing new adaptations to these algorithms. To fully utilize the potential of highly parallel machines, tens of thousands of processors are used. Efficiently scaling parallel sorting on machines of this magnitude is inhibited by the communication-intensive problem of migrating large amounts of data between processors. The challenge is to design a highly scalable sorting algorithm that uses minimal communication, maximizes overlap between computation and communication, and uses memory efficiently. This paper presents a scalable extension of the Histogram Sorting method, making fundamental modifications to the original algorithm in order to minimize message contention and exploit overlap. We implement Histogram Sort, Sample Sort, and Radix Sort in Charm++ and compare their performance. The choice of algorithm as well as the importance of the optimizations is validated by performance tests on two predominant modern supercomputer architectures: XT4 at ORNL (Jaguar) and Blue Gene/P at ANL (Intrepid).
机译:排序是在高性能计算领域中具有广泛应用程序的常用过程。并行处理的早期研究为我们提供了有关并行排序算法的综合分析和理论。但是,现代超级计算机的规模迅速发展,并且体系结构发生了重大变化,迫使对这些算法进行新的调整。为了充分利用高度并行机的潜力,使用了数以万计的处理器。在处理器之间迁移大量数据的通信密集型问题阻碍了在这种规模的计算机上有效缩放并行排序的工作。挑战在于设计一种高度可扩展的排序算法,该算法使用最少的通信,最大化计算和通信之间的重叠以及有效使用内存。本文介绍了直方图排序方法的可扩展性扩展,对原始算法进行了基本修改,以最大程度地减少消息争用并利用重叠。我们在Charm ++中实现直方图排序,样本排序和基数排序,并比较它们的性能。算法的选择以及优化的重要性已通过对两种主要的现代超级计算机体系结构的性能测试进行了验证:ORNL的XT4(Jaguar)和ANL的Blue Gene / P(Intrepid)。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号