Highly scalable parallel sorting


Abstract

Sorting is a commonly used process with a wide breadth of applications in the high performance computing field. Early research in parallel processing has provided us with comprehensive analysis and theory for parallel sorting algorithms. However, modern supercomputers have advanced rapidly in size and changed significantly in architecture, forcing new adaptations to these algorithms. To fully utilize the potential of highly parallel machines, tens of thousands of processors are used. Efficiently scaling parallel sorting on machines of this magnitude is inhibited by the communication-intensive problem of migrating large amounts of data between processors. The challenge is to design a highly scalable sorting algorithm that uses minimal communication, maximizes overlap between computation and communication, and uses memory efficiently. This paper presents a scalable extension of the Histogram Sorting method, making fundamental modifications to the original algorithm in order to minimize message contention and exploit overlap. We implement Histogram Sort, Sample Sort, and Radix Sort in Charm++ and compare their performance. The choice of algorithm as well as the importance of the optimizations is validated by performance tests on two predominant modern supercomputer architectures: XT4 at ORNL (Jaguar) and Blue Gene/P at ANL (Intrepid).
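
To make the idea behind histogram-based sorting concrete, here is a minimal sequential sketch in Python. It is not the paper's Charm++ implementation and does not model its optimizations (communication/computation overlap, reduced message contention). It only simulates the basic scheme under stated assumptions: keys are integers, each inner list stands in for one processor's data, and the names `histogram_sort`, `chunks`, and `tolerance` are hypothetical. Candidate splitter keys ("probes") are refined by bisection until the global count of keys at or below each probe is close to its ideal rank; the data is then routed to destination buckets and merged.

```python
from bisect import bisect_right
from heapq import merge


def histogram_sort(chunks, tolerance=0.05):
    """Simulate histogram sort; chunks[i] holds the integer keys of 'processor' i."""
    p = len(chunks)
    local = [sorted(c) for c in chunks]            # each processor sorts locally
    nonempty = [c for c in local if c]
    if p == 1 or not nonempty:
        return local
    total = sum(len(c) for c in local)
    targets = [i * total // p for i in range(1, p)]  # ideal global rank per splitter
    slack = int(tolerance * total / p)               # allowed rank error (assumption)
    lo = [min(c[0] for c in nonempty)] * (p - 1)
    hi = [max(c[-1] for c in nonempty)] * (p - 1)
    splitters = [None] * (p - 1)

    while any(s is None for s in splitters):
        open_ids = [i for i, s in enumerate(splitters) if s is None]
        # "Broadcast" one probe key per unresolved splitter.
        probes = [(lo[i] + hi[i]) // 2 for i in open_ids]
        # Each processor counts its keys <= probe; the sum stands in for a reduction.
        counts = [sum(bisect_right(c, q) for c in local) for q in probes]
        for i, q, n in zip(open_ids, probes, counts):
            if abs(n - targets[i]) <= slack or lo[i] >= hi[i]:
                splitters[i] = q                   # close enough, or range exhausted
            elif n < targets[i]:
                lo[i] = q + 1                      # probe too small
            else:
                hi[i] = q                          # probe too large
    splitters.sort()

    # "Exchange": cut each sorted local chunk at the splitters, then merge per bucket.
    buckets = [[] for _ in range(p)]
    for c in local:
        cuts = [0] + [bisect_right(c, s) for s in splitters] + [len(c)]
        for j in range(p):
            buckets[j].append(c[cuts[j]:cuts[j + 1]])
    return [list(merge(*pieces)) for pieces in buckets]
```

Concatenating the returned buckets in order gives the globally sorted sequence. In a real distributed run, the probe broadcast, the count reduction, and the final exchange would each be communication steps, which is where the scalability concerns raised in the abstract come from.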
