This dissertation research develops a scalable hybrid message-passing and multithreading algorithm for n-tuple molecular dynamics (MD) simulation that will continue to scale on future architectures (i.e., achieve metascalability). The two major goals of this research are: (1) to design a scalable hybrid message-passing and multithreading parallel algorithmic framework for multicore architectures and evaluate it on the most advanced parallel architectures; and (2) to develop a computation-pattern algebraic framework for designing scalable algorithms for general n-tuple computation and to prove its optimality in a systematic, mathematically rigorous manner.

To achieve the first goal, we have developed and thoroughly analyzed algorithms for hybrid Message Passing Interface (MPI) + Open Multi-Processing (OpenMP) parallelization of n-tuple MD simulation that are scalable on large multicore clusters. Two data-privatization thread-scheduling algorithms based on nucleation-growth allocation have been designed: (1) compact-volume allocation scheduling (CVAS) and (2) breadth-first allocation scheduling (BFAS). Both algorithms combine fine-grain dynamic load balancing with minimal memory-footprint threading. Theoretical study has revealed favorable asymptotic memory efficiency for both algorithms, which reduce memory consumption by 75% compared with a naive threading algorithm. Furthermore, performance benchmarks have confirmed that the hybrid MD algorithm outperforms a traditional algorithm on large multicore clusters, achieving a 2.58-fold speedup over the traditional algorithm on 32,768 nodes of IBM BlueGene/P.

To achieve the second goal, we have developed a computation-pattern algebraic framework that mathematically formulates general n-tuple computation. Based on translation- and reflection-invariance properties of computation patterns within this framework, we have designed a shift-collapse (SC) algorithm for cell-based parallel MD.
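The cell-based decomposition underlying such parallel MD algorithms can be illustrated with a minimal linked-list cell sketch. This is a hypothetical one-dimensional toy (the actual algorithms are three-dimensional and parallel): atoms are binned into cells of width at least the cutoff radius, so candidate pairs are drawn only from a cell and its neighbor, shrinking the quadratic all-pairs search space.

```python
# Minimal 1-D linked-list cell sketch (illustrative only; the actual
# algorithms are 3-D and parallel). Atoms are binned into cells of
# width >= cutoff, so candidate pairs are drawn only from a cell and
# its right-hand neighbor, shrinking the O(N^2) search space.
def cell_pairs(xs, cutoff, box):
    ncell = max(1, int(box // cutoff))  # ensure cell width >= cutoff
    width = box / ncell
    head = [-1] * ncell                 # head-of-chain atom per cell
    nxt = [-1] * len(xs)                # per-atom "next in cell" link
    for i, x in enumerate(xs):
        c = min(int(x / width), ncell - 1)
        nxt[i] = head[c]
        head[c] = i
    pairs = []
    for c in range(ncell):
        for dc in (0, 1):               # own cell, then right neighbor
            if c + dc >= ncell:
                continue
            i = head[c]
            while i != -1:
                # within the same cell, start from nxt[i] to avoid
                # double-counting; in the neighbor cell, scan it all
                j = nxt[i] if dc == 0 else head[c + dc]
                while j != -1:
                    if abs(xs[i] - xs[j]) <= cutoff:
                        pairs.append((min(i, j), max(i, j)))
                    j = nxt[j]
                i = nxt[i]
    return sorted(pairs)
```

For simplicity this sketch omits periodic boundary conditions and the half-shell/eighth-shell import-region optimizations discussed in the text.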
Theoretical analysis has quantified the compact n-tuple search space and the small communication cost of SC-MD for arbitrary n, both of which reduce to those of the best pair-computation approaches (e.g., the eighth-shell method) for n = 2. Benchmark tests have shown that SC-MD outperforms our production MD code at the finest grain, with 9.7- and 5.1-fold speedups on Intel Xeon and BlueGene/Q clusters, respectively. SC-MD has also exhibited excellent strong scalability.

In addition, we have analyzed the computational and data-access patterns of MD, which led to the development of a performance prediction model for short-range pairwise force computation in MD simulations. The analysis and performance model provide a fundamental understanding of computation patterns and of the optimality of certain simulation parameters, allowing scientists to determine the optimal cell dimension in a linked-list cell method. The model has accurately estimated the number of operations during simulation, with a maximum error of 10.6% relative to actual measurements. Analysis and benchmarking of the model have revealed that the optimal cell dimension, which minimizes computation time, is determined by a trade-off between the shrinking search space and the increasing number of linked-list cell accesses for smaller cells.

One difficulty with MD is that it is a dynamic, irregular application, which often suffers considerable performance deterioration during execution. To address this problem, an optimal data-reordering schedule has been developed for runtime memory-access optimization of MD simulations on parallel computers. Analysis of the memory-access penalty during MD simulations has shown that the performance improvement from computation and data reordering degrades gradually as data translation lookaside buffer (TLB) misses increase.
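The idea behind such runtime data reordering can be sketched as follows. This is a hypothetical one-dimensional illustration (not the dissertation's scheduler): atoms are re-sorted by their linked-list cell index so that atoms close in space become close in memory, which is what restores cache and TLB locality during neighbor traversal.

```python
# Hypothetical sketch of runtime data reordering: sort atoms by
# linked-list cell index so spatially adjacent atoms become adjacent
# in memory, reducing cache/TLB misses during neighbor traversal.
def reorder_by_cell(positions, cell_width):
    order = sorted(range(len(positions)),
                   key=lambda i: int(positions[i] / cell_width))
    remap = {old: new for new, old in enumerate(order)}  # old -> new index
    return [positions[i] for i in order], remap
```

The returned `remap` table would let neighbor lists and per-atom arrays be permuted consistently after each reordering step.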
We have also found correlations between the performance degradation and physical properties, such as the simulated temperature, as well as computational parameters, such as the spatial-decomposition granularity. (Abstract shortened by UMI.)
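The data-privatization threading described earlier for the first goal can be emulated serially as follows. This is a hypothetical toy (the real CVAS/BFAS schedulers run under MPI + OpenMP, and the round-robin pair assignment below merely stands in for their nucleation-growth allocation): each worker accumulates pair forces into a private buffer, and the buffers are reduced at the end, so no two workers ever write the same shared element.

```python
# Hypothetical serial emulation of data-privatization threading: each
# worker sums pair forces into its own private buffer; a final
# reduction collapses the buffers, avoiding concurrent writes.
def compute_forces_privatized(positions, pairs, n_workers=2):
    n = len(positions)
    private = [[0.0] * n for _ in range(n_workers)]  # one buffer per worker
    for w in range(n_workers):
        # round-robin pair assignment stands in for CVAS/BFAS scheduling
        for k in range(w, len(pairs), n_workers):
            i, j = pairs[k]
            f = positions[j] - positions[i]          # toy linear "force"
            private[w][i] += f                       # Newton's third law:
            private[w][j] -= f                       # equal and opposite
    # reduction step: collapse private buffers into the shared result
    return [sum(buf[i] for buf in private) for i in range(n)]
```

The memory trade-off analyzed in the text appears here directly: naive privatization costs one full force buffer per worker, which is what the CVAS/BFAS allocation strategies reduce.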