首页> 外文学位 >Parallel algorithms for large-scale graph clustering on distributed memory architectures.

【24h】

Parallel algorithms for large-scale graph clustering on distributed memory architectures.

机译：用于分布式内存体系结构上的大规模图形聚类的并行算法。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

Graph algorithms on parallel architectures present an interesting case study for irregular applications. We address one such irregular application --- one of clustering real world graphs constructed out of biological data and open-source communities data using parallel computers. While theoretical formulations of the clustering operation are either intractable or computationally prohibitive, efficient heuristics exist to tackle the problem in practice. Yet, implementing these heuristics under a parallel setting becomes a significant challenge owing to a combination of factors including: irregular data access and movement patterns, dependence of computational workload on the input, and a general need to maintain auxiliary pointer-based data structures. We present the design and evaluation of several parallel implementations of a popular serial graph clustering heuristic called the Shingling heuristic, which was originally developed by Gibson et al. Our MapReduce implementation, targets distributed memory clusters running Hadoop and MPI. We also extend the original algorithm to handle weighed graphs. Operating on an input graph that can be represented as a list of edges or adjacency list, our algorithm uses a combination of shuffling and sorting operations, and pipelined MapReduce stages to implement the various phases of the algorithm. As a concrete case for application, we apply the methods developed on large-scale biological graphs obtained from a metagenomic community. Experimental results show both qualitative and performance improvements over previous executions of a baseline version of the clustering method. We also compare our results against other popular generic tools designed for community detection. As another applied case study of our research, we design and evaluate a cluster-based approach for socio-technical coordination in open-source community networks. The research experience in both these domains serve to demonstrate the high utility of cluster-based approaches in scientific domains.

机译：并行体系结构上的图算法为不规则应用提供了一个有趣的案例研究。我们解决了一种这样的不规则应用程序，即使用并行计算机对由生物数据和开源社区数据构成的现实世界图进行聚类之一。尽管聚类操作的理论公式难以处理或在计算上是禁止的，但仍然存在有效的启发式方法来解决实际问题。然而，由于多种因素的组合，在并行设置下实施这些试探法成为一项巨大的挑战，这些因素包括：不规则的数据访问和移动模式，计算工作量对输入的依赖性以及维护基于辅助指针的数据结构的一般需求。我们介绍一种流行的串行图聚类启发式算法Shingling启发式算法的几种并行实现的设计和评估，该算法最初由Gibson等人开发。我们的MapReduce实施针对运行Hadoop和MPI的分布式内存集群。我们还扩展了原始算法来处理加权图。在可表示为边列表或邻接列表的输入图上运行，我们的算法结合了改组和排序操作以及流水线的MapReduce阶段，以实现算法的各个阶段。作为具体的应用案例，我们应用了从宏基因组学社区获得的大规模生物学图上开发的方法。实验结果表明，与之前执行聚类方法的基准版本相比，该方法在质量和性能上均得到了改善。我们还将我们的结果与专为社区检测而设计的其他流行通用工具进行了比较。作为我们研究的另一个应用案例研究，我们设计和评估了基于集群的开源社区网络中社会技术协调方法。这两个领域的研究经验证明了基于簇的方法在科学领域中的高度实用性。

著录项

作者
Rytsareva, Inna.;
展开▼
作者单位

Washington State University.;

展开▼
授予单位 Washington State University.;
学科 Computer science.
学位 Ph.D.
年度 2014
页码 104 p.
总页数 104
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Parallel algorithms for clustering biological graphs on distributed and shared memory architectures [J] . Inna Rytsareva, Timothy Chapman, Ananth Kalyanaraman International Journal of High Performance Computing and Networking . 2014,第4期

机译：用于在分布式和共享内存体系结构上对生物图进行聚类的并行算法
2. A novel parallel algorithm for large-scale Fock matrix construction with small locally distributed memory architectures: RT parallel algorithm [J] . Takashima H., Yamada S., Obara S., Journal of Computational Chemistry: Organic, Inorganic, Physical, Biological . 2002,第14期

机译：用于具有小的局部分布式存储体系结构的大规模Fock矩阵构建的新颖并行算法：RT并行算法
3. A novel parallel algorithm for large-scale fock matrix construction with smalllocally distributed memory architectures:RT parallel algorithm [J] . Hajime Takashima, Kunihiro Kitamura, So Yamada, Journal of Computational Chemistry: Organic, Inorganic, Physical, Biological . 2002,第14a15期

机译：一种用于具有小局部分布存储架构的大规模Fock矩阵构造的新颖并行算法：RT并行算法
4. Solution of large-scale scattering problems with the multilevel fast multipole algorithm parallelized on distributed-memory architectures [C] . Ergul O., Gurel L. International Symposium on Computer and Information Sciences . 2007

机译：在分布式内存架构上并行化的多级快速多极算法的大规模散射问题解决方案
5. Parallel algorithm synthesis for message-passing distributed-memory architectures. [D] . Pascale, Michael V. 1997

机译：消息传递分布式内存体系结构的并行算法综合。
6. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters [O] . Haidong Lan, Yuandong Chan, Kai Xu, 2016

机译：基于Xeon-Phi簇的大规模生物序列比对的并行算法
7. Sublattice coding algorithm and distributed memory parallelization for large-scale exact diagonalizations of quantum many-body systems [O] . Alexander Wietek, Andreas M. Läuchli 2018

机译：分段编码算法和分布式存储器并行化，用于量子数量的大规模精确对角化
8. Implementation of Novel Parallel Cyclic Convolution Algorithms in Clusters and Multi-Core Architectures. [R] . Teixeira, M., Nevarez, F. 2014

机译：集群和多核架构中新型并行循环卷积算法的实现。

Parallel algorithms for large-scale graph clustering on distributed memory architectures.

摘要

著录项

相似文献

相关主题

期刊订阅