首页> 外文学位 >Parallel algorithms for large-scale graph clustering on distributed memory architectures.
【24h】

Parallel algorithms for large-scale graph clustering on distributed memory architectures.

机译:用于分布式内存体系结构上的大规模图形聚类的并行算法。

获取原文
获取原文并翻译 | 示例

摘要

Graph algorithms on parallel architectures present an interesting case study for irregular applications. We address one such irregular application --- one of clustering real world graphs constructed out of biological data and open-source communities data using parallel computers. While theoretical formulations of the clustering operation are either intractable or computationally prohibitive, efficient heuristics exist to tackle the problem in practice. Yet, implementing these heuristics under a parallel setting becomes a significant challenge owing to a combination of factors including: irregular data access and movement patterns, dependence of computational workload on the input, and a general need to maintain auxiliary pointer-based data structures. We present the design and evaluation of several parallel implementations of a popular serial graph clustering heuristic called the Shingling heuristic, which was originally developed by Gibson et al. Our MapReduce implementation, targets distributed memory clusters running Hadoop and MPI. We also extend the original algorithm to handle weighed graphs. Operating on an input graph that can be represented as a list of edges or adjacency list, our algorithm uses a combination of shuffling and sorting operations, and pipelined MapReduce stages to implement the various phases of the algorithm. As a concrete case for application, we apply the methods developed on large-scale biological graphs obtained from a metagenomic community. Experimental results show both qualitative and performance improvements over previous executions of a baseline version of the clustering method. We also compare our results against other popular generic tools designed for community detection. As another applied case study of our research, we design and evaluate a cluster-based approach for socio-technical coordination in open-source community networks. The research experience in both these domains serve to demonstrate the high utility of cluster-based approaches in scientific domains.
机译:并行体系结构上的图算法为不规则应用提供了一个有趣的案例研究。我们解决了一种这样的不规则应用程序,即使用并行计算机对由生物数据和开源社区数据构成的现实世界图进行聚类之一。尽管聚类操作的理论公式难以处理或在计算上是禁止的,但仍然存在有效的启发式方法来解决实际问题。然而,由于多种因素的组合,在并行设置下实施这些试探法成为一项巨大的挑战,这些因素包括:不规则的数据访问和移动模式,计算工作量对输入的依赖性以及维护基于辅助指针的数据结构的一般需求。我们介绍一种流行的串行图聚类启发式算法Shingling启发式算法的几种并行实现的设计和评估,该算法最初由Gibson等人开发。我们的MapReduce实施针对运行Hadoop和MPI的分布式内存集群。我们还扩展了原始算法来处理加权图。在可表示为边列表或邻接列表的输入图上运行,我们的算法结合了改组和排序操作以及流水线的MapReduce阶段,以实现算法的各个阶段。作为具体的应用案例,我们应用了从宏基因组学社区获得的大规模生物学图上开发的方法。实验结果表明,与之前执行聚类方法的基准版本相比,该方法在质量和性能上均得到了改善。我们还将我们的结果与专为社区检测而设计的其他流行通用工具进行了比较。作为我们研究的另一个应用案例研究,我们设计和评估了基于集群的开源社区网络中社会技术协调方法。这两个领域的研究经验证明了基于簇的方法在科学领域中的高度实用性。

著录项

  • 作者

    Rytsareva, Inna.;

  • 作者单位

    Washington State University.;

  • 授予单位 Washington State University.;
  • 学科 Computer science.
  • 学位 Ph.D.
  • 年度 2014
  • 页码 104 p.
  • 总页数 104
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号