Published in: International Journal of Data Science and Analytics

Automatic parallelization of representative-based clustering algorithms for multicore cluster systems

Abstract

Ease of programming and optimal parallel performance have historically sat on opposite sides of a trade-off, forcing users to choose between them. With the advent of the Big Data era and the rapid evolution of sequential algorithms, the data analytics community can no longer afford this trade-off. We observed that several clustering algorithms share common traits; in particular, algorithms belonging to the same class of clustering exhibit significant overlap in their processing steps. Here, we present our observations on domain patterns in representative-based clustering algorithms and show how these patterns manifest as clearly identifiable programming patterns when mapped to a Domain Specific Language (DSL). We have integrated the signatures of these patterns into the DSL compiler for parallelism identification and automatic parallel code generation. Depending on the target architecture, the compiler generates either MPI C++ code for distributed memory parallel processing or MPI-OpenMP C++ code for hybrid memory parallel processing. Our experiments with several state-of-the-art parallelization frameworks show that our system can achieve near-optimal speedup while requiring a fraction of the programming effort, making it an ideal choice for the data analytics community. Results are presented for both distributed and hybrid memory systems.
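The abstract does not show the DSL or the code its compiler emits. As a rough, hypothetical illustration of the kind of shared processing step it refers to, the C++ sketch below takes k-means (a standard representative-based algorithm; choosing it here is an assumption, not something stated in the abstract) and splits one iteration into a data-parallel assignment over disjoint partitions, a per-partition accumulation of sums and counts, and a global reduction followed by a representative update. Plain `std::thread` workers stand in for the MPI ranks or OpenMP threads of generated code, and the reduction loop marks where an `MPI_Allreduce` over sums and counts would sit in a distributed version. All names (`kmeans_iteration`, `local_step`, `Partial`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical illustration only: not the paper's DSL or its generated code.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double sq_dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

// Per-worker partial result: coordinate sums and member counts per cluster.
struct Partial {
    std::vector<Point> sums;
    std::vector<std::size_t> counts;
};

// Assignment + local accumulation for one partition [begin, end).
// Each worker writes only its own Partial and its own slice of `labels`.
Partial local_step(const std::vector<Point>& pts, std::size_t begin, std::size_t end,
                   const std::vector<Point>& reps, std::vector<int>& labels) {
    const std::size_t k = reps.size(), dim = reps[0].size();
    Partial p{std::vector<Point>(k, Point(dim, 0.0)), std::vector<std::size_t>(k, 0)};
    for (std::size_t i = begin; i < end; ++i) {
        std::size_t best = 0;
        double best_d = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < k; ++c) {
            double d = sq_dist(pts[i], reps[c]);
            if (d < best_d) { best_d = d; best = c; }
        }
        labels[i] = static_cast<int>(best);
        for (std::size_t d = 0; d < dim; ++d) p.sums[best][d] += pts[i][d];
        ++p.counts[best];
    }
    return p;
}

// One k-means iteration over `workers` partitions; returns new representatives.
std::vector<Point> kmeans_iteration(const std::vector<Point>& pts,
                                    const std::vector<Point>& reps,
                                    std::vector<int>& labels, std::size_t workers) {
    assert(workers > 0 && !reps.empty());
    const std::size_t n = pts.size(), k = reps.size(), dim = reps[0].size();
    labels.resize(n);  // sized before any worker starts writing
    std::vector<Partial> partials(workers);
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        std::size_t begin = n * w / workers, end = n * (w + 1) / workers;
        threads.emplace_back([&, w, begin, end] {
            partials[w] = local_step(pts, begin, end, reps, labels);
        });
    }
    for (auto& t : threads) t.join();

    // Global reduction: the role MPI_Allreduce would play across ranks.
    Partial total{std::vector<Point>(k, Point(dim, 0.0)), std::vector<std::size_t>(k, 0)};
    for (const auto& p : partials)
        for (std::size_t c = 0; c < k; ++c) {
            for (std::size_t d = 0; d < dim; ++d) total.sums[c][d] += p.sums[c][d];
            total.counts[c] += p.counts[c];
        }

    // Update: each representative becomes the mean of its members;
    // an empty cluster keeps its previous representative.
    std::vector<Point> next = reps;
    for (std::size_t c = 0; c < k; ++c)
        if (total.counts[c] > 0)
            for (std::size_t d = 0; d < dim; ++d)
                next[c][d] = total.sums[c][d] / static_cast<double>(total.counts[c]);
    return next;
}
```

Calling `kmeans_iteration` repeatedly until the representatives stop changing gives Lloyd's algorithm. Because every worker writes only to its own partial buffer and its own range of `labels`, the loop needs no locks; only the reduction step requires communication, which is what makes loops of this shape a natural target for automatic parallelization.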

Bibliographic details

  • Author affiliations

    Advanced Data Analytics and Parallel Technologies Lab, Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Pilani Campus, Pilani, India

  • Original format: PDF
  • Text language: English
  • Keywords

    Clustering; Domain Specific Language; Parallelizing compiler; High Performance Computing; Programming patterns;
