International Journal of Parallel Programming

Emerging Architectures Enable to Boost Massively Parallel Data Mining Using Adaptive Sparse Grids


Abstract

Gaining knowledge out of vast datasets is a major challenge in today's data-driven applications. Sparse grids provide a numerical method for both classification and regression in data mining that scales only linearly in the number of data points and is thus well-suited for huge amounts of data. Due to the recursive nature of sparse grid algorithms and their classical random memory access pattern, they pose a challenge for parallelization on modern hardware architectures such as accelerators. In this paper, we present the parallelization on several current task- and data-parallel platforms, covering multi-core CPUs with vector units, GPUs, and hybrid systems. We demonstrate that an implementation that is less efficient from an algorithmic point of view can be beneficial if it instead enables vectorization and a higher degree of parallelism. Furthermore, we analyze the suitability of parallel programming languages for the implementation. Considering hardware, we restrict ourselves to the x86 platform with SSE and AVX vector extensions and to NVIDIA's Fermi architecture for GPUs. We consider multi-core CPU and GPU architectures both independently and in hybrid systems with up to 12 cores and 2 Fermi GPUs. With respect to parallel programming, we examine both the open standard OpenCL and Intel Array Building Blocks, a recently introduced high-level programming approach, and comment on their ease of use. As the baseline, we use the best results obtained with classically parallelized sparse grid algorithms and their OpenMP-parallelized intrinsics counterparts (SSE and AVX instructions), reporting both single- and double-precision measurements. The huge datasets we use comprise a real-life dataset stemming from astrophysics as well as artificial ones, all of which exhibit challenging properties. In all settings, we achieve excellent results, obtaining speedups of up to 188x using single precision on a hybrid system.
