
Supporting Applications Involving Dynamic Data Structures and Irregular Memory Access on Emerging Parallel Platforms.


Abstract

SIMD accelerators and many-core coprocessors offering both coarse-grained and fine-grained parallelism are becoming increasingly popular. Streaming SIMD Extensions (SSE), Graphics Processing Units (GPUs), and the Intel Xeon Phi (MIC) can provide orders of magnitude better performance and efficiency for parallel workloads than single-core CPUs. However, parallelizing irregular applications involving dynamic data structures and irregular memory access on these platforms is not straightforward, because of their intensive control-flow dependence and lack of memory locality. Our efforts focus on three classes of irregular applications: irregular tree and graph traversals, irregular reductions, and dynamically allocated arrays and lists, and we explore mechanisms for parallelizing them on various parallel architectures from both fine-grained and coarse-grained perspectives.

We first focus on the traversal of irregular trees and graphs, more specifically, a class of applications involving the traversal of many pointer-intensive data structures, e.g., Random Forest and regular expressions, on various fine-grained SIMD architectures such as Streaming SIMD Extensions (SSE) and Graphics Processing Units (GPUs). We address this problem by developing an intermediate language for specifying such traversals, followed by a run-time scheduler that maps traversals to SIMD units. A key idea in our run-time scheme is converting branches to arithmetic operations, which then allows us to use the SIMD hardware.
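As an illustration of the branch-to-arithmetic idea (the abstract itself contains no code, and all names below are ours, not the dissertation's), a minimal C++/SSE sketch: a per-element if/else is replaced by a compare mask and a blend, so that all SIMD lanes execute the same instruction stream.

```cpp
#include <immintrin.h>
#include <cstdio>

// Scalar logic with a data-dependent branch per element:
//   y[i] = (x[i] > t) ? x[i] * a : x[i] * b;
// The SIMD version evaluates both sides and blends them with a comparison
// mask, so four lanes proceed in lockstep with no control-flow divergence.
// This is only a sketch of the branch-to-arithmetic idea, not the
// dissertation's actual runtime code.
void scale_by_threshold(const float* x, float* y, int n,
                        float t, float a, float b) {
    __m128 vt = _mm_set1_ps(t), va = _mm_set1_ps(a), vb = _mm_set1_ps(b);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx   = _mm_loadu_ps(x + i);
        __m128 mask = _mm_cmpgt_ps(vx, vt);   // lane-wise x > t (all-ones / all-zeros)
        __m128 hi   = _mm_mul_ps(vx, va);     // "then" side, computed for every lane
        __m128 lo   = _mm_mul_ps(vx, vb);     // "else" side, computed for every lane
        __m128 vy   = _mm_or_ps(_mm_and_ps(mask, hi),
                                _mm_andnot_ps(mask, lo));  // the branch became arithmetic
        _mm_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)                         // scalar remainder loop
        y[i] = (x[i] > t) ? x[i] * a : x[i] * b;
}

int main() {
    float x[8] = {1, 5, 2, 6, 3, 7, 4, 8}, y[8];
    scale_by_threshold(x, y, 8, 4.0f, 10.0f, 0.1f);
    for (float v : y) std::printf("%.1f ", v);
    std::printf("\n");
    return 0;
}
```

The dissertation applies this principle to decisions taken while traversing pointer-intensive structures such as Random Forests and regular expressions; the scaling loop above is only a stand-in for those traversal decisions.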
However, different SIMD architectures have different features, so a significant challenge to our previous work is to automatically optimize applications for different architectures, i.e., we need to achieve performance portability. Moreover, the memory hierarchy is one of the first architectural features programmers look to when optimizing their applications. Thus, we design a portable optimization engine for accelerating irregular data traversal applications on various SIMD architectures, with an emphasis on improving data locality and hiding memory latency.

We next explore the possibility of efficiently parallelizing two irregular reduction applications on the Intel Xeon Phi, an emerging many-core coprocessor architecture with long SIMD vectors, via data layout optimization. In this process, we also identify a general data management problem in the CPU-coprocessor programming model, namely automating and optimizing the transfer of dynamically allocated data structures between the CPU and coprocessors. For dynamic multi-dimensional arrays, we design a set of compile-time solutions involving heap layout transformation, while for other irregular data structures such as linked lists, we improve the existing shared-memory runtime solution to reduce the transfer costs.

Dynamically allocated data structures such as lists are also commonly used in high-level programming languages like Python to provide dynamic, flexible features and increase programming productivity. To parallelize applications written in such high-level languages on both coarse-grained and fine-grained parallel platforms, we design a compilation system that linearizes dynamic data structures into arrays and invokes low-level multi-core and many-core libraries. A critical issue with our linearization method is that it incurs extra data structure transformation overhead, especially for irregular data structures that are not reused frequently. To address this challenge, we design a set of transformation optimization algorithms, including an inter-procedural Partial Redundancy Elimination (PRE) algorithm, to minimize the data transformation overhead automatically.
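Similarly, the linearization step described above can be pictured with a small, purely illustrative C++ sketch (the `linearize` helper is hypothetical, not an interface from the dissertation): a pointer-based list is flattened into a contiguous array before being handed to a low-level parallel kernel, and the cost of that copy is exactly the transformation overhead that the inter-procedural PRE algorithm tries to avoid paying repeatedly.

```cpp
#include <list>
#include <vector>
#include <numeric>
#include <cstdio>

// Hypothetical sketch of linearization: a pointer-based dynamic list is
// flattened into a contiguous array so that a low-level multi-core or
// many-core kernel can operate on regular, vectorizable memory. The copy
// itself is the transformation overhead discussed in the abstract.
std::vector<double> linearize(const std::list<double>& xs) {
    return std::vector<double>(xs.begin(), xs.end());
}

int main() {
    std::list<double> dynamic_list = {1.0, 2.0, 3.0, 4.0};

    // Linearize once; an optimization pass in the spirit of PRE would hoist
    // this out of loops and reuse `flat` across kernel invocations instead
    // of re-flattening the list before every parallel call.
    std::vector<double> flat = linearize(dynamic_list);

    // Stand-in for handing flat.data() to a low-level parallel library.
    double sum = std::accumulate(flat.begin(), flat.end(), 0.0);
    std::printf("sum = %.1f\n", sum);
    return 0;
}
```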

Bibliographic Record

  • Author: Ren, Bin.
  • Author Affiliation: The Ohio State University.
  • Degree Grantor: The Ohio State University.
  • Subject: Computer science.
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 211 p.
  • Total Pages: 211
  • Format: PDF
  • Language: eng
  • Chinese Library Classification
  • Keywords

