
Runtime pipeline I/O scheduling system for GPU-based heterogeneous architectures.



Abstract

Heterogeneous architectures can improve the performance of applications with computationally intensive operations. Even when these architectures reduce application execution time, opportunities for further improvement remain because the memory hierarchies of the central processor cores and the coprocessor cores are separate. Applications running on heterogeneous architectures where graphics processing units (GPUs) execute throughput-intensive, data-parallel operations may run in a single address space provided by unified virtual addressing, or may expand the upper bounds of scalability and high-performance computing by explicitly partitioning and transferring data across orthogonal host and device address spaces. For explicit handling, applications must allocate space in GPU global memory, copy input data, invoke kernels, and copy results back to CPU memory. By overlapping inter-memory data transfers with GPU computation steps, applications can further reduce execution time. This research presents a software architecture with a runtime pipeline for GPU input/output scheduling that acts as a bidirectional interface between the GPU computing application and the physical device. The main aim of this system is to reduce the impact of the processor-memory performance gap by exploiting overlap between device I/O and computation. Evaluation with application benchmarks shows speedups of up to 2.37x over baseline, non-streamed GPU execution. In addition, the presented input/output scheduling system is a high-level systems abstraction that removes application software complexity while exploiting the input/output and processing concurrency capabilities of the underlying GPU.
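The explicit-transfer workflow and the transfer/compute overlap described in the abstract can be sketched with CUDA streams. This is a minimal illustration of the general technique, not the dissertation's scheduling system; the kernel name `scale`, the chunk count, and the sizes are assumptions chosen for the example.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel: doubles each element of a chunk.
__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main(void) {
    const int kChunks = 4, kChunkN = 1 << 20;
    const size_t kChunkBytes = kChunkN * sizeof(float);

    float *h, *d;
    cudaMallocHost((void **)&h, kChunks * kChunkBytes); // pinned host memory enables async copies
    cudaMalloc((void **)&d, kChunks * kChunkBytes);     // allocate space in GPU global memory
    for (int i = 0; i < kChunks * kChunkN; ++i) h[i] = 1.0f;

    cudaStream_t s[kChunks];
    for (int c = 0; c < kChunks; ++c) cudaStreamCreate(&s[c]);

    // Each chunk's H2D copy, kernel launch, and D2H copy are issued into its own
    // stream, so chunk c's computation can overlap chunk c+1's input transfer.
    for (int c = 0; c < kChunks; ++c) {
        float *hc = h + c * kChunkN, *dc = d + c * kChunkN;
        cudaMemcpyAsync(dc, hc, kChunkBytes, cudaMemcpyHostToDevice, s[c]);
        scale<<<(kChunkN + 255) / 256, 256, 0, s[c]>>>(dc, kChunkN);
        cudaMemcpyAsync(hc, dc, kChunkBytes, cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();

    printf("h[0] = %f\n", h[0]);

    for (int c = 0; c < kChunks; ++c) cudaStreamDestroy(s[c]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```

A non-streamed baseline would instead issue one synchronous `cudaMemcpy`, one kernel launch, and one synchronous copy back, serializing the three stages; the streamed version pipelines them, which is the source of the overlap the abstract exploits.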

Record details

  • Author

    Olaya Builes, Julio Cesar.

  • Affiliation

    The University of Texas at El Paso.

  • Degree grantor: The University of Texas at El Paso.
  • Subject: Computer science.
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 130 p.
  • Total pages: 130
  • Format: PDF
  • Language: eng
  • CLC classification: Linguistics
  • Keywords

