Conference: International Conference on Parallel Processing

Parallel Radix Sort on the AMD Fusion Accelerated Processing Unit


Abstract

We design, implement, and evaluate a parallel radix sort that simultaneously utilizes the CPU and GPU devices on the AMD Fusion APU. The parallel sort, referred to as Fusion Sort, partitions the sort keys between the CPU and GPU devices and utilizes the integrated memory system of the APU to avoid data copying between the devices. We identify three design issues that impact overhead and performance: the granularity of sharing between the two devices, the scheme of data partitioning, and the allocation of data in memory regions accessible by each device. We present three variants of Fusion Sort that share data at coarse and fine granularities and with fixed and variable data partitioning schemes. In each variant, data is allocated to minimize the overhead of non-preferred memory accesses of each device. Our evaluation shows that fine-grain sharing with variable data partitioning performs best. Further, Fusion Sort outperforms CPU-only and GPU-only parallel radix sorts by up to 1.8x and 1.9x, respectively. These results demonstrate the viability of the integrated memory system of the APU in the context of sorting.
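The idea of partitioning sort keys between two devices can be illustrated with a minimal sketch. This is not the authors' implementation: it runs sequentially on the CPU, the two partitions only stand in for the CPU and GPU shares, and the names `partitioned_radix_sort`, `cpu_share`, `key_bits`, and `RADIX_BITS` are hypothetical. It assumes non-negative integer keys below `2**key_bits` and a fixed split ratio applied in every pass.

```python
RADIX_BITS = 8                 # assumed digit width: 8 bits per pass
RADIX = 1 << RADIX_BITS


def histogram(keys, shift):
    """Count how many keys fall into each digit bucket for this pass."""
    counts = [0] * RADIX
    for k in keys:
        counts[(k >> shift) & (RADIX - 1)] += 1
    return counts


def partitioned_radix_sort(keys, cpu_share=0.5, key_bits=32):
    """LSD radix sort in which each pass splits the keys into two partitions.

    Partition 0 stands in for the CPU share and partition 1 for the GPU
    share; both scatter into one shared output buffer, so no partition
    is ever copied to the other side.
    """
    keys = list(keys)
    for shift in range(0, key_bits, RADIX_BITS):
        split = int(len(keys) * cpu_share)
        parts = [keys[:split], keys[split:]]

        # Each "device" builds a local histogram over its own partition.
        hists = [histogram(p, shift) for p in parts]

        # Combined prefix sum: within each digit, partition 0's keys are
        # placed before partition 1's keys, which keeps the pass stable.
        offsets = [[0] * RADIX for _ in parts]
        pos = 0
        for digit in range(RADIX):
            for i in range(len(parts)):
                offsets[i][digit] = pos
                pos += hists[i][digit]

        # Each "device" scatters its keys into disjoint output slots.
        out = [0] * len(keys)
        for i, part in enumerate(parts):
            for k in part:
                digit = (k >> shift) & (RADIX - 1)
                out[offsets[i][digit]] = k
                offsets[i][digit] += 1
        keys = out
    return keys
```

Calling `partitioned_radix_sort(data, cpu_share=0.3)` would give roughly 30% of the keys to the first partition in each pass. A variable partitioning scheme, as mentioned in the abstract, could adjust `cpu_share` between passes; how the paper does so is not stated here.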
