Improving memory access performance for irregular algorithms in heterogeneous CPU/FPGA systems

Abstract

Many algorithms and applications in scientific computing exhibit irregular access patterns, as consecutive accesses are dependent on the structure of the data being processed and as such cannot be known a priori. This manifests itself as a lack of temporal and spatial locality, meaning these applications often perform poorly in traditional processor cache hierarchies. This thesis demonstrates that heterogeneous architectures containing Field Programmable Gate Arrays (FPGAs) alongside traditional processors can improve memory access throughput by 2-3x by using the FPGA to insert data directly into the processor cache, eliminating costly cache misses.

When fetching data to be processed directly on the FPGA, scatter-gather Direct Memory Access (DMA) provides the best performance, but its storage format is inefficient for these classes of applications. The presented optimised storage and generation of these descriptors on-demand leads to a 16x reduction in on-chip Block RAM usage and a 2/3 reduction in data transfer time.

Traditional scatter-gather DMA requires a statically defined list of access instructions and is managed by a host processor. The system presented in this thesis expands the DMA operation to allow data-driven memory requests in response to processed data and brings all control on-chip, allowing autonomous operation. This dramatically increases system flexibility and provides a further 11% performance improvement.

Graph applications and algorithms for traversing and searching graph data are used throughout this thesis as a motivating example for the optimisations presented, though they should be equally applicable to a wide range of irregular applications within scientific computing.
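
The idea of generating scatter-gather descriptors on demand, with each batch of fetched data driving the next requests, can be illustrated in software. The sketch below is not the thesis's hardware design: the `Descriptor` type, the compressed sparse row (CSR) graph layout, and the function names are all assumptions chosen for illustration, and `gather` is only a software stand-in for a DMA engine.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Descriptor(NamedTuple):
    """Hypothetical scatter-gather descriptor: one contiguous transfer."""
    offset: int  # index of the first element in the edge array
    length: int  # number of elements to transfer


def descriptors_for(row_ptr: List[int], vertices: Iterable[int]) -> Iterator[Descriptor]:
    """Generate descriptors lazily from CSR row pointers instead of storing a list."""
    for v in vertices:
        start, end = row_ptr[v], row_ptr[v + 1]
        if end > start:  # skip vertices with no outgoing edges
            yield Descriptor(start, end - start)


def gather(edges: List[int], descs: Iterable[Descriptor]) -> List[int]:
    """Software stand-in for a DMA engine: copy each described range in order."""
    out: List[int] = []
    for d in descs:
        out.extend(edges[d.offset:d.offset + d.length])
    return out


def bfs_levels(row_ptr: List[int], edges: List[int], source: int) -> List[int]:
    """Breadth-first search in which fetched neighbour lists drive the next requests."""
    level = [-1] * (len(row_ptr) - 1)  # -1 marks an unvisited vertex
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier: List[int] = []
        # The addresses requested here depend on data fetched in the previous round.
        for n in gather(edges, descriptors_for(row_ptr, frontier)):
            if level[n] == -1:
                level[n] = depth
                next_frontier.append(n)
        frontier = next_frontier
    return level


# Assumed example graph in CSR form: 0 -> 1, 2; 1 -> 3; 2 -> 3; vertex 3 has no edges.
ROW_PTR = [0, 2, 3, 4, 4]
EDGES = [1, 2, 3, 3]
```

Calling `bfs_levels(ROW_PTR, EDGES, 0)` walks the graph level by level. No descriptor list is stored up front: each descriptor is derived from two row pointers at the moment it is needed, which is the general flavour of trading a static list for on-demand generation.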

Bibliographic details

  • Author: Bean Andrew
  • Year: 2016
  • Original format: PDF
