Journal: IEEE Transactions on Circuits and Systems for Video Technology

Memory Allocation Exploiting Temporal Locality for Reducing Data-Transfer Bottlenecks in Heterogeneous Multicore Processors

Abstract

High-performance, low-power very-large-scale integration (VLSI) circuits are required to implement complex media processing applications on mobile devices. Heterogeneous multicore processors are a promising way to achieve this objective. They contain multiple accelerator cores and CPU cores to increase the processing speed. Since media processing applications access a huge amount of data, fast address generation is very important. To increase the address generation speed, accelerator cores contain address generation units (AGUs). To reduce power consumption, the AGUs have limited hardware resources such as adders and counters. Therefore, the AGUs generate only simple addressing patterns in which the address increases linearly in each clock cycle. Media processing applications frequently encounter addressing patterns in which the same data are accessed in different time slots. To implement such addressing patterns, the same data have to be allocated to multiple memory addresses in such a way that those addresses can be generated by the AGUs. Allocating the same data to multiple addresses is called "data duplication." Data duplication significantly increases the data-transfer time and, with it, the total processing time. To remove such data-transfer bottlenecks, this paper proposes a memory allocation method that exploits the temporal and spatial locality of memory access in media processing applications. We evaluate the proposed method using media processing applications to validate its effectiveness. According to the results, the proposed method reduces the total processing time by 14% to more than 85% compared with previous work.
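The data-duplication problem described in the abstract can be illustrated with a small sketch. It does not reproduce the paper's allocation method; it only shows why a linearly incrementing AGU forces duplication. The sliding-window access pattern and the helper names `linear_agu` and `duplicated_layout` are assumptions chosen for illustration.

```python
def linear_agu(base, length):
    """Hypothetical model of a simple AGU: one address per cycle, increasing by 1."""
    return list(range(base, base + length))


def duplicated_layout(data, window):
    """Store every sliding window contiguously so a linear AGU can read it.

    Elements shared by overlapping windows are written more than once,
    which is the data duplication the abstract refers to.
    """
    memory = []
    for start in range(len(data) - window + 1):
        memory.extend(data[start:start + window])
    return memory


# Assumed example: a 3-element window sliding over 6 data words.
data = [10, 11, 12, 13, 14, 15]
window = 3

memory = duplicated_layout(data, window)
addresses = linear_agu(0, len(memory))
accessed = [memory[a] for a in addresses]

# Words transferred with duplication versus the size of the original data.
duplication_factor = len(memory) / len(data)
print(memory)
print(duplication_factor)
```

In this assumed pattern, 6 original words grow to 12 stored words, so twice as much data must be transferred before processing starts. Overhead of this kind is what a locality-aware allocation aims to reduce.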
