International Conference on Parallel Architectures and Compilation Techniques

Two-level mapping based cache index selection for packet forwarding engines

Abstract

Packet forwarding is a memory-intensive application that requires multiple accesses through a trie structure. The efficiency of a cache for this application depends critically on how well the placement function reduces conflict misses. Traditional placement functions use a one-level mapping that naively partitions trie nodes into cache sets. However, because a significant percentage of trie nodes are not useful, these schemes distribute the useful nodes non-uniformly across sets, which in turn increases conflict misses. Newer organizations such as variable-associativity caches achieve flexibility in placement at the expense of increased hit latency, which makes them unsuitable for L1 caches. We propose a novel two-level mapping framework that retains the hit latency of one-level mapping yet incurs fewer conflict misses. This is achieved by introducing a second-level mapping that reorganizes the nodes in the naive initial partitions into refined partitions with a near-uniform distribution of nodes. Furthermore, because this remapping is accomplished simply by adapting the index bits to a given routing table, the hit latency is not affected. We propose three new schemes that yield up to a 16% reduction in the number of misses and a 13% speedup in memory access time. In comparison, an XOR-based placement scheme, known to perform extremely well for general-purpose architectures, obtains up to a 2% speedup in memory access time.
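
The core idea of adapting index bits to the set of useful nodes can be illustrated with a small sketch. This is not the paper's two-level framework or any of its three schemes; it is a simplified illustration under assumptions made up here: a toy 6-bit address space, a 4-set cache, a hypothetical list of useful node addresses (`useful`), and an exhaustive search over bit positions (`best_index_bits`). It compares the conventional choice of the low-order address bits as the set index against index bits chosen for the given node set, using the largest per-set node count as a rough proxy for conflict pressure.

```python
from collections import Counter
from itertools import combinations


def index_of(addr, bit_positions):
    """Form a set index by concatenating the chosen address bits."""
    idx = 0
    for i, b in enumerate(bit_positions):
        idx |= ((addr >> b) & 1) << i
    return idx


def max_set_load(addrs, bit_positions):
    """Largest number of useful nodes mapped to any single cache set."""
    counts = Counter(index_of(a, bit_positions) for a in addrs)
    return max(counts.values())


def best_index_bits(addrs, addr_bits, k):
    """Hypothetical helper: exhaustively pick k address bits that
    minimise the maximum set load for this particular node set."""
    return min(
        combinations(range(addr_bits), k),
        key=lambda bits: max_set_load(addrs, bits),
    )


# Assumed toy data: 16 useful nodes at addresses 0, 4, 8, ..., 60.
useful = [4 * j for j in range(16)]

naive_bits = (0, 1)                      # conventional low-order index bits
chosen_bits = best_index_bits(useful, addr_bits=6, k=2)

print("naive max load: ", max_set_load(useful, naive_bits))   # 16
print("chosen bits:    ", chosen_bits)                        # (2, 3)
print("chosen max load:", max_set_load(useful, chosen_bits))  # 4
```

In this toy case the low-order bits are identical for every useful node, so all 16 nodes collide in one set, while index bits selected for the node set spread them evenly at four per set. Because the index is still a plain selection of address bits, the lookup path stays as simple as in one-level mapping, which is the property the abstract relies on to keep hit latency unchanged.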