Efficient complex operators for irregular codes

Abstract

Complex “fat operators” are important contributors to the efficiency of specialized hardware. This paper introduces two new techniques for constructing efficient fat operators that contain up to dozens of operations with arbitrary and irregular data and memory dependencies. Both techniques focus on minimizing critical path length and load-use delay, which are key concerns for irregular computations. Selective Depipelining (SDP) is a pipelining technique that allows fat operators to contain several, possibly dependent, memory operations. SDP lets memory requests operate at a faster clock rate than the datapath, which saves power in the datapath and improves memory performance. Cachelets are small, customized, distributed L0 caches embedded in the datapath to reduce load-use latency. We apply these techniques to Conservation Cores (c-cores) to produce coprocessors that accelerate irregular code regions while still providing superior energy efficiency. On average, these enhanced c-cores reduce energy-delay product (EDP) by 2× and area by 35% relative to the original c-cores. They are up to 2.5× faster than a general-purpose processor and reduce energy consumption by up to 8× for a variety of irregular applications, including several SPECINT benchmarks.
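The abstract describes cachelets only at a high level. As a rough illustration of why a tiny cache placed next to a load can shorten load-use latency when accesses have some spatial locality, the following Python sketch models a single direct-mapped cachelet. The class name `Cachelet`, its line count, line size, and hit/miss latencies are hypothetical choices made for illustration; they are not the organization or parameters used in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cachelet:
    """Toy model of a tiny direct-mapped L0 cache attached to one load site.

    All sizes and latencies are illustrative assumptions, not values from the paper.
    """
    num_lines: int = 2      # assumed: a cachelet holds only a few lines
    line_bytes: int = 32    # assumed line size in bytes
    hit_latency: int = 1    # assumed cycles for a cachelet hit
    miss_latency: int = 4   # assumed cycles to fetch the line from the next level
    hits: int = 0
    misses: int = 0
    tags: List[Optional[int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # One tag slot per line; None means the slot is empty.
        self.tags = [None] * self.num_lines

    def load(self, addr: int) -> int:
        """Look up byte address `addr` and return the load-use latency in cycles."""
        line_addr = addr // self.line_bytes      # which cache line the byte falls in
        index = line_addr % self.num_lines       # direct-mapped: exactly one candidate slot
        if self.tags[index] == line_addr:
            self.hits += 1
            return self.hit_latency
        # Miss: fetch from the next level and install the line.
        # Stores and coherence are not modeled in this sketch.
        self.misses += 1
        self.tags[index] = line_addr
        return self.miss_latency


if __name__ == "__main__":
    # Sixteen sequential 4-byte loads starting at 0x1000 touch two 32-byte lines.
    c = Cachelet()
    total = sum(c.load(0x1000 + 4 * i) for i in range(16))
    print(c.hits, c.misses, total)  # 14 2 22
```

In this toy model the 16 loads cost 22 cycles instead of the 64 they would take if every access paid the assumed 4-cycle miss latency. Giving each load site its own small instance loosely mirrors the "distributed" aspect described in the abstract, while stores, coherence with the rest of the memory hierarchy, and refill timing are deliberately left out.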
