Journal: IEEE Computer Architecture Letters

TRiM: Tensor Reduction in Memory


Abstract

Personalized recommendation systems are gaining significant traction due to their industrial importance. An important building block of recommendation systems consists of what are known as embedding layers, which exhibit highly memory-intensive characteristics. The fundamental primitives of embedding layers are embedding vector gathers followed by vector reductions, which exhibit low arithmetic intensity and become bottlenecked by memory throughput. To address this issue, recent proposals in this research space employ a near-data processing (NDP) solution at the DRAM rank level, achieving a significant performance speedup. We observe that prior NDP solutions based on rank-level parallelism leave significant performance on the table, as they do not fully reap the abundant data transfer throughput inherent in DRAM datapaths. Based on the observation that the DRAM datapath has a hierarchical tree structure, we propose a novel, fine-grained NDP architecture for recommendation systems, which augments the DRAM datapath with an "in-DRAM" reduction unit at the DDR4/5 rank/bank-group/bank level, achieving significant performance improvements over state-of-the-art approaches. We also propose hot embedding-vector replication to alleviate the load imbalance across the reduction units.
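The gather-and-reduce primitive and the two ideas in the abstract (reducing lower in the datapath tree, and replicating hot vectors) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's design: the bank count `NUM_BANKS`, the interleaved placement in `bank_of`, the "send a replicated row to the least-loaded bank" policy, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical topology for illustration only; not taken from the paper.
NUM_BANKS = 4


def bank_of(row):
    """Assumed placement: embedding rows are interleaved across banks."""
    return row % NUM_BANKS


def gather_reduce(table, indices):
    """Baseline: every looked-up vector travels to the host, which sums them."""
    dim = len(table[0])
    out = [0.0] * dim
    for i in indices:
        for d in range(dim):
            out[d] += table[i][d]
    return out


def tree_gather_reduce(table, indices):
    """Each bank sums its own vectors first; only one partial sum per
    touched bank is then combined at the next level up."""
    dim = len(table[0])
    partial = {}  # bank id -> partial sum accumulated inside that bank
    for i in indices:
        acc = partial.setdefault(bank_of(i), [0.0] * dim)
        for d in range(dim):
            acc[d] += table[i][d]
    out = [0.0] * dim
    for acc in partial.values():
        for d in range(dim):
            out[d] += acc[d]
    return out, len(partial)  # result and number of vectors sent upward


def max_bank_load(indices, hot_rows=frozenset()):
    """Lookup count of the busiest bank. Rows in hot_rows are assumed to be
    replicated in every bank, so each such lookup is sent to whichever bank
    currently has the fewest lookups (a simple greedy policy)."""
    load = Counter({b: 0 for b in range(NUM_BANKS)})
    for i in indices:
        if i not in hot_rows:
            load[bank_of(i)] += 1
    for i in indices:
        if i in hot_rows:
            target = min(range(NUM_BANKS), key=lambda b: load[b])
            load[target] += 1
    return max(load.values())


# Toy table: 8 rows of dimension 2; row 4 is "hot" (looked up three times).
table = [[float(r), float(2 * r)] for r in range(8)]
indices = [0, 4, 4, 4, 1, 6]

flat = gather_reduce(table, indices)
tree, vectors_sent_up = tree_gather_reduce(table, indices)
load_plain = max_bank_load(indices)
load_replicated = max_bank_load(indices, hot_rows={4})
```

In this toy run both reductions give the same sum, but the tree version forwards 3 partial vectors instead of 6 gathered vectors, and replicating row 4 lowers the busiest bank's load from 4 lookups to 2. As a rough reading of "low arithmetic intensity", and assuming 32-bit floating-point elements, each 4-byte element fetched contributes a single addition, or about 0.25 FLOP per byte.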
