Conference: Design, Automation & Test in Europe Conference & Exhibition

Practical Challenges in Delivering the Promises of Real Processing-in-Memory Machines

Abstract

Processing-in-Memory (PiM) machines promise to overcome the von Neumann bottleneck in order to further scale performance and energy efficiency of computing systems by reducing the extent of data transfer and offering ample parallelism. In this paper, we take the memristive Memory Processing Unit (mMPU) as a case study of a PiM machine and scrutinize it in practical scenarios. Specifically, we explore the limitations of parallelism and data transfer elimination. We argue that lack of operand locality and arrangement might make data transfer inevitable in the mMPU. We then devise techniques to move data within the mMPU, without transferring it off-chip, and quantify their costs. Additionally, we present electrical parameters that might limit the parallelism offered by the mMPU and evaluate their impact. Using benchmarks from the LGsynth91 suite, their vector extensions, and a few synthetic data-parallel workloads, we show that the internal data transfer results in an increase of up to 1.5× in the execution time, while the parallelism can be limited in some cases to 256 gates, resulting in an increase in execution time by 1.1× to 2×.
