Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology

Abstract

Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to accelerate databases (bitmap indices, BitWeaving) and web search (BitFunnel). Unfortunately, in existing architectures, the throughput of bulk bitwise operations is limited by the memory bandwidth available to the processing unit (e.g., CPU, GPU, FPGA, processing-in-memory). To overcome this bottleneck, we propose Ambit, an Accelerator-in-Memory for bulk bitwise operations. Unlike prior works, Ambit exploits the analog operation of DRAM technology to perform bitwise operations completely inside DRAM, thereby exploiting the full internal DRAM bandwidth. Ambit consists of two components. First, simultaneous activation of three DRAM rows that share the same set of sense amplifiers enables the system to perform bitwise AND and OR operations. Second, with modest changes to the sense amplifier, the system can use the inverters present inside the sense amplifier to perform bitwise NOT operations. With these two components, Ambit can perform any bulk bitwise operation efficiently inside DRAM. Ambit largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area). Importantly, Ambit uses the modern DRAM interface without any changes, and therefore it can be directly plugged onto the memory bus. Our extensive circuit simulations show that Ambit works as expected even in the presence of significant process variation. Averaged across seven bulk bitwise operations, Ambit improves performance by 32X and reduces energy consumption by 35X compared to state-of-the-art systems. When integrated with Hybrid Memory Cube (HMC), a 3D-stacked DRAM with a logic layer, Ambit improves performance of bulk bitwise operations by 9.7X compared to processing in the logic layer of the HMC. Ambit improves the performance of three real-world data-intensive applications, 1) database bitmap indices, 2) BitWeaving, a technique to accelerate database scans, and 3) bit-vector-based implementation of sets, by 3X-7X compared to a state-of-the-art baseline using SIMD optimizations. We describe four other applications that can benefit from Ambit, including a recent technique proposed to speed up web search. We believe that large performance and energy improvements provided by Ambit can enable other applications to use bulk bitwise operations.
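The abstract does not spell out why activating three rows yields AND and OR. As described in the full paper, the shared sense amplifiers settle to the bitwise majority of the three activated rows, so a third "control" row of all zeros gives AND and a row of all ones gives OR. The following is a minimal functional sketch of that idea in Python. It models only the logic, not the DRAM circuit or its timing; every name here (`ROW_BITS`, `triple_row_activate`, `bulk_and`, and so on) is hypothetical and chosen for illustration, and the 16-bit row width is an assumption made to keep the example small.

```python
# Functional model only: each DRAM row is represented as a Python int
# used as a bit vector. This is NOT a circuit model of Ambit.

ROW_BITS = 16                    # hypothetical row width for illustration
MASK = (1 << ROW_BITS) - 1       # row of all ones


def triple_row_activate(a: int, b: int, c: int) -> int:
    """Bitwise majority of three rows: each result bit is 1 if at least
    two of the three corresponding input bits are 1."""
    return ((a & b) | (b & c) | (c & a)) & MASK


def bulk_and(a: int, b: int) -> int:
    """Majority with an all-zeros control row reduces to a AND b."""
    return triple_row_activate(a, b, 0)


def bulk_or(a: int, b: int) -> int:
    """Majority with an all-ones control row reduces to a OR b."""
    return triple_row_activate(a, b, MASK)


def bulk_not(a: int) -> int:
    """Models the inverter in the sense amplifier: flip every bit."""
    return ~a & MASK


def bulk_xor(a: int, b: int) -> int:
    """Any other bitwise operation composes from AND, OR, and NOT.
    Here: a XOR b = (a AND NOT b) OR (NOT a AND b)."""
    return bulk_or(bulk_and(a, bulk_not(b)), bulk_and(bulk_not(a), b))
```

Because AND, OR, and NOT together are functionally complete, the same composition style covers the remaining bulk bitwise operations, which is the sense in which the two components suffice for any bulk bitwise operation.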

Bibliographic details

Source
International Symposium on Microarchitecture | 2017 | xix, 825 p. | 15 pages
Conference location
Authors
Vivek Seshadri; Donghyuk Lee; Thomas Mullins; Hasan Hassan; Amirali Boroumand; Jeremie Kim; Michael A. Kozuch; Onur Mutlu; Phillip B. Gibbons; Todd C. Mowry
Author affiliations
Conference organizer
Original format: PDF
Language of text
Chinese Library Classification: TP302-532
Keywords
DRAM chips; power aware computing

Similar literature

1. In-Memory Low-Cost Bit-Serial Addition Using Commodity DRAM Technology [J]. Ali Mustafa E., Jaiswal Akhilesh, Roy Kaushik. Circuits and Systems I: Regular Papers, IEEE Transactions on. 2020, No. 1
2. In-Memory Processing Paradigm for Bitwise Logic Operations in STT–MRAM [J]. Wang Kang, Haotian Wang, Zhaohao Wang, IEEE Transactions on Magnetics. 2017, No. 11
3. Fast Bulk Bitwise AND and OR in DRAM [J]. Seshadri Vivek, Hsieh Kevin, Boroumand Amirali, Computer Architecture Letters. 2015, No. 2
4. Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology [C]. Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Annual IEEE/ACM International Symposium on Microarchitecture. 2017
5. Application of technology insertion to particle accelerator modernization and operations support [D]. Lind, Peter Christian. 1998
6. SparkBLAST: scalable BLAST processing using in-memory operations [O]. Marcelo Rodrigo de Castro, Catherine dos Santos Tostes, Alberto M. R. Dávila, 2017
7. Fast Bulk Bitwise AND and OR in DRAM [O]. Seshadri Vivek, Hsieh Kevin, Boroumand Amirali, 2015
