International Symposium on Microarchitecture

Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology

Abstract

Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to accelerate databases (bitmap indices, BitWeaving) and web search (BitFunnel). Unfortunately, in existing architectures, the throughput of bulk bitwise operations is limited by the memory bandwidth available to the processing unit (e.g., CPU, GPU, FPGA, processing-in-memory).

To overcome this bottleneck, we propose Ambit, an Accelerator-in-Memory for bulk bitwise operations. Unlike prior works, Ambit exploits the analog operation of DRAM technology to perform bitwise operations completely inside DRAM, thereby exploiting the full internal DRAM bandwidth. Ambit consists of two components. First, simultaneous activation of three DRAM rows that share the same set of sense amplifiers enables the system to perform bitwise AND and OR operations. Second, with modest changes to the sense amplifier, the system can use the inverters present inside the sense amplifier to perform bitwise NOT operations. With these two components, Ambit can perform any bulk bitwise operation efficiently inside DRAM. Ambit largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area). Importantly, Ambit uses the modern DRAM interface without any changes, and therefore it can be directly plugged onto the memory bus.

Our extensive circuit simulations show that Ambit works as expected even in the presence of significant process variation. Averaged across seven bulk bitwise operations, Ambit improves performance by 32X and reduces energy consumption by 35X compared to state-of-the-art systems. When integrated with Hybrid Memory Cube (HMC), a 3D-stacked DRAM with a logic layer, Ambit improves performance of bulk bitwise operations by 9.7X compared to processing in the logic layer of the HMC.
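The two components can be illustrated with a purely functional sketch. It assumes Python integers stand in for DRAM rows (one bit per column) and models the simultaneous activation of three rows as a per-column majority vote, following the majority-based scheme of the full Ambit paper: preset the third row to all zeros and the majority equals A AND B; preset it to all ones and it equals A OR B. NOT is modeled as reading out the inverted sense-amplifier value. The helper names (`triple_row_activate`, `bulk_and`, `bulk_or`, `bulk_not`, `bulk_xor`) and the 64-bit row width are hypothetical, and the model ignores timing, charge sharing, and row copies.

```python
# Functional model only: Python ints stand in for DRAM rows, one bit per column.
ROW_BITS = 64  # hypothetical row width; real DRAM rows are far wider
MASK = (1 << ROW_BITS) - 1


def triple_row_activate(a: int, b: int, c: int) -> int:
    """Assumed behavior of activating three rows at once: every column
    settles to the majority value of its three cells."""
    return (a & b) | (b & c) | (c & a)


def bulk_and(a: int, b: int) -> int:
    # Third (control) row preset to all zeros: MAJ(a, b, 0) == a AND b
    return triple_row_activate(a, b, 0)


def bulk_or(a: int, b: int) -> int:
    # Control row preset to all ones: MAJ(a, b, 1) == a OR b
    return triple_row_activate(a, b, MASK)


def bulk_not(a: int) -> int:
    # Stands in for reading the inverted value from the sense amplifier
    return ~a & MASK


def bulk_xor(a: int, b: int) -> int:
    # Other operations compose from AND/OR/NOT, e.g. XOR:
    # (a AND NOT b) OR (NOT a AND b)
    return bulk_or(bulk_and(a, bulk_not(b)), bulk_and(bulk_not(a), b))


if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    print(bin(bulk_and(a, b)))  # 0b1000
    print(bin(bulk_or(a, b)))   # 0b1110
    print(bin(bulk_xor(a, b)))  # 0b110
```

Because AND, OR, and NOT together form a functionally complete set, composing them (as `bulk_xor` does) is enough to express any bulk bitwise operation, which is the point of pairing the two components.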
Ambit improves the performance of three real-world data-intensive applications, 1) database bitmap indices, 2) BitWeaving, a technique to accelerate database scans, and 3) bit-vector-based implementation of sets, by 3X-7X compared to a state-of-the-art baseline using SIMD optimizations. We describe four other applications that can benefit from Ambit, including a recent technique proposed to speed up web search. We believe that the large performance and energy improvements provided by Ambit can enable other applications to use bulk bitwise operations.
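The applications above share one pattern: a query becomes a handful of bulk bitwise operations over long bit vectors. The sketch below illustrates this for a bitmap index and for bit-vector-based sets, assuming Python integers as the bit vectors and a hypothetical index in which bit i marks that user i was active in a given week. "How many users were active in both weeks?" becomes one bulk AND plus a population count; set union, intersection, and difference become OR, AND, and AND-NOT. All names and data are illustrative; in Ambit's setting it is the bulk AND/OR/NOT step that would be offloaded to DRAM, while the population count here is plain Python.

```python
from typing import Iterable, Set


def to_bitvector(members: Iterable[int]) -> int:
    """Bit i is set iff i is a member (members are small non-negative ints)."""
    bv = 0
    for m in members:
        bv |= 1 << m
    return bv


def from_bitvector(bv: int) -> Set[int]:
    """Decode a bit vector back into the set of positions whose bit is 1."""
    out = set()
    pos = 0
    while bv:
        if bv & 1:
            out.add(pos)
        bv >>= 1
        pos += 1
    return out


def popcount(bv: int) -> int:
    # Count of set bits (works on any Python 3 version)
    return bin(bv).count("1")


# Set operations on bit vectors are bulk bitwise operations
def union(a: int, b: int) -> int:
    return a | b


def intersection(a: int, b: int) -> int:
    return a & b


def difference(a: int, b: int) -> int:
    return a & ~b  # AND with NOT b; a must be non-negative


if __name__ == "__main__":
    # Hypothetical bitmap index: bit i marks that user i was active that week
    week1 = to_bitvector([0, 2, 3, 5, 8])
    week2 = to_bitvector([2, 3, 4, 8, 9])
    both = intersection(week1, week2)
    print(popcount(both))                                    # 3
    print(sorted(from_bitvector(both)))                      # [2, 3, 8]
    print(sorted(from_bitvector(difference(week1, week2))))  # [0, 5]
    print(sorted(from_bitvector(union(week1, week2))))       # [0, 2, 3, 4, 5, 8, 9]
```

With millions of users, each of these operations touches every bit of two long vectors while doing very little computation per bit, which is why their throughput is bound by memory bandwidth rather than by the processor.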