International Conference on Application-specific Systems, Architectures and Processors

Refine and Recycle: A Method to Increase Decompression Parallelism



Abstract

Rapid increases in storage bandwidth, combined with a desire for operating on large datasets interactively, drive the need for improvements in high-bandwidth decompression. Existing designs either process only one token per cycle or process multiple tokens per cycle with low area efficiency and/or low clock frequency. We propose two techniques to achieve high single-decoder throughput at improved efficiency by keeping only a single copy of the history data across multiple BRAMs and operating on each BRAM independently. A first stage efficiently refines the tokens into commands that operate on a single BRAM and steers the commands to the appropriate one. In the second stage, a relaxed execution model is used where each BRAM command executes immediately and those with invalid data are recycled to avoid stalls caused by the read-after-write dependency. We apply these techniques to Snappy decompression and implement a Snappy decompression accelerator on a CAPI2-attached FPGA platform equipped with a Xilinx VU3P FPGA. Experimental results show that our proposed method achieves up to 7.2 GB/s output throughput per decompressor, with each decompressor using 14.2% of the logic and 7% of the BRAM resources of the device. Therefore, a single decompressor can easily keep pace with an NVMe device (PCIe Gen3 x4) on a small FPGA, while a larger device, integrated on a host bridge adapter and instantiating multiple decompressors, can keep pace with the full OpenCAPI 3.0 bandwidth of 25 GB/s.
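To make the two-stage idea in the abstract concrete, below is a minimal, purely illustrative Python sketch of a refine-and-recycle execution model. It assumes a byte-interleaved bank mapping and per-byte commands for simplicity; the names (Command, refine, execute) and all structural choices are assumptions for exposition, not the authors' RTL design, which operates on BRAM lines rather than single bytes.

```python
# Illustrative sketch only: per-byte commands and byte-interleaved bank
# mapping are simplifying assumptions, not the paper's implementation.
from collections import deque
from dataclasses import dataclass
from typing import Optional

NUM_BANKS = 4  # number of independent history BRAM banks (illustrative)

@dataclass
class Command:
    """A refined command that touches exactly one history bank."""
    bank: int            # which bank holds the destination byte
    dst: int             # absolute write position in the output history
    src: Optional[int]   # absolute read position (copies), None for literals
    data: bytes          # literal payload (empty for copies)

def refine(token, out_pos):
    """Stage 1: split a Snappy-style token into single-bank byte commands
    and steer each one to the bank that owns its destination address."""
    kind, payload = token
    cmds = []
    if kind == "literal":
        for i, b in enumerate(payload):
            dst = out_pos + i
            cmds.append(Command(dst % NUM_BANKS, dst, None, bytes([b])))
    else:  # ("copy", (offset, length))
        offset, length = payload
        for i in range(length):
            dst = out_pos + i
            cmds.append(Command(dst % NUM_BANKS, dst, dst - offset, b""))
    return cmds

def execute(commands, history):
    """Stage 2: each bank issues one command per cycle; a copy whose source
    byte is not yet valid is recycled (re-queued) instead of stalling."""
    queues = [deque() for _ in range(NUM_BANKS)]
    for c in commands:
        queues[c.bank].append(c)
    written, cycles = set(), 0
    while any(queues):
        cycles += 1
        valid = written.copy()          # bytes valid at the start of this cycle
        for q in queues:                # banks operate independently
            if not q:
                continue
            c = q.popleft()
            if c.src is None:           # literal: always executes
                history[c.dst] = c.data[0]
                written.add(c.dst)
            elif c.src in valid:        # copy with valid source data
                history[c.dst] = history[c.src]
                written.add(c.dst)
            else:                       # read-after-write hazard: recycle
                q.append(c)
    return cycles

# Example: the literal "ab" followed by an overlapping copy (offset 2, length 6)
# expands to "abababab"; recycled copies resolve over a few extra cycles.
hist = {}
cmds = refine(("literal", b"ab"), 0) + refine(("copy", (2, 6)), 2)
cycles = execute(cmds, hist)
print(cycles, bytes(hist[i] for i in range(8)))  # -> 4 b'abababab'
```

In this toy model the recycle queue plays the role of the paper's relaxed execution: no bank ever waits for another, and only the individual commands whose history bytes are not yet valid are retried, which is what allows many tokens to be in flight per cycle.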
