PCIe is a hardware interface used in highperforming applications to move data from a central host and memory system to an accelerator such as a GPU or FPGA. In many memory bound applications, PCIe represents a bottleneck which limits the possible acceleration. In this paper, an opensource PCIe core is extended with a transparent layer of hardware compression/decompression with low latency and high throughput. The compressor/decompressor hardware operates on data values that match the width of the hardware interface and can be scaled up to higher parallelism. The results show an energy reduction of up to 84% in the PCIe transfers and up to 20% in the whole processing chain, thanks to the reduction in the number of bits that need to be moved over the power hungry wires that connect the main memory system to the accelerator in both directions. The overhead in terms of latency is maintained to a minimum and user selectable depending on the tolerances of the intended application.
展开▼