Proceedings of the 2011 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI

Abstract

Several FPGA architectures exist for accelerating Restricted Boltzmann Machines (RBMs). However, the network size for most of them is limited by the amount of available on-chip memory, so many FPGAs are required to implement the very large networks used in real-world applications. A virtualized design can time-multiplex the hardware resources and handle much larger networks, but it suffers a performance penalty due to context switching. In this paper, we present a number of improvements to a virtualized FPGA architecture for RBMs. First, we take advantage of 16-bit arithmetic to pack larger networks onto a chip. Second, a custom DMA engine is designed to reduce the performance impact of the large number of memory transactions. Finally, the architecture is scaled to multiple FPGAs to gain additional performance through coarse-grained parallelism. The design effort required to implement these changes is minimized through the use of an embedded MPI framework. The architecture, tested on a Berkeley Emulation Engine 3 platform running at 100 MHz, achieves a speed of 12.563 GCUPS on an 8192x8192 network.
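
To put the reported figures in perspective, the following is a rough back-of-the-envelope sketch rather than anything taken from the paper. It assumes that GCUPS means 10^9 connection updates per second, that the 8192x8192 network has one dense weight per visible-hidden connection, and that each weight occupies exactly one fixed-width word. The function names are hypothetical.

```python
# Back-of-the-envelope figures for the network named in the abstract.
# Assumptions (not stated in the abstract): one stored weight per
# visible-hidden connection, and GCUPS = 1e9 connection updates per second.


def weight_bytes(visible: int, hidden: int, bits_per_weight: int) -> int:
    """Bytes needed to hold a dense visible x hidden weight matrix."""
    return visible * hidden * bits_per_weight // 8


def seconds_per_full_update(visible: int, hidden: int, gcups: float) -> float:
    """Time to update every connection once at the given GCUPS rate."""
    return (visible * hidden) / (gcups * 1e9)


if __name__ == "__main__":
    n = 8192
    mib = 1024 * 1024
    print("16-bit weights (MiB):", weight_bytes(n, n, 16) / mib)
    print("32-bit weights (MiB):", weight_bytes(n, n, 32) / mib)
    print("ms per full update:", seconds_per_full_update(n, n, 12.563) * 1e3)
```

Under these assumptions, 16-bit weights need half the storage of 32-bit weights (128 MiB versus 256 MiB for the dense matrix), which is consistent with the abstract's point that narrower arithmetic lets a larger network fit on a chip.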