WooKong: A Ubiquitous Accelerator for Recommendation Algorithms With Custom Instruction Sets on FPGA

Wang Chao; Gong Lei; Ma Xiang; Li Xi; Zhou Xuehai


Abstract

Recommendation algorithms, such as neighborhood-based collaborative filtering (CF), are widely used in emerging machine learning applications. With the explosive growth of data, however, CF recommendation algorithms have become increasingly time- and energy-consuming, and they need to be optimized and accelerated by powerful engines to handle large-scale data. To address this, we propose WooKong, a ubiquitous accelerator architecture for collaborative-filtering recommendation on FPGA. It accommodates three types of CF recommendation algorithms (User-based CF, Item-based CF, and SlopeOne) with five similarity metrics: Jaccard, Cosine, CosineIR, Euclidean, and Pearson. To maintain flexibility across these algorithms and metrics, we adopt custom instruction sets to control the learning and prediction accelerators. We implement a hardware prototype on a Xilinx Zynq FPGA development board. Experimental results show that the proposed learning and prediction accelerators achieve 8.0X and 1.7X speedups, respectively, compared with an Intel i7 processor. The accelerator delivers energy benefits of up to 137.4X compared with an NVIDIA Tesla K40C GPU, at an affordable hardware cost.
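For readers unfamiliar with the algorithms named in the abstract, the following is a minimal software sketch of four of the five similarity metrics and of a weighted SlopeOne prediction. It is not the paper's hardware design or instruction set. The function names, the dict-of-ratings data layout, the restriction to co-rated items, and the 1/(1+d) mapping from Euclidean distance to a similarity are all assumptions made for illustration. CosineIR is omitted because the abstract does not define it.

```python
import math


def _co_rated(a, b):
    """Return the ratings of a and b restricted to the items both have rated."""
    common = sorted(set(a) & set(b))
    return [a[k] for k in common], [b[k] for k in common]


def jaccard(a, b):
    """Jaccard similarity over the sets of rated items (rating values ignored)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity over co-rated items (an assumed convention)."""
    x, y = _co_rated(a, b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def euclidean(a, b):
    """Euclidean distance d over co-rated items, mapped to 1 / (1 + d)."""
    x, y = _co_rated(a, b)
    if not x:
        return 0.0
    dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))
    return 1.0 / (1.0 + dist)


def pearson(a, b):
    """Pearson correlation over co-rated items; 0.0 if it is undefined."""
    x, y = _co_rated(a, b)
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy) if sx and sy else 0.0


def slope_one_predict(ratings, user, item):
    """Weighted SlopeOne: average (deviation + user's rating), weighted by support."""
    num, den = 0.0, 0
    for other, r in ratings[user].items():
        if other == item:
            continue
        # Rating differences item - other over users who rated both items.
        diffs = [u[item] - u[other] for u in ratings.values()
                 if item in u and other in u]
        if diffs:
            dev = sum(diffs) / len(diffs)
            num += (dev + r) * len(diffs)
            den += len(diffs)
    return num / den if den else None


# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}
```

With this toy data, `slope_one_predict(ratings, "Lucy", "A")` evaluates to 13/3, roughly 4.33. In User-based CF the similarity functions would be applied to pairs of user rating dicts, and in Item-based CF to pairs of per-item rating dicts keyed by user.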

Bibliographic details

Source
IEEE Transactions on Computers, 2020, Issue 7, pp. 1071-1082 (12 pages)
Authors
Wang Chao; Gong Lei; Ma Xiang; Li Xi; Zhou Xuehai
Author affiliations

University of Science and Technology of China, School of Computer Science, Hefei 230027, Anhui, People's Republic of China (all five authors)

Original format: PDF
Language: English
Keywords
Prediction algorithms; Field programmable gate arrays; Mathematical model; Hardware; Machine learning algorithms; Acceleration; Measurement; Accelerator; recommendation algorithms; machine learning; domain-specific architecture; FPGA;
