Journal: IEEE Transactions on Computers

WooKong: A Ubiquitous Accelerator for Recommendation Algorithms With Custom Instruction Sets on FPGA


Abstract

Recommendation algorithms such as neighborhood-based collaborative filtering (CF) are widely applied in emerging machine learning applications. With the explosive growth of data, however, CF recommendation algorithms are becoming increasingly time- and energy-consuming, and they must be optimized and accelerated by powerful engines to process data at large scale. To address these problems, in this article we propose WooKong, a ubiquitous accelerator architecture for collaborative-filtering recommendation on FPGA. It accommodates three types of CF recommendation algorithms (User-based CF, Item-based CF, and SlopeOne) with five similarity metrics: Jaccard, Cosine, CosineIR, Euclidean, and Pearson. To remain flexible across these algorithms and metrics, we adopt custom instruction sets to control the learning and prediction accelerators. We implement a hardware prototype on a Xilinx Zynq FPGA development board. Experimental results show that the proposed learning and prediction accelerators achieve 8.0X and 1.7X speedups, respectively, compared with an Intel i7 processor. The accelerator delivers energy benefits of up to 137.4X compared with an NVIDIA Tesla K40C GPU, at an affordable hardware cost.
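For readers unfamiliar with the metrics named in the abstract, the following is a minimal software sketch of four of them (Jaccard, Cosine, Euclidean, Pearson) together with a User-based CF prediction. It uses textbook definitions and is not the WooKong hardware design or its instruction set. The function names, the dict-of-dicts rating layout, and the 1/(1+d) mapping from Euclidean distance to a similarity are assumptions made for illustration. CosineIR is omitted because the abstract does not define it.

```python
import math

# Ratings are stored as {user: {item: rating}} (an assumed layout for this sketch).

def jaccard(a, b):
    """Overlap of the sets of rated items, ignoring rating values."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cosine(a, b):
    """Cosine of the angle between two sparse rating vectors."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def euclidean(a, b):
    """Distance over co-rated items, mapped to (0, 1] via 1/(1+d) (assumed mapping)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    d = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + d)

def pearson(a, b):
    """Pearson correlation over co-rated items; 0.0 when undefined."""
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0
    ma = sum(a[i] for i in common) / n
    mb = sum(b[i] for i in common) / n
    cov = sum((a[i] - ma) * (b[i] - mb) for i in common)
    va = sum((a[i] - ma) ** 2 for i in common)
    vb = sum((b[i] - mb) ** 2 for i in common)
    return cov / math.sqrt(va * vb) if va and vb else 0.0

def predict_user_based(ratings, user, item, sim=cosine):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no positively similar user has rated the item.
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        s = sim(ratings[user], other_ratings)
        if s <= 0:
            continue  # skip dissimilar neighbors in this simple sketch
        num += s * other_ratings[item]
        den += s
    return num / den if den else None

ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 1, "c": 2},
}
prediction = predict_user_based(ratings, "u1", "c")
```

Item-based CF can reuse the same similarity functions by indexing the ratings by item instead of by user. SlopeOne works differently, predicting from average rating differences between item pairs, so it is not covered by this sketch.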
