Parallelization of the Alternating-Least-Squares Algorithm With Weighted Regularization for Efficient GPU Execution in Recommender Systems

Abstract

Collaborative filtering recommender systems have become essential to many Internet services, providing, for instance, book recommendations in Amazon's online e-commerce service, music recommendations on Spotify, and movie recommendations on Netflix. Matrix factorization and Restricted Boltzmann Machines (RBMs) are two popular methods for implementing recommender systems, both providing accuracy superior to common neighborhood models. Both methods also shift much of the computation from the prediction phase to the model training phase, which enables fast predictions once the model has been trained.

This thesis suggests a novel approach for performing matrix factorization using the Alternating-Least-Squares with Weighted-Lambda-Regularization (ALS-WR) algorithm on CUDA (ALS-CUDA). The algorithm is implemented and evaluated in the context of recommender systems by comparing it to other commonly used approaches, including an RBM and a stochastic gradient descent (SGD) approach. Our evaluation shows that significant speedups can be achieved by using CUDA and GPUs for training recommender systems. The ALS-CUDA algorithm implemented in this thesis provided speedup factors of up to 175.4 over the sequential CPU ALS implementation and scales linearly with the number of CUDA threads assigned to it until the GPU's shared memory has been saturated. Comparing the ALS-CUDA algorithm to CUDA implementations of the SGD and RBM algorithms shows that ALS-CUDA outperformed the RBM. For a sparse dataset, the results indicate that ALS-CUDA performs slightly worse than the SGD implementation, while for a dense dataset, ALS-CUDA outperforms the SGD. In general, however, the advantage of the ALS-CUDA algorithm does not necessarily lie in its speed alone, but also in the fact that it requires fewer parameters than the SGD. It therefore represents a viable option when some speed can be traded off for algorithmic stability, or when the dataset is dense.
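The abstract does not reproduce the update equations, but the ALS-WR formulation it names (Zhou et al.'s weighted-lambda regularization) reduces each half-sweep to an independent regularized least-squares solve per user (and, symmetrically, per item); this per-user independence is presumably the structure the CUDA implementation maps onto threads. A minimal NumPy sketch of one user half-sweep under that assumption (variable names are illustrative, not taken from the thesis):

    import numpy as np

    def als_wr_user_step(R, Y, lam):
        # One half-sweep of ALS-WR: re-solve every user's factor vector
        # with the item factors Y held fixed. R is a dense (users x items)
        # rating matrix in which 0 marks an unobserved entry.
        n_users, k = R.shape[0], Y.shape[1]
        X = np.zeros((n_users, k))
        for u in range(n_users):       # each iteration is independent,
            rated = R[u] != 0          # so a GPU can assign one thread per user
            n_u = rated.sum()
            if n_u == 0:
                continue
            Y_u = Y[rated]             # (n_u x k) factors of the items user u rated
            # Weighted-lambda regularization: the penalty scales with n_u,
            # the number of ratings given by user u
            A = Y_u.T @ Y_u + lam * n_u * np.eye(k)
            b = Y_u.T @ R[u, rated]
            X[u] = np.linalg.solve(A, b)   # closed-form least-squares solve
        return X

The item half-sweep is symmetric (swap the roles of the user and item factors and transpose R), and alternating the two sweeps until convergence yields the factor model whose training time the thesis measures.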
