Parallelization of the Alternating-Least-Squares Algorithm With Weighted Regularization for Efficient GPU Execution in Recommender Systems

Abstract

Collaborative filtering recommender systems have become essential to many Internet services, providing, for instance, book recommendations in Amazon's online e-commerce service, music recommendations on Spotify, and movie recommendations on Netflix. Matrix factorization and Restricted Boltzmann Machines (RBMs) are two popular methods for implementing recommender systems, both providing accuracy superior to common neighborhood models. Both methods also shift much of the computation from the prediction phase to the model training phase, which enables fast predictions once the model has been trained.

This thesis suggests a novel approach for performing matrix factorization using the Alternating-Least-Squares with Weighted-Lambda-Regularization (ALS-WR) algorithm on CUDA (ALS-CUDA). The algorithm is implemented and evaluated in the context of recommender systems by comparing it to other commonly used approaches, including an RBM and a stochastic gradient descent (SGD) approach. Our evaluation shows that significant speedups can be achieved by using CUDA and GPUs for training recommender systems. The ALS-CUDA algorithm implemented in this thesis provided speedup factors of up to 175.4 over the sequential CPU ALS implementation and scales linearly with the number of CUDA threads assigned to it until the GPU's shared memory has been saturated. Comparing the ALS-CUDA algorithm to CUDA implementations of the SGD and RBM algorithms shows that ALS-CUDA outperformed the RBM. For a sparse dataset, the results indicate that ALS-CUDA performs slightly worse than the SGD implementation, while for a dense dataset, ALS-CUDA outperforms the SGD. In general, however, the advantage of the ALS-CUDA algorithm does not necessarily lie in its speed alone, but also in the fact that it requires fewer parameters than the SGD. It therefore represents a viable option when some speed can be traded off for algorithmic stability, or when the dataset is dense.
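The abstract does not reproduce the update equations, but the ALS-WR formulation it names (Zhou et al.'s weighted-lambda regularization) reduces each half-sweep to an independent regularized least-squares solve per user (and, symmetrically, per item); this per-user independence is presumably the structure the CUDA implementation maps onto threads. A minimal NumPy sketch of one user half-sweep under that assumption (variable names are illustrative, not taken from the thesis):

    import numpy as np

    def als_wr_user_step(R, Y, lam):
        # One half-sweep of ALS-WR: re-solve every user's factor vector
        # with the item factors Y held fixed. R is a dense (users x items)
        # rating matrix in which 0 marks an unobserved entry.
        n_users, k = R.shape[0], Y.shape[1]
        X = np.zeros((n_users, k))
        for u in range(n_users):       # each iteration is independent,
            rated = R[u] != 0          # so a GPU can assign one thread per user
            n_u = rated.sum()
            if n_u == 0:
                continue
            Y_u = Y[rated]             # (n_u x k) factors of the items user u rated
            # Weighted-lambda regularization: the penalty scales with n_u,
            # the number of ratings given by user u
            A = Y_u.T @ Y_u + lam * n_u * np.eye(k)
            b = Y_u.T @ R[u, rated]
            X[u] = np.linalg.solve(A, b)   # closed-form least-squares solve
        return X

The item half-sweep is symmetric (swap the roles of the user and item factors and transpose R), and alternating the two sweeps until convergence yields the factor model whose training time the thesis measures.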
