Algorithmic Acceleration of Parallel ALS for Collaborative Filtering: Speeding up Distributed Big Data Recommendation in Spark

Abstract

Collaborative filtering algorithms are important building blocks in many practical recommendation systems. For example, many large-scale data processing environments include collaborative filtering models for which the Alternating Least Squares (ALS) algorithm is used to compute latent factor matrix decompositions. In this paper, we propose an approach to accelerate the convergence of parallel ALS-based optimization methods for collaborative filtering using a nonlinear conjugate gradient (NCG) wrapper around the ALS iterations. We also provide a parallel implementation of the accelerated ALS-NCG algorithm in the Apache Spark distributed data processing environment, and an efficient line search technique as part of the ALS-NCG implementation that requires only one pass over the data on distributed datasets. In serial numerical experiments on a Linux workstation and parallel numerical experiments on a 16-node cluster with 256 computing cores, we demonstrate that the combined ALS-NCG method requires many fewer iterations and less time than standalone ALS to reach movie rankings with high accuracy on the MovieLens 20M dataset. In parallel, ALS-NCG can achieve an acceleration factor of 4 or greater in clock time when an accurate solution is desired; furthermore, the acceleration factor increases as greater numerical precision is required in the solution. In addition, the NCG acceleration mechanism is efficient in parallel and scales linearly with problem size on synthetic datasets with up to nearly 1 billion ratings. The acceleration mechanism is general and may also be applicable to other optimization methods for collaborative filtering.

Bibliographic record

Source
IEEE International Conference on Parallel and Distributed Systems, 2015, pp. 682-691 (10 pages)
Authors
Manda Winlaw; Michael B. Hynes; Anthony Caterini; Hans De Sterck
Original format: PDF
Keywords
Apache Spark; Big Data; Recommendation systems; collaborative filtering; matrix factorization; parallel optimization algorithms; scalable methods

Similar literature

1. Research on parallelisation of collaborative filtering recommendation algorithm based on Spark [J]. Yongli Yang, Zhenhu Ning, Yongquan Cai. International Journal of Wireless and Mobile Computing, 2018, No. 4
2. Parallel naive Bayes regression model-based collaborative filtering recommendation algorithm and its realisation on Hadoop for big data [J]. Shiqi Wen, Cheng Wang, Haibo Li. International Journal of Information Technology and Management, 2019, No. 2a3
3. Improvement of recommendation algorithm based on Collaborative Deep Learning and its Parallelization on Spark [J]. Fan Yang, Huaqiong Wang, Jianjing Fu. Journal of Parallel and Distributed Computing, 2021, No. Feba
4. Algorithmic Acceleration of Parallel ALS for Collaborative Filtering: Speeding up Distributed Big Data Recommendation in Spark [C]. Manda Winlaw, Michael B. Hynes, Anthony Caterini. IEEE International Conference on Parallel and Distributed Systems, 2015
5. A Comparative Study of Collaborative Filtering Recommendation Systems Using Algorithms to Impute Large Sparse Matrices [D]. Lindo, Steven Christopher. 2016
6. Multi-GPU Multi-Node Algorithms for Acceleration of Image Reconstruction in 3D Electrical Capacitance Tomography in Heterogeneous Distributed System [O]. Michał Majchrowicz, Paweł Kapusta, Lidia Jackowska-Strumiłło. 2020
7. Algorithmic Acceleration of Parallel ALS for Collaborative Filtering: Speeding up Distributed Big Data Recommendation in Spark [O]. Winlaw, Manda, Hynes, Michael B., Caterini, Anthony. 2016
