Conference: IEEE International Conference on Parallel and Distributed Systems

Algorithmic Acceleration of Parallel ALS for Collaborative Filtering: Speeding up Distributed Big Data Recommendation in Spark

Abstract

Collaborative filtering algorithms are important building blocks in many practical recommendation systems. For example, many large-scale data processing environments include collaborative filtering models for which the Alternating Least Squares (ALS) algorithm is used to compute latent factor matrix decompositions. In this paper, we propose an approach to accelerate the convergence of parallel ALS-based optimization methods for collaborative filtering using a nonlinear conjugate gradient (NCG) wrapper around the ALS iterations. We also provide a parallel implementation of the accelerated ALS-NCG algorithm in the Apache Spark distributed data processing environment, together with an efficient line search technique, part of the ALS-NCG implementation, that requires only one pass over the data on distributed datasets. In serial numerical experiments on a Linux workstation and parallel numerical experiments on a 16-node cluster with 256 computing cores, we demonstrate that the combined ALS-NCG method requires far fewer iterations and less time than standalone ALS to reach highly accurate movie rankings on the MovieLens 20M dataset. In parallel, ALS-NCG can achieve an acceleration factor of 4 or greater in wall-clock time when an accurate solution is desired, and the acceleration factor increases as greater numerical precision is required in the solution. Furthermore, the NCG acceleration mechanism is efficient in parallel and scales linearly with problem size on synthetic datasets with up to nearly 1 billion ratings. The acceleration mechanism is general and may also be applicable to other optimization methods for collaborative filtering.
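
As a rough illustration of the standalone ALS baseline that the abstract refers to, the following pure-Python sketch factorizes a tiny set of (user, item, rating) triples into two latent factor matrices. The function names (`solve`, `update_factors`, `loss`, `als`), the toy ratings, the rank, and the plain squared-norm regularizer with weight `lam` are all assumptions made for illustration. This is not the paper's Spark implementation, and the NCG acceleration is not shown.

```python
import random

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def update_factors(fixed, ratings_by_row, rank, lam):
    """Hold `fixed` constant and solve one regularized least-squares problem per row."""
    updated = []
    for observed in ratings_by_row:  # each entry: list of (index into fixed, rating)
        # Normal equations: (sum_j v_j v_j^T + lam * I) x = sum_j r_j v_j
        A = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for idx, r in observed:
            v = fixed[idx]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    A[i][j] += v[i] * v[j]
        updated.append(solve(A, b))
    return updated

def loss(U, M, ratings, lam):
    """Squared error on observed ratings plus lam times the squared norm of all factors."""
    err = sum((r - sum(a * b for a, b in zip(U[u], M[m]))) ** 2 for u, m, r in ratings)
    reg = sum(x * x for row in U for x in row) + sum(x * x for row in M for x in row)
    return err + lam * reg

def als(ratings, n_users, n_items, rank=2, lam=0.1, iters=10, seed=0):
    """Alternate between updating user factors U and item factors M."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    M = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, m, r in ratings:
        by_user[u].append((m, r))
        by_item[m].append((u, r))
    history = [loss(U, M, ratings, lam)]
    for _ in range(iters):
        U = update_factors(M, by_user, rank, lam)  # item factors fixed
        M = update_factors(U, by_item, rank, lam)  # user factors fixed
        history.append(loss(U, M, ratings, lam))
    return U, M, history

# Hypothetical toy data: (user index, item index, rating)
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
U, M, history = als(ratings, n_users=4, n_items=3)
```

Because each half-step minimizes the objective exactly over one factor matrix while the other is held fixed, the values in `history` should never increase from one iteration to the next. The per-row solves are independent of each other, which is what makes this kind of update amenable to parallel execution.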
