Journal: International Journal of Parallel Programming
DCF: A Dataflow-Based Collaborative Filtering Training Algorithm

Abstract

Emerging recommender systems often adopt collaborative filtering techniques to improve recommendation accuracy. Existing collaborative filtering techniques are implemented with either the alternating least squares (ALS) algorithm or the gradient descent (GD) algorithm. However, neither algorithm is scalable: ALS suffers from high computational complexity, while GD suffers from severe synchronization problems and heavy data movement. To solve these problems, we propose a Dataflow-based Collaborative Filtering (DCF) algorithm. More specifically, DCF exploits the fine-grained asynchronous nature of the dataflow model to minimize synchronization overhead; leverages the mini-batch technique to reduce computation and communication complexity; and uses dummy-edge and multicasting techniques to avoid the fine-grained overhead of dependency checking and to reduce data movement. Together, these techniques allow DCF to significantly improve the performance of collaborative filtering. Our experiments on a cluster with one master node and ten slave nodes show that DCF achieves a 23× speedup over ALS on Spark and an 18× speedup over GD on GraphLab on public datasets.
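
To make the mini-batch idea mentioned in the abstract concrete, the following is a minimal single-machine sketch of mini-batch gradient descent for matrix-factorization collaborative filtering. It is not the DCF algorithm itself: it has no dataflow runtime, dummy edges, or multicasting, and the abstract does not specify the model or update rule. The function names (`train_mf`, `rmse`), the hyperparameters, and the toy ratings are all assumptions chosen for illustration.

```python
import random


def predict(p_row, q_row):
    """Dot product of one user factor vector and one item factor vector."""
    return sum(a * b for a, b in zip(p_row, q_row))


def train_mf(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.01,
             batch_size=4, epochs=500, seed=0):
    """Mini-batch gradient descent on (user, item, rating) triples.

    Hypothetical helper: gradients are accumulated over a whole batch
    and applied once, so each factor row is updated once per batch
    rather than once per rating.
    """
    rng = random.Random(seed)
    # Small random initial factors for users (P) and items (Q).
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_p, grad_q = {}, {}
            for u, i, r in batch:
                err = r - predict(P[u], Q[i])
                gp = grad_p.setdefault(u, [0.0] * rank)
                gq = grad_q.setdefault(i, [0.0] * rank)
                for k in range(rank):
                    # Descent direction of the L2-regularized squared error.
                    gp[k] += err * Q[i][k] - reg * P[u][k]
                    gq[k] += err * P[u][k] - reg * Q[i][k]
            # Apply the accumulated batch updates.
            for u, g in grad_p.items():
                for k in range(rank):
                    P[u][k] += lr * g[k]
            for i, g in grad_q.items():
                for k in range(rank):
                    Q[i][k] += lr * g[k]
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given ratings."""
    se = sum((r - predict(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


if __name__ == "__main__":
    # Toy data: (user, item, rating) triples, invented for this sketch.
    toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1),
           (2, 2, 5), (3, 0, 1), (3, 3, 4), (0, 3, 1), (2, 3, 4)]
    P, Q = train_mf(toy, n_users=4, n_items=4)
    print("training RMSE:", rmse(toy, P, Q))
```

In a distributed setting, batching of this kind means factor updates can be exchanged once per batch instead of once per rating, which is the sort of communication saving the abstract attributes to the mini-batch technique.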
