Journal: 《计算机工程与设计》 (Computer Engineering and Design)

MapReduce Parallel Implementation of the Log-Likelihood Similarity Algorithm

Abstract

To improve the ability of the collaborative filtering (CF) algorithms in Mahout to handle massive data, cloud computing platforms were studied and the MapReduce programming model was introduced to compute similarity in parallel. Four MapReduce jobs were designed to parallelize the log-likelihood similarity algorithm. The design takes the characteristics of the algorithm into account: composite keys and a co-occurrence matrix are used to merge many small key-value pairs into larger ones. This reduces both the intermediate computation and the network communication overhead. Experimental results show that, compared with the single-machine similarity algorithm in Mahout, the Hadoop-based log-likelihood similarity algorithm has three advantages. It achieves near-linear speedup as computing nodes are added, up to a certain number of nodes. It scales well to large data sets. It also improves the efficiency of the recommendation algorithm.
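The abstract names two ideas without giving their formulas. One is the log-likelihood similarity measure. The other is merging many small key-value pairs into larger ones through a co-occurrence matrix. The Python sketch below illustrates both in a single process, under several assumptions:

- Preferences are binary (rated / not rated).
- The similarity is the entropy form of Dunning's log-likelihood ratio, mapped into [0, 1) as 1 - 1/(1 + LLR). To our understanding, this is how Mahout's `LogLikelihoodSimilarity` defines it, but the paper's own formulas may differ.
- The functions `map_user`, `reduce_item`, `cooccurrence_matrix` and `item_similarities` are hypothetical. They simulate one map/shuffle/reduce round in memory. They are not the authors' four MapReduce jobs or any Hadoop API.

```python
# Sketch under stated assumptions: binary preferences, Mahout-style LLR,
# and a hypothetical in-memory map/shuffle/reduce (not the paper's 4 jobs).
import math
from collections import Counter
from itertools import groupby


def x_log_x(x: int) -> float:
    """x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    """Unnormalized entropy: xlogx(total) - sum of xlogx(count)."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 statistic for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(both: int, n_a: int, n_b: int, n_users: int) -> float:
    """Map the LLR of items a and b into [0, 1) as 1 - 1/(1 + LLR)."""
    k11 = both                        # users who rated both a and b
    k12 = n_b - both                  # users who rated b but not a
    k21 = n_a - both                  # users who rated a but not b
    k22 = n_users - n_a - n_b + both  # users who rated neither
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


def map_user(items):
    """Hypothetical mapper: emit one (item, stripe) record per rated item.

    The stripe bundles all co-occurrences of `item` in this user's history
    into one value, instead of one ((item, other), 1) pair per co-occurrence.
    """
    items = sorted(set(items))
    for item in items:
        stripe = Counter({other: 1 for other in items if other != item})
        stripe[item] = 1  # diagonal entry counts the item itself
        yield item, stripe


def reduce_item(item, stripes):
    """Hypothetical reducer: element-wise sum of all stripes for one item."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)  # Counter.update adds counts
    return item, total


def cooccurrence_matrix(user_items):
    """Simulate map -> shuffle (sort and group by key) -> reduce in memory."""
    mapped = [kv for items in user_items.values() for kv in map_user(items)]
    mapped.sort(key=lambda kv: kv[0])  # sort by key only; Counters are values
    return dict(
        reduce_item(key, (stripe for _, stripe in group))
        for key, group in groupby(mapped, key=lambda kv: kv[0])
    )


def item_similarities(user_items):
    """LLR similarity for every pair of items that co-occur at least once."""
    matrix = cooccurrence_matrix(user_items)
    n_users = len(user_items)
    sims = {}
    for a, row in matrix.items():
        for b, both in row.items():
            if a < b:  # each unordered pair once, skipping the diagonal
                sims[(a, b)] = llr_similarity(both, row[a], matrix[b][b], n_users)
    return sims


if __name__ == "__main__":
    ratings = {"u1": ["A", "B"], "u2": ["A", "B", "C"], "u3": ["C"], "u4": ["A", "B"]}
    for pair, score in sorted(item_similarities(ratings).items()):
        print(pair, round(score, 3))
```

This sketch emits one stripe per item rather than one record per co-occurring pair, which is known as the "stripes" pattern. It is one way to realize the abstract's "merge small key-value pairs" idea: fewer records cross the shuffle, at the cost of larger values. In a real Hadoop job, the framework's shuffle would do the sort-and-group step, and a combiner could pre-sum stripes on each mapper node. Note that G^2 grows for any departure from independence, positive or negative. A negatively associated pair can therefore still receive a fairly high score.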
