Conference: International Conference on Intelligent Data Engineering and Automated Learning

P²LSA and P²LSA+: Two Paralleled Probabilistic Latent Semantic Analysis Algorithms Based on the MapReduce Model

Abstract

Two novel parallel Probabilistic Latent Semantic Analysis (PLSA) algorithms based on the MapReduce model are proposed: P²LSA and P²LSA+. On large-scale data sets, both algorithms improve computing speed by running on the Hadoop platform. The traditional PLSA method typically uses the Expectation-Maximization (EM) algorithm to estimate two hidden parameter vectors, so parallel PLSA amounts to implementing the EM algorithm in parallel. The EM algorithm consists of two steps, the E-step and the M-step. In P²LSA, the Map function performs the E-step and the Reduce function performs the M-step. However, all intermediate results computed in the E-step must be sent to the M-step, and transferring this large amount of data increases both the network load and the overall running time. In contrast, the Map function in P²LSA+ performs the E-step and the M-step simultaneously, which reduces the data transferred between the two steps and improves performance. Experiments evaluate both algorithms on a data set of 20,000 users and 10,927 goods. The speedup curves show that the overall running time decreases as the number of computing nodes increases, and the measured running times show that P²LSA+ is about 3 times faster than P²LSA.
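The division of work described in the abstract can be illustrated with a small in-memory sketch. It is not the authors' Hadoop implementation. It assumes the asymmetric PLSA parameterisation with P(z|user) and P(item|z) as the two parameter sets, and it reads "performs the E-step and M-step simultaneously" as in-mapper aggregation of the M-step sums, which is only one plausible interpretation. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
import random


def posterior(u, i, p_z_u, p_i_z, K):
    """E-step for one (user, item) pair: P(z | u, i) for every topic z."""
    weights = [p_z_u[u][z] * p_i_z[z][i] for z in range(K)]
    total = sum(weights) or 1.0
    return [w / total for w in weights]


def map_plain(record, p_z_u, p_i_z, K):
    """First variant (assumed): Map does only the E-step, emitting per record and topic."""
    u, i, n = record
    for z, q in enumerate(posterior(u, i, p_z_u, p_i_z, K)):
        yield ("i|z", z, i), n * q   # expected count feeding P(item | z)
        yield ("z|u", u, z), n * q   # expected count feeding P(z | user)


def map_combined(partition, p_z_u, p_i_z, K):
    """'+' variant (assumed): Map also accumulates the M-step sums for its partition."""
    local = defaultdict(float)
    for record in partition:
        for key, value in map_plain(record, p_z_u, p_i_z, K):
            local[key] += value
    yield from local.items()


def reduce_mstep(pairs, users, items, K):
    """Reduce: sum expected counts per key, then normalise into probabilities."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    p_i_z = []
    for z in range(K):
        total = sum(sums[("i|z", z, i)] for i in items) or 1.0
        p_i_z.append({i: sums[("i|z", z, i)] / total for i in items})
    p_z_u = {}
    for u in users:
        total = sum(sums[("z|u", u, z)] for z in range(K)) or 1.0
        p_z_u[u] = [sums[("z|u", u, z)] / total for z in range(K)]
    return p_z_u, p_i_z


def em_iteration(partitions, p_z_u, p_i_z, users, items, K, combined):
    """Run one EM iteration and report how many pairs crossed from Map to Reduce."""
    pairs = []
    for part in partitions:
        if combined:
            pairs.extend(map_combined(part, p_z_u, p_i_z, K))
        else:
            for record in part:
                pairs.extend(map_plain(record, p_z_u, p_i_z, K))
    new_p_z_u, new_p_i_z = reduce_mstep(pairs, users, items, K)
    return new_p_z_u, new_p_i_z, len(pairs)


def normalise(values):
    total = sum(values)
    return [v / total for v in values]


# Toy data (hypothetical): (user, item, count) records split into two partitions.
K = 2
users, items = ["u1", "u2", "u3"], ["a", "b", "c"]
partitions = [
    [("u1", "a", 2), ("u1", "b", 1), ("u2", "a", 1)],
    [("u2", "c", 3), ("u3", "b", 2), ("u3", "c", 1)],
]
rng = random.Random(0)
p_z_u0 = {u: normalise([rng.random() + 0.1 for _ in range(K)]) for u in users}
p_i_z0 = [dict(zip(items, normalise([rng.random() + 0.1 for _ in items])))
          for _ in range(K)]

plain = em_iteration(partitions, p_z_u0, p_i_z0, users, items, K, combined=False)
plus = em_iteration(partitions, p_z_u0, p_i_z0, users, items, K, combined=True)
print("intermediate pairs:", plain[2], "vs", plus[2])
```

On this toy input both variants yield the same updated parameters up to floating-point rounding, while the number of intermediate pairs drops from 24 to 16. The saving grows with the number of records per partition that share a user or an item, which is consistent with the abstract's claim that less data is transferred, although the sketch says nothing about the reported 3x speedup.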
