P~2LSA and P~2LSA+: Two Paralleled Probabilistic Latent Semantic Analysis Algorithms Based on the MapReduce Model

Abstract

Two novel paralleled Probabilistic Latent Semantic Analysis (PLSA) algorithms based on the MapReduce model, P~2LSA and P~2LSA+, are proposed. On large-scale data sets, both algorithms improve computing speed by running on the Hadoop platform. The traditional PLSA method typically uses the Expectation-Maximization (EM) algorithm to estimate two hidden parameter vectors, and the parallel PLSA algorithms implement EM in parallel. The EM algorithm consists of two steps, the E-step and the M-step. In P~2LSA, the Map function performs the E-step and the Reduce function performs the M-step. However, all intermediate results computed in the E-step must be sent to the M-step, and transferring this large amount of data increases both the network load and the overall running time. In contrast, the Map function in P~2LSA+ performs the E-step and the M-step together, which reduces the data transferred between the two steps and improves performance. Experiments evaluate the performance of P~2LSA and P~2LSA+ on a data set of 20000 users and 10927 goods. The speedup curves show that the overall running time decreases as the number of computing nodes increases. The overall running times also show that P~2LSA+ is about 3 times faster than P~2LSA.
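The contrast between the two Map designs described in the abstract can be sketched in plain Python. This is not the authors' Hadoop implementation; it is a minimal single-process illustration under stated assumptions. It assumes the two estimated parameter sets are P(z|u) and P(i|z) for users u, items i and latent classes z, and it uses the textbook PLSA EM updates. All function names, the toy data and the initial values are hypothetical.

```python
from collections import defaultdict

def posterior(p_z_u, p_i_z, u, i):
    """E-step for one (user, item) pair: q(z | u, i) proportional to P(z|u) * P(i|z)."""
    weights = [p_z_u[u][z] * p_i_z[z][i] for z in range(len(p_i_z))]
    total = sum(weights)
    return [w / total for w in weights]

def accumulate(acc, u, i, n, q):
    """Add one observation's expected counts n * q(z) to the M-step sums."""
    for z, qz in enumerate(q):
        acc[("zu", z, u)] += n * qz  # numerator for P(z|u)
        acc[("iz", z, i)] += n * qz  # numerator for P(i|z)

def map_e_only(split, p_z_u, p_i_z):
    """P~2LSA-style map (assumed): emit one posterior record per observation."""
    return [(u, i, n, posterior(p_z_u, p_i_z, u, i)) for u, i, n in split]

def reduce_m(records):
    """P~2LSA-style reduce (assumed): build the M-step sums from all posteriors."""
    acc = defaultdict(float)
    for u, i, n, q in records:
        accumulate(acc, u, i, n, q)
    return acc

def map_e_and_m(split, p_z_u, p_i_z):
    """P~2LSA+-style map (assumed): fold posteriors into per-split partial sums."""
    acc = defaultdict(float)
    for u, i, n in split:
        accumulate(acc, u, i, n, posterior(p_z_u, p_i_z, u, i))
    return list(acc.items())

def merge(partials):
    """Add up the partial sums emitted by every map task."""
    acc = defaultdict(float)
    for key, value in partials:
        acc[key] += value
    return acc

def normalize(acc, users, items, k):
    """Turn the summed expected counts into probability tables."""
    p_z_u = {}
    for u in users:
        row = [acc[("zu", z, u)] for z in range(k)]
        p_z_u[u] = [x / sum(row) for x in row]
    p_i_z = []
    for z in range(k):
        total = sum(acc[("iz", z, i)] for i in items)
        p_i_z.append({i: acc[("iz", z, i)] / total for i in items})
    return p_z_u, p_i_z

# Toy data: two input splits of (user, item, count) observations.
splits = [
    [("u1", "a", 2), ("u1", "b", 1), ("u2", "a", 1)],
    [("u2", "c", 3), ("u3", "b", 1), ("u3", "c", 2)],
]
users = sorted({u for s in splits for u, _, _ in s})
items = sorted({i for s in splits for _, i, _ in s})
K = 2
p_z_u0 = {"u1": [0.6, 0.4], "u2": [0.5, 0.5], "u3": [0.3, 0.7]}
p_i_z0 = [{"a": 0.5, "b": 0.3, "c": 0.2}, {"a": 0.2, "b": 0.3, "c": 0.5}]

# Variant A: every posterior travels from the map side to the reduce side.
records = [r for s in splits for r in map_e_only(s, p_z_u0, p_i_z0)]
params_a = normalize(reduce_m(records), users, items, K)

# Variant B: each map task ships only its aggregated partial sums.
partials = [kv for s in splits for kv in map_e_and_m(s, p_z_u0, p_i_z0)]
params_b = normalize(merge(partials), users, items, K)

print(params_a[0]["u1"], params_b[0]["u1"])
```

Both variants produce the same parameters after one EM iteration, up to floating-point rounding. They differ only in what crosses the map/reduce boundary. In this sketch the second variant sends one partial sum per (latent class, user) and per (latent class, item) seen in a split, so any saving depends on how many observations share a user or item within a split. The toy data is too small to show a reduction.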


Bibliographic details

Source
Intelligent Data Engineering and Automated Learning - IDEAL 2011 | 2011 | pp. 385-393 | 9 pages
Conference location: Norwich (GB)
Authors
Yan Jin; Yang Gao; Yinghuan Shi; Lin Shang; Ruili Wang; Yubin Yang;
Author affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;

School of Engineering and Advanced Technology, Massey University, Palmerston North, New Zealand;

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;

Original format: PDF
Language of text: English
Chinese Library Classification: Computer networks
Keywords
paralleled PLSA; PLSA; mapreduce;

