Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering

Abstract

In this paper, we compare latent Dirichlet allocation (LDA) with probabilistic latent semantic indexing (pLSI) as dimensionality reduction methods and investigate their effectiveness in document clustering on real-world document sets. To cluster documents, we use a method based on a multinomial mixture, which is known as an efficient framework for text mining. Clustering results are evaluated by F-measure, i.e., the harmonic mean of precision and recall. We use Japanese and Korean Web articles for evaluation and regard the category assigned to each Web article as the ground truth. Our experiment shows that dimensionality reduction via LDA and pLSI yields document clusters of almost the same quality as those obtained from the original feature vectors. Therefore, we can reduce the vector dimension without degrading cluster quality. Further, both LDA and pLSI are more effective than random projection, the baseline method in our experiment. However, our experiment shows no meaningful difference between LDA and pLSI. This result suggests that LDA does not replace pLSI, at least for dimensionality reduction in document clustering.
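The evaluation measure named in the abstract can be illustrated with a small sketch. The abstract only states that F-measure is the harmonic mean of precision and recall; how per-cluster scores are aggregated is not given here. The code below therefore assumes one common convention (for each ground-truth category, take the best F over all clusters, then average weighted by category size). The function names `f_measure` and `clustering_f_measure` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def clustering_f_measure(labels, clusters):
    """Size-weighted best-match F-measure of a clustering.

    Assumption (not stated in the abstract): each category is scored by the
    best F it reaches against any cluster, and categories are weighted by
    their share of the documents.
    """
    n = len(labels)
    category_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(labels, clusters))  # documents shared by a pair

    total = 0.0
    for category, cat_size in category_sizes.items():
        best = 0.0
        for cluster, clu_size in cluster_sizes.items():
            shared = overlap[(category, cluster)]
            precision = shared / clu_size  # fraction of the cluster in the category
            recall = shared / cat_size     # fraction of the category in the cluster
            best = max(best, f_measure(precision, recall))
        total += (cat_size / n) * best
    return total


# Toy data: category labels act as ground truth, as in the abstract.
labels = ["sports", "sports", "politics", "politics"]
perfect = clustering_f_measure(labels, [0, 0, 1, 1])
imperfect = clustering_f_measure(labels, [0, 0, 0, 1])
```

A clustering that reproduces the categories exactly scores 1.0 under this convention, and moving one document into the wrong cluster lowers the score.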

Bibliographic details

Source: International Conference on Large-Scale Knowledge Resources, 2008, 14 pages
Authors: Tomonari Masada; Senya Kiyasu; Sueharu Miyahara
Original format: PDF
Chinese Library Classification: TP311.13

