Smoothing LDA Model for Text Categorization


Abstract

Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to the latent variables by the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing priors for the latent variables of a multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora.
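
For readers unfamiliar with the two smoothing methods named in the abstract, the following is a minimal sketch of their textbook forms applied to a single word distribution. It is not the paper's LDA-integrated procedure, which the abstract does not spell out; the function names, the toy vocabulary, the uniform background distribution, and the parameters `alpha` and `lam` are all assumptions made for illustration.

```python
from collections import Counter


def laplace_smooth(counts, vocab, alpha=1.0):
    """Additive (Laplace) smoothing: add alpha pseudo-counts to every word."""
    total = sum(counts.get(w, 0) for w in vocab)
    denom = total + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / denom for w in vocab}


def jelinek_mercer_smooth(counts, background, vocab, lam=0.5):
    """Jelinek-Mercer smoothing: linear interpolation between the
    maximum-likelihood estimate and a background distribution."""
    total = sum(counts.get(w, 0) for w in vocab)
    if total == 0:
        # No observations: fall back to the background distribution.
        return {w: background[w] for w in vocab}
    return {
        w: (1.0 - lam) * counts.get(w, 0) / total + lam * background[w]
        for w in vocab
    }


# Hypothetical toy data: word counts assigned to one topic.
vocab = ["ball", "game", "vote"]
topic_counts = Counter({"ball": 3, "game": 1})
background = {w: 1.0 / len(vocab) for w in vocab}  # assumed uniform background

laplace = laplace_smooth(topic_counts, vocab)
jm = jelinek_mercer_smooth(topic_counts, background, vocab, lam=0.5)
print(laplace)
print(jm)
```

In both cases the unseen word ("vote") receives non-zero probability while each distribution still sums to one, which is the basic effect any smoothing scheme for topic-word distributions is meant to achieve.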

Bibliographic details

Source
Information Retrieval Technology | 2008 | pp. 83-94 | 12 pages
Authors
Wenbo Li; Le Sun; Yuanyong Feng; Dakun Zhang
Original format
PDF
Chinese Library Classification
Computer equipment security
Keywords
text categorization; latent Dirichlet allocation; smoothing; graphical model

Similar literature


1. LDA-AdaBoost.MH: Accelerated AdaBoost.MH based on latent Dirichlet allocation for text categorization [J]. Bassam Al-Salemi, Mohd. Juzaiddin Ab Aziz, Shahrul Azman Noah. Journal of Information Science. 2015, No. 1
2. Text Document Categorization using Enhanced Sentence Vector Space Model and Bi-Gram Text Representation Model Based on Novel Fusion Techniques [J]. Abdisa Demissie Amensisa. New Media and Mass Communication. 2020, No. 4
3. A logistic regression-based smoothing method for Chinese text categorization [J]. Show-Jane Yen, Yue-Shi Lee, Jia-Ching Ying. Expert Systems with Applications. 2011, No. 9
4. Smoothing LDA Model for Text Categorization [C]. Wenbo Li, Le Sun, Yuanyong Feng. Asia Information Retrieval Symposium. 2008
5. Machine Learning Models for Categorizing Privacy Policy Text [D]. Aryasomayajula, Naga Srinivasa Baradwaj. 2018
6. Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling [O]. Aytuğ Onan. 2018
7. An Elaboration of Text Categorization and Automatic Text Classification Through Mathematical and Graphical Modelling [O]. Ahmed Faraz. 2015
