Conference: Information Retrieval Technology

Smoothing LDA Model for Text Categorization

Abstract

Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior over the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by LDA's intrinsic inference procedure. In this way, the arbitrariness of choosing priors for the latent variables of a multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora.
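The two smoothing methods named in the abstract have standard textbook forms for a word distribution. The Python sketch below applies them to the word counts of a single topic. It is illustrative only and does not reproduce the paper's data-driven procedure, in which smoothing data is routed through LDA's own inference. The function names `laplace_smoothed` and `jelinek_mercer`, the toy counts, and the parameters `alpha` (pseudo-count) and `lam` (interpolation weight) are assumptions for illustration, not taken from the paper.

```python
# Illustrative sketch only: textbook Laplace and Jelinek-Mercer smoothing of one
# topic's word distribution. Function names and toy data are hypothetical; this
# is not the paper's data-driven LDA smoothing procedure.
from collections import Counter
from collections.abc import Mapping


def laplace_smoothed(counts: Mapping[str, int], vocab: list[str],
                     alpha: float = 1.0) -> dict[str, float]:
    """Additive smoothing: p(w) = (c(w) + alpha) / (N + alpha * |V|).

    alpha = 1 gives add-one (Laplace) smoothing. For any alpha > 0 this is the
    posterior mean under a symmetric Dirichlet(alpha) prior, the same form as
    the topic-word estimate (with alpha playing the role of beta) in collapsed
    Gibbs sampling for LDA.
    """
    total = sum(counts.get(w, 0) for w in vocab)
    denom = total + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / denom for w in vocab}


def jelinek_mercer(counts: Mapping[str, int], background: Mapping[str, float],
                   vocab: list[str], lam: float = 0.5) -> dict[str, float]:
    """Linear interpolation: p(w) = (1 - lam) * p_ml(w) + lam * p_bg(w)."""
    total = sum(counts.get(w, 0) for w in vocab)
    return {
        # Maximum-likelihood estimate (0 if the topic has no counts) mixed with
        # the background model, so unseen words keep non-zero probability.
        w: (1 - lam) * (counts.get(w, 0) / total if total else 0.0)
        + lam * background[w]
        for w in vocab
    }


if __name__ == "__main__":
    vocab = ["model", "topic", "word", "smoothing"]
    # Toy counts of words assigned to one topic; "model" and "smoothing" are unseen.
    topic_counts = Counter({"topic": 3, "word": 1})
    # Toy corpus counts used as the background (collection) model.
    corpus_counts = Counter({"model": 2, "topic": 4, "word": 3, "smoothing": 1})
    n = sum(corpus_counts.values())
    background = {w: corpus_counts[w] / n for w in vocab}

    print(laplace_smoothed(topic_counts, vocab))
    print(jelinek_mercer(topic_counts, background, vocab, lam=0.5))
```

On the toy counts, both estimators give non-zero probability to "model" and "smoothing", which the topic never observed. Laplace smoothing spreads that mass uniformly over the vocabulary, while Jelinek-Mercer spreads it in proportion to the background distribution, so words that are frequent in the corpus receive more of it.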