ISKE 2013: International Conference on Intelligent Systems and Knowledge Engineering

Release 'Bag-of-Words' Assumption of Latent Dirichlet Allocation

Abstract

Topic models such as latent Dirichlet allocation (LDA) are built on a vector-based representation of documents under the 'bag-of-words' assumption. They can discover the distribution of underlying topics in a document and the distribution of keywords in each topic, and they have recently proved successful and practical in many scenarios. Compared with a vector-based representation, a graph-based representation can preserve more of a document's semantics, because it considers not only the keywords but also the relations between them within the document. In this paper, a topic model for graph-represented documents (GTM) is proposed. In this model, a Bernoulli distribution is used to model the formation of an edge between two keywords in a document. Experimental results show that GTM outperforms LDA on a document classification task in which the topics unveiled by each model are used to represent the documents.
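The abstract says only that GTM uses a Bernoulli distribution for edge formation. It does not give the full model, so the Python sketch below is a hypothetical illustration and not the authors' GTM.

The sketch makes these choices:
- It keeps LDA's document-topic proportions theta ~ Dir(alpha) and topic-word distributions phi.
- It assumes, for illustration only, that each pair of keywords is linked by an independent Bernoulli draw.
- The probability of that draw, edge_prob[k][l], depends only on the two keywords' topics k and l.

All function names and the edge_prob matrix are hypothetical.

```python
import math
import random
from itertools import combinations


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index k with probability proportional to probs[k]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_graph_document(n_keywords, alpha, phi, edge_prob, rng):
    """HYPOTHETICAL generative process for one graph-represented document.

    alpha:     Dirichlet prior over topics (as in LDA)
    phi:       phi[k][v] = probability of vocabulary word v under topic k (as in LDA)
    edge_prob: edge_prob[k][l] = ASSUMED probability that two keywords whose
               topics are k and l are joined by an edge (not taken from the paper)
    """
    theta = sample_dirichlet(alpha, rng)                          # document-topic proportions
    topics = [sample_categorical(theta, rng) for _ in range(n_keywords)]
    words = [sample_categorical(phi[z], rng) for z in topics]     # one word per keyword node
    edges = set()
    for i, j in combinations(range(n_keywords), 2):
        # Each unordered keyword pair (i < j) gets an independent Bernoulli edge draw.
        if rng.random() < edge_prob[topics[i]][topics[j]]:
            edges.add((i, j))
    return words, topics, edges


def edge_log_likelihood(topics, edges, edge_prob):
    """Bernoulli log-likelihood of the observed edge set given the node topics.

    Assumes every edge_prob entry used here lies strictly between 0 and 1.
    """
    ll = 0.0
    for i, j in combinations(range(len(topics)), 2):
        p = edge_prob[topics[i]][topics[j]]
        ll += math.log(p) if (i, j) in edges else math.log(1.0 - p)
    return ll


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [0.5, 0.5]
    phi = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    edge_prob = [[0.8, 0.1], [0.1, 0.8]]   # assumed: same-topic keywords link more often
    words, topics, edges = generate_graph_document(5, alpha, phi, edge_prob, rng)
    print("words:", words, "topics:", topics, "edges:", sorted(edges))
    print("edge log-likelihood:", edge_log_likelihood(topics, edges, edge_prob))
```

Under this assumption, the log-likelihood of a document is LDA's word term plus the edge term above. Topic assignments that explain the observed keyword links therefore score higher. A bag-of-words model sees only the word list (for example collections.Counter(words)) and discards the edges entirely.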