
Human Perception of Enriched Topic Models


Abstract

Topic modeling algorithms, such as LDA, find topics (hidden structures) in document corpora in an unsupervised manner. Traditionally, applications of topic modeling to textual data use the bag-of-words model, i.e., they consider only the words in the documents. In our previous work we developed a framework for mining enriched topic models. We proposed a bag-of-features approach, in which a document consists not only of words but also of linked named entities and their related information, such as types or categories. In this work we focus on the feature engineering and selection aspects of enriched topic modeling and evaluate the results with two measures of how understandable the estimated topics are to humans: model precision and topic log odds. In our experimental setup of 10 models (7 purely resource-based, 2 hybrid word/resource-based and 1 word-based), the traditional bag-of-words model was outperformed by 5 purely resource-based models on both measures. These results show that incorporating background knowledge into topic models makes them more understandable for humans.
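
The abstract names the two evaluation measures but does not define them. The following is a minimal sketch, assuming they follow the commonly used word-intrusion and topic-intrusion definitions: model precision is the fraction of subjects who spot the intruder word in a topic, and topic log odds is the mean log-ratio between the probability of the true intruder topic and that of the topic each subject picked. The function names, argument names and the toy numbers are illustrative assumptions and do not come from the paper.

```python
import math


def model_precision(intruder_index, chosen_indices):
    """Fraction of subjects who picked the true intruder word for one topic.

    intruder_index: position of the planted intruder word (assumed encoding).
    chosen_indices: one chosen position per subject.
    """
    if not chosen_indices:
        raise ValueError("need at least one subject answer")
    hits = sum(1 for choice in chosen_indices if choice == intruder_index)
    return hits / len(chosen_indices)


def topic_log_odds(doc_topic_probs, intruder_topic, chosen_topics):
    """Mean over subjects of log p(intruder topic) - log p(chosen topic).

    doc_topic_probs: mapping topic id -> estimated probability for one document.
    The value is 0 when every subject picks the intruder and negative otherwise,
    provided the intruder is a low-probability topic for the document.
    """
    if not chosen_topics:
        raise ValueError("need at least one subject answer")
    log_true = math.log(doc_topic_probs[intruder_topic])
    diffs = [log_true - math.log(doc_topic_probs[c]) for c in chosen_topics]
    return sum(diffs) / len(diffs)


# Toy data (hypothetical): four subjects judge one topic, two judge one document.
mp = model_precision(3, [3, 3, 1, 3])
theta = {0: 0.6, 1: 0.3, 2: 0.09, 3: 0.01}
tlo_all_correct = topic_log_odds(theta, 3, [3, 3])
tlo_one_wrong = topic_log_odds(theta, 3, [3, 0])
```

Under these assumed definitions, higher values of both measures mean the topics are easier for humans to interpret, which is the sense in which one model "outperforms" another here.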
