
Human Perception of Enriched Topic Models


Abstract

Topic modeling algorithms such as LDA find topics, i.e. hidden structures, in document corpora in an unsupervised manner. Traditionally, applications of topic modeling to textual data use the bag-of-words model, i.e. they consider only the words in the documents. In our previous work we developed a framework for mining enriched topic models. We proposed a bag-of-features approach, in which a document consists not only of words but also of linked named entities and their related information, such as types or categories. In this work we focus on the feature engineering and selection aspects of enriched topic modeling and evaluate the results using two measures of how understandable the estimated topics are to humans: model precision and topic log odds. In our experimental setup of 10 models (7 pure resource-based, 2 hybrid word/resource-based and 1 word-based), the traditional bag-of-words model was outperformed by 5 pure resource-based models on both measures. These results show that incorporating background knowledge into topic models makes them more understandable for humans.
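The abstract names the two evaluation measures without defining them. The following is a minimal sketch of how they are commonly computed in word-intrusion and topic-intrusion studies, assuming the paper follows those standard formulations (the abstract itself does not confirm this). The function names, argument layout and sample data are hypothetical and chosen for illustration only.

```python
import math


def model_precision(intruder_word, chosen_words):
    """Word intrusion: fraction of annotators who picked the true intruder.

    Assumed definition (not stated in the abstract): each annotator sees the
    top words of one topic plus one injected intruder word and selects the
    word that does not belong.
    """
    hits = sum(1 for word in chosen_words if word == intruder_word)
    return hits / len(chosen_words)


def topic_log_odds(doc_topic_probs, intruder_topic, chosen_topics):
    """Topic intrusion: mean log-ratio of intruder vs. chosen topic weight.

    Assumed definition (not stated in the abstract): doc_topic_probs maps a
    topic id to its estimated probability in the document. The score is 0
    when every annotator picks the true intruder topic and negative otherwise.
    """
    log_intruder = math.log(doc_topic_probs[intruder_topic])
    diffs = [log_intruder - math.log(doc_topic_probs[t]) for t in chosen_topics]
    return sum(diffs) / len(diffs)


# Illustrative data only; not taken from the paper.
precision = model_precision("apple", ["apple", "apple", "dog", "apple"])
log_odds = topic_log_odds({0: 0.5, 1: 0.4, 2: 0.1}, 2, [2, 0])
```

Under these definitions, higher model precision and topic log odds closer to zero both indicate topics that annotators find easier to interpret.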