Journal: Intelligent Data Analysis

Incorporating Wikipedia concepts and categories as prior knowledge into topic models

Abstract

Topic models have been widely applied to discover the topics that underlie a collection of documents. Incorporating human knowledge can guide conventional topic models to produce topics that are easy to interpret and semantically coherent. Several knowledge-based topic models have been proposed, but these models leverage only lexical knowledge of words, which is often not in accordance with the topics. To solve this problem, we recognize entity mentions, in addition to words, in the documents and incorporate entity knowledge from external knowledge bases. In this paper, we study how to use entity knowledge, namely concepts and categories in Wikipedia, as prior knowledge in topic models to discover more coherent topics. We propose a novel knowledge-based topic model, WCM-LDA (Wikipedia-Category-concept-Mention Latent Dirichlet Allocation), which not only models the relationship between words and topics but also uses the concept and category knowledge of entities to model the semantic relation between entities and topics. We compare WCM-LDA with state-of-the-art knowledge-based topic models on three datasets. Experimental results show that our approach outperforms the existing baseline methods on all three datasets. Moreover, our model can visualize topics with their top words, concepts, and categories, so that topics are easy to interpret and classify.
