
Topic representation: Finding more representative words in topic models

Abstract

The top word list, i.e., the top-M words with the highest marginal probabilities in a given topic, is the standard topic representation in topic models. Most recent automatic topic labeling algorithms and popular topic quality metrics are based on it. However, we find empirically that the words in this type of top word list are not always representative. The objective of this paper is to find more representative top word lists for topics. To achieve this, we rerank the words in a given topic by further considering the marginal probabilities of the words in every other topic. The reranked list of top-M words is then used as a novel topic representation for topic models. We investigate three reranking methodologies, using (1) standard deviation weight, (2) standard deviation weight with topic size, and (3) chi-square (χ²) statistic selection. Experimental results on real-world collections indicate that our representations can extract more representative words for topics, in agreement with human judgements. © 2019 Elsevier B.V. All rights reserved.
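
The reranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the toy matrix `phi`, the `topic_sizes` values, the function names, the choice to multiply a word's in-topic probability by the standard deviation of its probability across topics, and the use of the textbook 2x2 chi-square statistic. It should not be read as the authors' exact method.

```python
import numpy as np

# Toy topic-word matrix (hypothetical values): phi[k, w] = p(word w | topic k).
vocab = ["data", "model", "gene", "cell", "market"]
phi = np.array([
    [0.40, 0.30, 0.20, 0.05, 0.05],
    [0.40, 0.30, 0.02, 0.03, 0.25],
    [0.40, 0.10, 0.05, 0.40, 0.05],
])
topic_sizes = np.array([100.0, 100.0, 100.0])  # assumed token counts per topic


def top_words(phi, k, m):
    """Standard representation: the m most probable words in topic k."""
    return np.argsort(-phi[k])[:m]


def rerank_by_std(phi, k, m):
    """Assumed scoring: p(w | k) times the std. dev. of p(w | .) over topics."""
    score = phi[k] * phi.std(axis=0)
    return np.argsort(-score)[:m]


def rerank_by_chi2(phi, topic_sizes, k, m):
    """Assumed scoring: 2x2 chi-square of (topic k vs. rest) x (word vs. rest).

    Assumes every word has a nonzero total count and there are >= 2 topics.
    """
    counts = phi * topic_sizes[:, None]      # expected word counts per topic
    n = counts.sum()
    a = counts[k]                            # word w inside topic k
    b = counts.sum(axis=0) - a               # word w in the other topics
    c = counts[k].sum() - a                  # other words inside topic k
    d = n - a - b - c                        # other words in the other topics
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Keep only words that are over-represented in topic k.
    score = np.where(a * d > b * c, chi2, 0.0)
    return np.argsort(-score)[:m]


for name, idx in [
    ("top-M", top_words(phi, 0, 2)),
    ("std", rerank_by_std(phi, 0, 2)),
    ("chi2", rerank_by_chi2(phi, topic_sizes, 0, 2)),
]:
    print(name, [vocab[i] for i in idx])
```

In this toy setup the word "data" has the same probability in every topic, so it leads the plain top-M list for topic 0 but drops out under both rerankings, which is the kind of unrepresentative word the abstract describes.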

Bibliographic details

  • Source
    Pattern Recognition Letters | 2019, Issue 5 | pp. 53-60 | 8 pages
  • Author affiliations

    Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China;

    Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China;

    Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China;

    Jilin Univ, Publ Comp Educ & Res Ctr, Changchun 130012, Jilin, Peoples R China;

    Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China;

    Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China|Chinese Acad Sci, Changchun Inst Opt Fine Mech & Phys, Changchun 130012, Jilin, Peoples R China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    Topic modeling; Topic representation; Topical word representation; Reranking methodology;

