
Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings



Abstract

Short texts, such as posts on the social media platform Twitter, have become increasingly prevalent on the Internet. Inferring topics from massive numbers of short texts is important for many real-world applications. A single short text often contains only a few words, which makes traditional topic models less effective. The recently developed biterm topic model (BTM) models short texts effectively by capturing rich global word co-occurrence information. However, in the sparse short-text setting, many highly related words may never co-occur, so BTM may miss many potentially coherent and prominent word co-occurrence patterns that cannot be observed in the corpus. To address this problem, we propose a novel relational BTM (R-BTM), which links short texts using a similarity list of words computed with word embeddings. To evaluate the effectiveness of R-BTM, we compare it with existing short-text topic models on a variety of traditional tasks, including topic quality, clustering and text similarity. Experimental results on real-world datasets indicate that R-BTM outperforms the baseline topic models for short texts.
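The abstract rests on two ingredients: BTM's biterms (unordered word pairs that co-occur within one short text) and a per-word similarity list derived from word embeddings. The Python sketch below illustrates both, under stated assumptions. The toy 3-dimensional vectors, the top-`k` cutoff, the cosine `threshold` and all function names are hypothetical. The sketch does not reproduce how R-BTM actually incorporates the similarity list into inference. It only exposes the gap the paper targets: related words that never form an observed biterm.

```python
from itertools import combinations
import math

# Hypothetical toy embeddings; real ones would come from a trained model.
EMBEDDINGS = {
    "phone": [0.9, 0.1, 0.0],
    "mobile": [0.85, 0.15, 0.05],
    "battery": [0.7, 0.3, 0.1],
    "pizza": [0.0, 0.2, 0.95],
}

# Hypothetical tokenized short texts: "phone" and "mobile" never co-occur.
CORPUS = [["phone", "battery"], ["mobile", "battery"], ["pizza"]]


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text.

    Each pair of positions yields one biterm; the two words are sorted
    so that (a, b) and (b, a) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_list(word, embeddings, k=2, threshold=0.5):
    """Hypothetical similarity list: up to k most similar words above threshold."""
    target = embeddings[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored = [(w, s) for w, s in scored if s >= threshold]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda ws: (-ws[1], ws[0]))
    return [w for w, _ in scored[:k]]


def unseen_related_pairs(corpus, embeddings, k=2, threshold=0.5):
    """Embedding-related word pairs that never appear as an observed biterm."""
    observed = {b for doc in corpus for b in extract_biterms(doc)}
    unseen = set()
    for word in embeddings:
        for other in similarity_list(word, embeddings, k, threshold):
            pair = tuple(sorted((word, other)))
            if pair not in observed:
                unseen.add(pair)
    return sorted(unseen)


if __name__ == "__main__":
    print(similarity_list("phone", EMBEDDINGS))
    print(unseen_related_pairs(CORPUS, EMBEDDINGS))
```

With this toy data, "phone" and "mobile" are each other's nearest neighbours but never appear in the same text, so their pair is reported as unseen. This is the kind of co-occurrence pattern that, according to the abstract, plain BTM cannot observe in a sparse corpus.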

Bibliographic details

  • Source
    The Computer Journal | 2019, Issue 3 | pp. 359-372 | 14 pages
  • Author affiliations

    College of Computer Science and Technology, Jilin University, Changchun, Jilin, People's Republic of China; Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, Jilin, People's Republic of China

    College of Computer Science and Technology, Jilin University, Changchun, Jilin, People's Republic of China; Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, Jilin, People's Republic of China

    College of Computer Science and Technology, Jilin University, Changchun, Jilin, People's Republic of China; Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, Jilin, People's Republic of China

    School of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi, People's Republic of China

    Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, Jilin, People's Republic of China; College of Mathematics, Jilin University, Changchun, Jilin, People's Republic of China

    College of Computer Science and Technology, Jilin University, Changchun, Jilin, People's Republic of China; Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, Jilin, People's Republic of China

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    short text; topic modeling; word embeddings; clustering; text similarity;

  • Date added to database: 2022-08-18 04:16:37
