A joint model of extended LDA and IBTM over streaming Chinese short texts

Abstract

With the prevalence of short texts, discovering the topics within them has become an important task. The Biterm Topic Model (BTM) is better suited to discovering topics in short texts than traditional topic models. However, challenges remain: BTM ignores document-topic semantic information and fails to capture users' true intentions. In addition, it is a static method and cannot handle streaming short texts, where each new text must be processed as soon as it arrives. To preserve document-topic information and obtain the topic distribution of a new short text immediately, we propose a joint model based on online algorithms for Latent Dirichlet Allocation (LDA) and BTM, which combines the merits of both models. The model alleviates sparsity by processing short texts with the online algorithm of BTM, namely the Incremental Biterm Topic Model (IBTM), and also preserves document-topic information with an extended LDA. Considering the differences between written English and Chinese, we use combined words in short texts as keywords to extend the length of short texts and preserve users' true intentions. Experimental results on two real-world datasets show that our method outperforms the baseline methods. Finally, we describe an application of our method to the task of discovering user interest tags.
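
As a rough illustration of why biterms help with sparsity, the sketch below extracts biterms (unordered pairs of words that co-occur in the same short text) and counts them across a small corpus. It is not the authors' implementation and covers neither IBTM's incremental inference nor the extended LDA. The function names `extract_biterms` and `count_biterms` and the toy documents are hypothetical, and the texts are assumed to be already segmented into word tokens.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one tokenized short text.

    Assumption: the whole short text is treated as a single context window.
    Each pair is sorted so that (a, b) and (b, a) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Aggregate biterm counts over a corpus of tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical toy corpus: two short texts, already split into words.
docs = [["apple", "phone", "release"], ["phone", "price"]]
biterm_counts = count_biterms(docs)
```

Because the counts are pooled over the whole corpus rather than kept per document, even very short texts contribute usable co-occurrence statistics; the trade-off, as the abstract notes, is that document-topic information is not modeled directly.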

Bibliographic record

  • Source
    Intelligent Data Analysis, 2019, Issue 3, pp. 681-699 (19 pages)
  • Author affiliations

    Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China|Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Hebei, Peoples R China;

    Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China;

    Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China|Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Hebei, Peoples R China;

    Jiangxi Salmon Technol Dev Co LTD, Nanchang 330013, Jiangxi, Peoples R China;

    Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China;

    Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China;

    Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China;

    Shijiazhuang Presch Teachers Coll, Shijiazhuang 050228, Hebei, Peoples R China;

  • Indexed in: Science Citation Index (SCI), USA
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Streaming Chinese short text; topic discovery; topic models; online algorithms
