Journal: Intelligent Data Analysis

Word co-occurrence augmented topic model in short text


Abstract

The huge volume of text on the Internet makes it hard for people to grasp its meaning in a limited time. Topic models such as LDA and PLSA have therefore been proposed to summarize long text into a handful of topic terms. In recent years, short-text media such as Twitter have become very popular. However, directly applying traditional topic models to a short-text corpus usually yields incoherent topics, because a short document contains too few words to reveal word co-occurrence patterns. In this paper, we address the lack of local word co-occurrence in LDA by proposing an improved word co-occurrence method to enhance topic models. We generate new virtual documents by re-organizing the words in the original documents and use them to enhance traditional LDA. Experimental results show that our re-organized LDA (RO-LDA) method achieves better results on both a noisy Twitter dataset and a regular news dataset. Moreover, our augmented model requires no external data: the proposed methods build only on the original topic model, so they can easily be applied to other existing LDA-based models.
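The core idea of the abstract — re-organizing a short-text corpus into longer virtual documents that expose word co-occurrence — can be sketched as follows. The paper does not spell out the exact re-organization rule here, so this is a minimal illustrative approximation: for each word, we gather its co-occurring words across all short documents (weighted by co-occurrence count) into one virtual document, which could then be fed to a standard LDA implementation. The function name, `top_k` parameter, and toy tweets are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_virtual_documents(docs, top_k=10):
    """Re-organize short documents into longer virtual documents.

    For every word, collect the words it co-occurs with across the whole
    corpus and emit them, repeated by co-occurrence count, as one virtual
    document.  This is an assumed approximation of the RO step, not the
    paper's exact procedure.
    """
    # Count symmetric word-word co-occurrences within each short document.
    cooc = defaultdict(Counter)
    for doc in docs:
        for w1, w2 in combinations(set(doc), 2):
            cooc[w1][w2] += 1
            cooc[w2][w1] += 1

    # One virtual document per word: the word plus its strongest neighbours.
    virtual_docs = {}
    for word, neighbours in cooc.items():
        expanded = [word]
        for other, count in neighbours.most_common(top_k):
            expanded.extend([other] * count)
        virtual_docs[word] = expanded
    return virtual_docs

# Toy short-text corpus (hypothetical tweets).
tweets = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
    ["election", "vote", "poll"],
]
vdocs = build_virtual_documents(tweets)
print(sorted(vdocs["apple"]))  # → ['apple', 'iphone', 'iphone', 'launch', 'price']
```

The virtual documents are longer than the originals and concentrate related words together, which is exactly the property that lets an unmodified LDA recover coherent co-occurrence patterns from short text.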
