Journal: Intelligent Data Analysis

Word co-occurrence augmented topic model in short text


Abstract

The huge volume of text on the Internet makes it hard for people to grasp its meaning in a limited time. Topic models such as LDA and PLSA have therefore been proposed to summarize long text into a handful of topic terms. In recent years, short-text media such as Twitter have become very popular. However, directly applying traditional topic models to a short-text corpus usually yields incoherent topics, because a short document contains too few words to reveal word co-occurrence patterns. In this paper, we address the lack of local word co-occurrence in LDA by proposing an improved word co-occurrence method to enhance topic models. We generate new virtual documents by re-organizing the words in the original documents and use them to enhance traditional LDA. Experimental results show that our re-organized LDA (RO-LDA) method achieves better results on both a noisy Twitter dataset and a regular news dataset. Moreover, our augmented model requires no external data: the proposed methods build only on the original topic model, so they can easily be applied to other existing LDA-based models.
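The core idea of the abstract — re-organizing a short-text corpus into longer virtual documents that expose word co-occurrence — can be sketched as follows. The paper does not spell out the exact re-organization rule here, so this is a minimal illustrative approximation: for each word, we gather its co-occurring words across all short documents (weighted by co-occurrence count) into one virtual document, which could then be fed to a standard LDA implementation. The function name, `top_k` parameter, and toy tweets are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_virtual_documents(docs, top_k=10):
    """Re-organize short documents into longer virtual documents.

    For every word, collect the words it co-occurs with across the whole
    corpus and emit them, repeated by co-occurrence count, as one virtual
    document.  This is an assumed approximation of the RO step, not the
    paper's exact procedure.
    """
    # Count symmetric word-word co-occurrences within each short document.
    cooc = defaultdict(Counter)
    for doc in docs:
        for w1, w2 in combinations(set(doc), 2):
            cooc[w1][w2] += 1
            cooc[w2][w1] += 1

    # One virtual document per word: the word plus its strongest neighbours.
    virtual_docs = {}
    for word, neighbours in cooc.items():
        expanded = [word]
        for other, count in neighbours.most_common(top_k):
            expanded.extend([other] * count)
        virtual_docs[word] = expanded
    return virtual_docs

# Toy short-text corpus (hypothetical tweets).
tweets = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
    ["election", "vote", "poll"],
]
vdocs = build_virtual_documents(tweets)
print(sorted(vdocs["apple"]))  # → ['apple', 'iphone', 'iphone', 'launch', 'price']
```

The virtual documents are longer than the originals and concentrate related words together, which is exactly the property that lets an unmodified LDA recover coherent co-occurrence patterns from short text.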
