Workshop on Vector Space Modeling for Natural Language Processing

Unsupervised Topic Modeling for Short Texts Using Distributed Representations of Words


Abstract

We present an unsupervised topic model for short texts that performs soft clustering over distributed representations of words. We model the low-dimensional semantic vector space represented by the dense distributed representations of words using Gaussian mixture models (GMMs) whose components capture the notion of latent topics. While conventional topic modeling schemes such as probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA) need aggregation of short messages to avoid data sparsity in short documents, our framework works on large amounts of raw short texts (billions of words). In contrast with other topic modeling frameworks that use word co-occurrence statistics, our framework uses a vector space model that overcomes the issue of sparse word co-occurrence patterns. We demonstrate that our framework outperforms LDA on short texts through both subjective and objective evaluation. We also show the utility of our framework in learning topics and classifying short texts on Twitter data for English, Spanish, French, Portuguese and Russian.
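The core idea in the abstract, soft-clustering word embeddings with a GMM whose components act as latent topics, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy vectors stand in for pretrained embeddings (e.g. word2vec), and averaging word posteriors to get a short text's topic distribution is one simple aggregation choice assumed here, not necessarily the paper's.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stand-in for pretrained word embeddings: rows are words,
# columns are embedding dimensions. Two loose clusters by construction.
vocab = ["apple", "banana", "fruit", "goal", "match", "team"]
vectors = np.vstack([
    rng.normal(loc=(0.0 if i < 3 else 1.0), scale=0.1, size=8)
    for i in range(len(vocab))
])

# Fit a GMM whose K components play the role of latent topics.
K = 2
gmm = GaussianMixture(n_components=K, covariance_type="diag", random_state=0)
gmm.fit(vectors)

# Soft cluster assignments: P(topic | word) for every word in the vocabulary.
word_topic = gmm.predict_proba(vectors)          # shape: (len(vocab), K)

# One simple way to get a short text's topic distribution:
# average the posteriors of its words.
doc = ["apple", "banana"]
doc_topics = word_topic[[vocab.index(w) for w in doc]].mean(axis=0)
```

Because the clustering is soft, each word carries a full distribution over topics rather than a hard label, which is what lets very short documents (even single tweets) receive a usable topic mixture without the message aggregation that pLSA/LDA require.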
