Topic Models Incorporating Statistical Word Senses

Abstract

LDA treats a surface word as identical across all documents and measures its contribution to each topic. However, a surface word may carry different meanings in different contexts: polysemous words are used with different senses in different contexts. Intuitively, disambiguating word senses should enhance the discriminative capability of topic models. In this work, we propose a joint model that induces document topics and word senses simultaneously. Instead of relying on pre-defined word sense resources, we capture word sense information via a latent variable and induce it directly from the corpora in a fully unsupervised manner. Experimental results show that the proposed joint model significantly outperforms both classic LDA and a standalone sense-based LDA model in document clustering.
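The motivation in the abstract can be illustrated with a minimal sketch. It is not the authors' joint model: the sense labels below are assigned by hand purely for illustration, whereas the paper induces them as a latent variable, and the function names `bag_of_words` and `sense_tagged_bag` are hypothetical. The sketch only shows that a surface-form representation makes two unrelated documents overlap on "bank", while a sense-aware representation does not.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count surface forms only; the context of each token is discarded."""
    return Counter(tokens)


def sense_tagged_bag(tokens, senses):
    """Count word#sense features.

    Assumption: sense labels are given by hand here. A label of None
    means the word is left as a plain surface form.
    """
    return Counter(
        f"{word}#{sense}" if sense is not None else word
        for word, sense in zip(tokens, senses)
    )


doc_finance = ["bank", "approved", "loan"]
doc_river = ["river", "bank", "flooded"]

# Surface-form view: both documents share the feature "bank".
surface_finance = bag_of_words(doc_finance)
surface_river = bag_of_words(doc_river)
shared_surface = set(surface_finance) & set(surface_river)

# Sense-aware view: "bank" splits into two distinct features.
tagged_finance = sense_tagged_bag(doc_finance, [0, None, None])
tagged_river = sense_tagged_bag(doc_river, [None, 1, None])
shared_tagged = set(tagged_finance) & set(tagged_river)

print(shared_surface)  # {'bank'}
print(shared_tagged)   # set()
```

A topic model trained on the surface-form counts has no way to tell the two uses of "bank" apart, which is the limitation the abstract attributes to classic LDA.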

Bibliographic details

Source: Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2014) | 2014 | 12 pages
Authors: Guoyu Tang; Yunqing Xia; Jun Sun; Min Zhang; Thomas Fang Zheng
Original format: PDF
Chinese Library Classification: TP18-532
Keywords: topic modeling; word sense induction; document representation; document clustering

Similar literature

1. Statistical word sense aware topic models [J]. Tang Guoyu, Xia Yunqing, Sun Jun, Soft computing: A fusion of foundations, methodologies and applications. 2015, No. 1
2. A novel topic model for documents by incorporating semantic relations between words [J]. Soft computing: A fusion of foundations, methodologies and applications. 2020, No. 15
3. Incorporating word embeddings into topic modeling of short text [J]. Gao Wang, Peng Min, Wang Hua, Knowledge and information systems. 2019, No. 2
4. Topic Models Incorporating Statistical Word Senses [C]. Guoyu Tang, Yunqing Xia, Jun Sun, International conference on intelligent text processing and computational linguistics. 2014
5. Topics in Statistical Modeling for Unstructured Text Data with Application to Commonsense Inference [D]. Yang, Yiben. 2020
6. Incorporating Statistical Topic Models in the Retrieval of Healthcare Documents [O]. Karla Caballero, Ram Akella, 2015
7. Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models [O]. Jey Han Lau, Paul Cook, Diana McCarthy, 2014
