Published at: International Joint Conference on Natural Language Processing

A Multilingual Topic Model for Learning Weighted Topic Links Across Corpora with Low Comparability



Abstract

Multilingual topic models (MTMs) learn topics on documents in multiple languages. Past models align topics across languages by implicitly assuming the documents in different languages are highly comparable, often a false assumption. We introduce a new model that does not rely on this assumption, particularly useful in important low-resource language scenarios. Our MTM learns weighted topic links and connects cross-lingual topics only when the dominant words defining them are similar, outperforming LDA and previous MTMs in classification tasks using documents' topic posteriors as features. It also learns coherent topics on documents with low comparability.
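The linking mechanism the abstract describes, connecting cross-lingual topics only when their dominant words are similar, can be sketched roughly as follows. This is a toy illustration, not the paper's actual model: the bilingual lexicon, topic distributions, `link_weight` helper, and threshold are all hypothetical.

```python
# Hedged sketch: weight a cross-lingual topic link by the overlap of the
# dominant (top-k) words of two topics, mapped through a bilingual lexicon.
# All data below is invented for illustration.

# Toy topic-word probabilities for one English and one German topic.
topic_en = {"election": 0.30, "vote": 0.25, "party": 0.20, "music": 0.05}
topic_de = {"wahl": 0.28, "stimme": 0.22, "partei": 0.24, "fluss": 0.04}

# Toy bilingual dictionary (assumes a translation lexicon is available).
de_to_en = {"wahl": "election", "stimme": "vote", "partei": "party"}

def link_weight(t_src, t_tgt, lexicon, top_k=3):
    """Score a topic link by how much probability mass the source topic's
    dominant words share with the target topic's (translated) dominant words."""
    top_src = sorted(t_src, key=t_src.get, reverse=True)[:top_k]
    top_tgt = sorted(t_tgt, key=t_tgt.get, reverse=True)[:top_k]
    mapped = {lexicon.get(w) for w in top_tgt}
    shared = [w for w in top_src if w in mapped]
    return sum(t_src[w] for w in shared)

w = link_weight(topic_en, topic_de, de_to_en)
print(round(w, 2))   # 0.75: the three dominant words align

THRESHOLD = 0.5      # assumed cutoff; topics below it stay unlinked
linked = w >= THRESHOLD
```

Under this sketch, topic pairs whose dominant words do not translate into each other get a low weight and remain unlinked, which is how a model could avoid forcing alignments between corpora with low comparability.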
