Incorporating Word Embedding into Cross-Lingual Topic Modeling

Abstract

In this paper, we address cross-lingual topic modeling, an important technique that enables global enterprises to detect and compare topic trends across global markets. Previous work on cross-lingual topic modeling has proposed methods that use parallel or comparable corpora to construct a polylingual topic model. However, parallel or comparable corpora are often unavailable. In this research, we combine cross-lingual word-space mapping with topic modeling (LDA) and propose two methods: Translated Corpus with LDA (TC-LDA) and Post Match LDA (PM-LDA). The cross-lingual word-space mapping allows us to compare words of different languages, and LDA enables us to group words into topics. Neither TC-LDA nor PM-LDA requires a parallel or comparable corpus, so both are applicable to more domains. We evaluate the effectiveness of both methods using UM-Corpus and WS-353. Our evaluation results indicate that both methods can identify similar documents written in different languages. In addition, PM-LDA achieves better performance than TC-LDA, especially when documents are short.
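
The abstract does not say how the cross-lingual word space is built or how TC-LDA translates its corpus. The sketch below assumes one common approach: an orthogonal Procrustes map learned from a small seed bilingual dictionary, followed by word-by-word nearest-neighbour translation of a source-language document into the target language, after which any LDA implementation could be run on the translated tokens (not shown). The function names, toy vocabularies and synthetic embeddings are hypothetical illustrations, not the authors' method or data.

```python
import numpy as np

# Hypothetical sketch (not the authors' code): assumes an orthogonal Procrustes
# mapping and nearest-neighbour word translation as a TC-LDA-style preprocessing step.


def learn_orthogonal_map(src_seed, tgt_seed):
    """Orthogonal Procrustes: the orthogonal W minimising ||src_seed @ W - tgt_seed||_F.

    Row i of src_seed and row i of tgt_seed are the embeddings of one
    seed-dictionary pair (a source word and its target-language translation).
    """
    u, _, vt = np.linalg.svd(src_seed.T @ tgt_seed)
    return u @ vt


def translate_tokens(tokens, src_vocab, src_emb, tgt_vocab, tgt_emb, w):
    """Replace each source token with its nearest target word (cosine) after mapping.

    Tokens missing from src_vocab are dropped (a simplification of this sketch).
    """
    src_index = {word: i for i, word in enumerate(src_vocab)}
    tgt_unit = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    translated = []
    for tok in tokens:
        i = src_index.get(tok)
        if i is None:
            continue  # out-of-vocabulary token
        mapped = src_emb[i] @ w  # project into the target embedding space
        scores = tgt_unit @ (mapped / np.linalg.norm(mapped))  # cosine similarities
        translated.append(tgt_vocab[int(np.argmax(scores))])
    return translated


# Synthetic toy data: the "English" space is an exact rotation of the "Chinese" space.
rng = np.random.default_rng(0)
src_vocab = ["市场", "趋势", "企业", "全球", "话题", "产品"]
tgt_vocab = ["market", "trend", "enterprise", "global", "topic", "product"]
src_emb = rng.normal(size=(6, 4))
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
tgt_emb = src_emb @ rotation

# Learn the map from the first four pairs only; "话题" and "产品" stay held out.
w = learn_orthogonal_map(src_emb[:4], tgt_emb[:4])
doc = ["全球", "市场", "话题", "产品", "未知词"]
print(translate_tokens(doc, src_vocab, src_emb, tgt_vocab, tgt_emb, w))
# expected: ['global', 'market', 'topic', 'product']
```

Because W is orthogonal, it preserves dot products and norms within the source space, so only the alignment between languages changes. With real embeddings the two spaces are not exact rotations of each other, so held-out translations are only approximate.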

Bibliographic details

Source: 2018 IEEE International Congress on Big Data, 2018, pp. 17-24 (8 pages)
Conference location: San Francisco (US)
Authors: Chia-Hsuan Chang; San-Yih Hwang; Tou-Hsiang Xui
Original format: PDF
Language: English
Keywords: Dictionaries; Semantics; Market research; Social network services; Neural networks; Data models

Similar documents

1. Incorporating word embeddings into topic modeling of short text [J]. Gao Wang, Peng Min, Wang Hua. Knowledge and Information Systems, 2019, no. 2.

2. A Survey of Cross-lingual Word Embedding Models [J]. Ruder Sebastian, Vulic Ivan, Sogaard Anders. The Journal of Artificial Intelligence Research, 2019.

3. Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings [J]. Li Ximing, Zhang Ang, Li Changchun. The Computer Journal, 2019, no. 3.

4. Incorporating Word Embedding into Cross-Lingual Topic Modeling [C]. Chia-Hsuan Chang, San-Yih Hwang, Tou-Hsiang Xui. IEEE International Congress on Big Data, 2018.

5. Multilingual model using cross-lingual word embeddings based on subword alignment and cross-task projection [D]. Sakuma Jin, 2019.

6. Nonparametric Spherical Topic Modeling with Word Embeddings [O]. Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan.

7. Topic Modeling over Short Texts by Incorporating Word Embeddings [O]. Qiang Jipeng, Chen Ping, Wang Tong, 2016.
