
Incorporating Word Embedding into Cross-Lingual Topic Modeling

Abstract

In this paper, we address cross-lingual topic modeling, an important technique that enables global enterprises to detect and compare topic trends across markets. Previous work on cross-lingual topic modeling has proposed methods that rely on parallel or comparable corpora to construct a polylingual topic model. However, parallel or comparable corpora are often unavailable. In this research, we combine cross-lingual word space mapping with topic modeling (LDA) and propose two methods: Translated Corpus with LDA (TC-LDA) and Post Match LDA (PM-LDA). The cross-lingual word space mapping allows us to compare words across languages, and LDA enables us to group words into topics. Neither TC-LDA nor PM-LDA requires a parallel or comparable corpus, so both are applicable in a wider range of domains. The effectiveness of both methods is evaluated using UM-Corpus and WS-353. Our evaluation results indicate that both methods can identify similar documents written in different languages. In addition, PM-LDA achieves better performance than TC-LDA, especially when documents are short.
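The abstract outlines a two-stage recipe: align monolingual word-embedding spaces so that words from different languages can be compared, then run LDA over the resulting corpus. The sketch below illustrates that general idea only; it is not the authors' TC-LDA or PM-LDA implementation. It assumes an orthogonal Procrustes mapping learned from a hypothetical seed bilingual dictionary and uses gensim's LdaModel; all inputs (src_emb, tgt_emb, seed_pairs, src_docs, tgt_docs) are placeholders.

```python
# Minimal sketch (assumptions noted above): align two embedding spaces with a
# linear map, move one corpus into the other language's vocabulary word by word,
# then fit a single LDA model over the combined corpus.
import numpy as np
from gensim import corpora
from gensim.models import LdaModel

def learn_mapping(src_vecs, tgt_vecs):
    """Orthogonal Procrustes: find rotation W with W @ src ~= tgt.
    src_vecs, tgt_vecs: (n_pairs, dim) embeddings for a seed bilingual
    dictionary, where row i of each matrix is a translation pair."""
    u, _, vt = np.linalg.svd(tgt_vecs.T @ src_vecs)
    return u @ vt  # (dim, dim) rotation from source space into target space

def nearest_target_word(vec, tgt_vocab, tgt_matrix):
    """Return the target-language word with the highest cosine similarity."""
    sims = tgt_matrix @ vec / (np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(vec) + 1e-9)
    return tgt_vocab[int(np.argmax(sims))]

def cross_lingual_lda(src_emb, tgt_emb, seed_pairs, src_docs, tgt_docs, num_topics=20):
    # src_emb / tgt_emb: dicts {word: np.ndarray} from two monolingual embeddings
    # seed_pairs: [(src_word, tgt_word), ...]; src_docs / tgt_docs: tokenized docs
    S = np.vstack([src_emb[s] for s, t in seed_pairs])
    T = np.vstack([tgt_emb[t] for s, t in seed_pairs])
    W = learn_mapping(S, T)

    tgt_vocab = list(tgt_emb.keys())
    tgt_matrix = np.vstack([tgt_emb[w] for w in tgt_vocab])

    # Map each source word into the target space and replace it with its
    # nearest target-language neighbor (a crude word-by-word "translation").
    translated = [
        [nearest_target_word(W @ src_emb[w], tgt_vocab, tgt_matrix)
         for w in doc if w in src_emb]
        for doc in src_docs
    ]

    # Fit one LDA model over the translated source docs plus the target docs,
    # so topics are shared across both languages.
    texts = translated + tgt_docs
    dictionary = corpora.Dictionary(texts)
    bow = [dictionary.doc2bow(t) for t in texts]
    return LdaModel(bow, num_topics=num_topics, id2word=dictionary, passes=10)
```

Because both document sets end up expressed in one shared vocabulary, documents in different languages can be compared directly through their inferred topic distributions, which is the kind of cross-lingual document matching the abstract evaluates.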
