Data mining and knowledge discovery

C-BiLDA: extracting cross-lingual topics from non-parallel texts by distinguishing shared from unshared content



Abstract

We study the problem of extracting cross-lingual topics from non-parallel multilingual text datasets with partially overlapping thematic content (e.g., aligned Wikipedia articles in two different languages). To this end, we develop a new bilingual probabilistic topic model called comparable bilingual latent Dirichlet allocation (C-BiLDA), which is able to deal with such comparable data and, unlike the standard bilingual LDA model (BiLDA), does not assume the availability of document pairs with identical topic distributions. We present a full overview of C-BiLDA and demonstrate its utility in the task of cross-lingual knowledge transfer for multi-class document classification on two benchmark datasets for three language pairs. The proposed model outperforms the baseline LDA model, as well as the standard BiLDA model and two standard low-rank approximation methods (CL-LSI and CL-KCCA) used in previous work on this task.
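The core modeling idea in the abstract can be illustrated with a toy generative sampler. This is a minimal sketch under stated assumptions, not the authors' exact specification: BiLDA assumes both documents of an aligned pair draw topics from one shared distribution theta, while a C-BiLDA-style model lets each token in the second document first flip a per-token indicator that decides whether its topic comes from the shared theta or from a document-specific distribution psi covering unshared content. All function names, priors, and parameter values below are illustrative choices, not taken from the paper.

```python
import numpy as np

def sample_comparable_pair(n_topics=4, vocab_size=20, doc_len=50, seed=0):
    """Toy generative story for a C-BiLDA-style model (illustrative sketch).

    Language-1 tokens always draw topics from the shared distribution theta
    (as in BiLDA). Each language-2 token flips a Bernoulli "shared" indicator:
    if it comes up True the topic is drawn from theta, otherwise from a
    document-specific distribution psi over unshared content.
    """
    rng = np.random.default_rng(seed)
    alpha = np.ones(n_topics)              # symmetric Dirichlet prior over topics
    beta = np.ones(vocab_size)             # symmetric prior over words per topic
    phi1 = rng.dirichlet(beta, n_topics)   # topic-word distributions, language 1
    phi2 = rng.dirichlet(beta, n_topics)   # topic-word distributions, language 2
    theta = rng.dirichlet(alpha)           # shared per-pair topic distribution
    psi = rng.dirichlet(alpha)             # unshared topic distribution (lang 2)
    share_prob = rng.beta(2.0, 2.0)        # prob. a lang-2 token uses shared topics

    # Language-1 document: every token's topic comes from the shared theta.
    doc1 = [int(rng.choice(vocab_size, p=phi1[rng.choice(n_topics, p=theta)]))
            for _ in range(doc_len)]

    # Language-2 document: per-token indicator routes between theta and psi.
    doc2 = []
    for _ in range(doc_len):
        shared = rng.random() < share_prob
        z = rng.choice(n_topics, p=theta if shared else psi)
        doc2.append(int(rng.choice(vocab_size, p=phi2[z])))
    return doc1, doc2, share_prob
```

Setting `share_prob` to 1 for every document pair recovers the BiLDA assumption of identical topic distributions; letting it vary is what allows the model to handle comparable rather than parallel data.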
