Venue: Annual Conference on Neural Information Processing Systems

Symmetric Correspondence Topic Models for Multilingual Text Analysis



Abstract

Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA. We experimented with two multilingual comparable datasets extracted from Wikipedia and demonstrate that SymCorrLDA is more effective than some other existing multilingual topic models.
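To make the model structure concrete, the sketch below simulates a SymCorrLDA-like generative process for one comparable document pair. It is an illustrative toy, not the paper's implementation: a hidden variable chooses which language acts as pivot, pivot-side words draw topics from the document's topic mixture, and non-pivot words draw topics only from topics already used on the pivot side (the CorrLDA correspondence constraint). All sizes, hyperparameters, and language labels here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 4             # number of latent topics (toy value)
V = 50            # vocabulary size per language (toy value)
alpha, beta = 0.1, 0.01  # Dirichlet hyperparameters (toy values)

# Per-language topic-word distributions phi[lang][k] over the vocabulary
phi = {lang: rng.dirichlet([beta + 1] * V, size=K) for lang in ("en", "ja")}

def generate_pair(n_words=30):
    """Simulate one comparable document pair under a SymCorrLDA-like
    process: a hidden variable x selects the pivot language."""
    x = rng.choice(["en", "ja"])             # hidden pivot indicator
    other = "ja" if x == "en" else "en"
    theta = rng.dirichlet([alpha + 1] * K)   # document topic mixture

    # Pivot-side words draw topic assignments directly from theta
    z_pivot = rng.choice(K, size=n_words, p=theta)
    w_pivot = [rng.choice(V, p=phi[x][z]) for z in z_pivot]

    # Non-pivot words draw topics uniformly from the topics already
    # used on the pivot side -- the CorrLDA correspondence constraint
    z_other = rng.choice(z_pivot, size=n_words)
    w_other = [rng.choice(V, p=phi[other][z]) for z in z_other]
    return x, {x: w_pivot, other: w_other}

pivot, docs = generate_pair()
```

Because the pivot indicator is sampled per document pair rather than fixed in advance, neither language is privileged globally, which is the asymmetry that plain CorrLDA imposes and SymCorrLDA removes.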
