Journal: International Journal of Computing Science and Mathematics

Khmer-Chinese bilingual LDA topic model based on dictionary



Abstract

Multilingual probabilistic topic models have been widely used for topic mining in multilingual documents. This paper proposes the Khmer-Chinese bilingual latent Dirichlet allocation (KCB-LDA) model, which is based on a bilingual dictionary. Using the bilingual attribute of dictionary entries, the method first maps words that express the same semantic meaning to an abstract concept layer, then groups the concepts into a common topic space. As a result, documents in different languages share the same latent topics. Given a bilingual corpus, the concept layer allows each topic to be represented jointly in both Chinese and Khmer. The experimental results show that our topic modelling approach has better predictive power.
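
The two steps described in the abstract (mapping words to shared concepts, then grouping concepts into topics) can be illustrated with a minimal sketch. It is not the KCB-LDA model itself, whose generative process is not given here. It assumes the simplest possible reading: each dictionary entry becomes one concept id, and ordinary LDA with collapsed Gibbs sampling is run over concept ids instead of words. The toy dictionary, the function names, and the hyperparameter values are all hypothetical.

```python
import random

# Hypothetical toy dictionary: each entry pairs a Khmer word with a Chinese word.
TOY_DICTIONARY = [
    ("សាលារៀន", "学校"),   # school
    ("សិស្ស", "学生"),      # student
    ("ទីផ្សារ", "市场"),    # market
    ("តម្លៃ", "价格"),      # price
]

def build_concept_index(entries):
    """Assign one concept id per dictionary entry, shared by both languages."""
    word_to_concept = {}
    for concept_id, (khmer, chinese) in enumerate(entries):
        word_to_concept[khmer] = concept_id
        word_to_concept[chinese] = concept_id
    return word_to_concept

def to_concepts(tokens, word_to_concept):
    """Map tokens to concept ids, dropping words missing from the dictionary."""
    return [word_to_concept[t] for t in tokens if t in word_to_concept]

def gibbs_lda(docs, n_topics, n_concepts, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA over concept ids (an assumption,
    standing in for the paper's model)."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_concept = [[0] * n_concepts for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every concept token.
    for d, doc in enumerate(docs):
        zs = []
        for c in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_concept[z][c] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, c in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_concept[z][c] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_concept[k][c] + beta)
                    / (topic_total[k] + beta * n_concepts)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_concept[z][c] += 1
                topic_total[z] += 1
    return doc_topic, topic_concept

word_to_concept = build_concept_index(TOY_DICTIONARY)
docs = [
    to_concepts(["សាលារៀន", "សិស្ស", "សិស្ស"], word_to_concept),  # Khmer document
    to_concepts(["学校", "学生", "学校"], word_to_concept),         # Chinese document
    to_concepts(["市场", "价格", "ទីផ្សារ"], word_to_concept),      # mixed document
]
doc_topic, topic_concept = gibbs_lda(docs, n_topics=2, n_concepts=len(TOY_DICTIONARY))
```

Because the Khmer and Chinese words for "school" resolve to the same concept id, both monolingual documents contribute to the same topic-concept counts, which is the sense in which documents in different languages share latent topics. A topic can then be displayed in either language by looking up the dictionary entry behind each high-count concept.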
