13th Conference of the European Chapter of the Association for Computational Linguistics, 2012

Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge

Abstract

In this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precision-oriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashion, without any prior knowledge about the language pair, relying on a symmetrization process and the one-to-one constraint. We report our results for Italian-English and Dutch-English language pairs that outperform the current state-of-the-art results by a significant margin. In addition, we show how to use the algorithm for the construction of high-quality initial seed lexicons of translations.
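The greedy, symmetric, one-to-one harvesting idea described in the abstract might be sketched roughly as follows. This is an illustrative sketch only, not the authors' algorithm: the abstract does not say how word similarity is computed from the per-topic word distributions, so the cosine similarity over normalized per-word topic vectors, the 0.9 confidence threshold, and all function names (`word_topic_vectors`, `cosine`, `harvest_translations`) are assumptions introduced here. The per-topic distributions are assumed to come from an already trained BiLDA model, given as one `{word: probability}` dictionary per topic for each language.

```python
import math


def word_topic_vectors(per_topic_word_probs):
    """Map each word to its vector of per-topic probabilities, normalized to sum to 1."""
    num_topics = len(per_topic_word_probs)
    raw = {}
    for k, dist in enumerate(per_topic_word_probs):
        for word, prob in dist.items():
            raw.setdefault(word, [0.0] * num_topics)[k] = prob
    vectors = {}
    for word, vec in raw.items():
        total = sum(vec)
        vectors[word] = [v / total for v in vec] if total > 0 else vec
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def harvest_translations(src_topics, tgt_topics, threshold=0.9):
    """Greedily collect symmetric, one-to-one word pairs scoring at least `threshold`."""
    src = word_topic_vectors(src_topics)
    tgt = word_topic_vectors(tgt_topics)
    lexicon = {}
    while src and tgt:
        score = {(s, t): cosine(src[s], tgt[t]) for s in src for t in tgt}
        best_tgt = {s: max(tgt, key=lambda t: score[(s, t)]) for s in src}
        best_src = {t: max(src, key=lambda s: score[(s, t)]) for t in tgt}
        # Symmetrization: keep (s, t) only if each word is the other's best match.
        confident = [(s, t) for s, t in best_tgt.items()
                     if best_src[t] == s and score[(s, t)] >= threshold]
        if not confident:
            break
        # One-to-one constraint: matched words leave the pool before the next pass.
        for s, t in confident:
            lexicon[s] = (t, score[(s, t)])
            del src[s]
            del tgt[t]
    return lexicon


# Toy per-topic distributions (made-up numbers, three topics per language).
italian_topics = [
    {"cane": 0.75, "casa": 0.10, "acqua": 0.10, "tavolo": 0.05},
    {"cane": 0.10, "casa": 0.75, "acqua": 0.10, "tavolo": 0.05},
    {"cane": 0.10, "casa": 0.10, "acqua": 0.75, "tavolo": 0.05},
]
english_topics = [
    {"dog": 0.8, "house": 0.1, "water": 0.1},
    {"dog": 0.1, "house": 0.8, "water": 0.1},
    {"dog": 0.1, "house": 0.1, "water": 0.8},
]
seed_lexicon = harvest_translations(italian_topics, english_topics)
```

In this toy setup the word `tavolo` is spread evenly over all topics, so it has no mutual best match above the threshold and is left out of the lexicon. Leaving such words unmatched is the precision-oriented behaviour the abstract refers to. Raising or lowering `threshold` trades lexicon size against confidence.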
