Conference: Annual Meeting of the Association for Computational Linguistics

On the Limitations of Unsupervised Bilingual Dictionary Induction


Abstract

Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018a) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, and when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
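To make the "identical words" trick concrete, here is a minimal sketch of how such a weak seed dictionary might be collected: words spelled identically in both vocabularies (names, numerals, loanwords) are paired up. The function name `identical_word_seeds` and the toy vocabularies are illustrative assumptions, not taken from the paper, and the abstract does not say how the resulting pairs are fed into the alignment step.

```python
from typing import Dict, List, Tuple


def identical_word_seeds(
    src_vocab: List[str], tgt_vocab: List[str]
) -> List[Tuple[int, int]]:
    """Pair up indices of words spelled identically in both vocabularies.

    Hypothetical helper: an illustration of the idea only, not the
    authors' implementation.
    """
    # Map each target word to the index of its first occurrence.
    tgt_index: Dict[str, int] = {}
    for j, word in enumerate(tgt_vocab):
        tgt_index.setdefault(word, j)

    seeds: List[Tuple[int, int]] = []
    seen = set()
    for i, word in enumerate(src_vocab):
        # Keep only the first source occurrence of each shared word.
        if word in tgt_index and word not in seen:
            seeds.append((i, tgt_index[word]))
            seen.add(word)
    return seeds


# Toy vocabularies (assumed for illustration; real ones would come from
# the monolingual embedding files, usually sorted by frequency).
src = ["the", "london", "dog", "2018", "taxi"]
tgt = ["der", "hund", "taxi", "london", "2018"]
print(identical_word_seeds(src, tgt))  # [(1, 3), (3, 4), (4, 2)]
```

Each returned pair indexes one row in the source embedding matrix and one in the target matrix, so the list could serve as the seed dictionary for a supervised mapping between the two spaces.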
