Published in: Annual Meeting of the Association for Computational Linguistics

Bilingual Lexicon Induction through Unsupervised Machine Translation


Abstract

A recent line of research has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods. In this paper, we propose an alternative approach to this problem that builds on recent work on unsupervised machine translation. Instead of directly inducing a bilingual lexicon from cross-lingual embeddings, we use them to build a phrase table, combine it with a language model, and use the resulting machine translation system to generate a synthetic parallel corpus, from which we extract the bilingual lexicon using statistical word alignment techniques. As such, our method can work with any word embedding and cross-lingual mapping technique, and it does not require any additional resource besides the monolingual corpus used to train the embeddings. When evaluated on the exact same cross-lingual embeddings, our proposed method obtains an average improvement of 6 accuracy points over nearest neighbor and 4 points over CSLS retrieval, establishing a new state of the art on the standard MUSE dataset.
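For readers unfamiliar with the two retrieval baselines named in the abstract, the following is a minimal sketch of nearest neighbor and CSLS (cross-domain similarity local scaling) retrieval over already-aligned embeddings. It is not the authors' code. The function names, the pure-Python list representation, and the toy similarity matrix are all assumptions made for illustration, and CSLS is written in its commonly used form, `2*cos(x, y) - r_T(x) - r_S(y)`, where `r_T(x)` is the mean similarity of a source word to its `k` nearest target words and `r_S(y)` is the mean similarity of a target word to its `k` nearest source words.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(src, tgt):
    """sims[i][j] = cosine(source word i, target word j)."""
    return [[cosine(s, t) for t in tgt] for s in src]


def mean_top_k(values, k):
    """Mean of the k largest values (all of them if fewer than k exist)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def argmax(row):
    return max(range(len(row)), key=row.__getitem__)


def retrieve_nn(sims):
    """Nearest neighbor: pick the most similar target for each source word."""
    return [argmax(row) for row in sims]


def retrieve_csls(sims, k=10):
    """CSLS: penalize targets (and sources) that are close to everything."""
    n_src, n_tgt = len(sims), len(sims[0])
    r_src = [mean_top_k(sims[i], k) for i in range(n_src)]
    r_tgt = [mean_top_k([sims[i][j] for i in range(n_src)], k)
             for j in range(n_tgt)]
    scores = [[2 * sims[i][j] - r_src[i] - r_tgt[j] for j in range(n_tgt)]
              for i in range(n_src)]
    return [argmax(row) for row in scores]


# Hypothetical toy similarities: target 0 acts as a "hub" that is close
# to both source words.
toy_sims = [[0.9, 0.3],
            [0.8, 0.75]]
print(retrieve_nn(toy_sims))         # [0, 0]
print(retrieve_csls(toy_sims, k=1))  # [0, 1]
```

In the toy matrix, nearest neighbor assigns both source words to the hub target, while CSLS moves the second source word to the other target. The method proposed in the abstract replaces this retrieval step altogether with phrase-table construction, translation, and word alignment, which this sketch does not cover.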
