On the Limitations of Unsupervised Bilingual Dictionary Induction

Abstract

Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018a) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
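The "weak supervision signal from identical words" mentioned in the abstract can be illustrated with a minimal sketch: strings that occur in both monolingual vocabularies are collected as seed translation pairs. The function name `identical_word_seeds` and the toy vocabularies are hypothetical and chosen only for illustration. This record does not describe how the authors then use such pairs to fit the embedding alignment, so that step is left out.

```python
def identical_word_seeds(src_vocab, tgt_vocab):
    """Return (src_index, tgt_index) pairs for strings found in both vocabularies.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # Map each target word to the index of its first occurrence.
    tgt_index = {}
    for j, word in enumerate(tgt_vocab):
        tgt_index.setdefault(word, j)

    seeds = []
    seen = set()
    for i, word in enumerate(src_vocab):
        # Keep one pair per shared string, in source-vocabulary order.
        if word in tgt_index and word not in seen:
            seeds.append((i, tgt_index[word]))
            seen.add(word)
    return seeds


# Toy vocabularies (assumed data, not taken from the paper).
src_vocab = ["the", "berlin", "dog", "taxi", "2018"]
tgt_vocab = ["der", "hund", "taxi", "berlin", "2018"]

seeds = identical_word_seeds(src_vocab, tgt_vocab)
print(seeds)  # [(1, 3), (3, 2), (4, 4)]
```

Shared strings such as names, numerals, and loanwords are often, though not always, valid translations of each other, which is why they can serve as a noisy seed dictionary without any manually built resource.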

Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics | 2018 | pp. 778-788 | 11 pages
Authors
Anders Sogaard; Sebastian Ruder; Ivan Vulic
Original format: PDF

Similar documents

1. Plan Optimization to Bilingual Dictionary Induction for Low-resource Language Families [J]. Nasution Arbi Haza, Murakami Yohei, Ishida Toru. ACM Transactions on Asian and Low-Resource Language Information Processing. 2021, No. 2
2. A Generalized Constraint Approach to Bilingual Dictionary Induction for Low-Resource Language Families [J]. Nasution Arbi Haza, Murakami Yohei, Ishida Toru. ACM Transactions on Asian Language Information Processing. 2018, No. 2
3. A Constraint Approach to Pivot-Based Bilingual Dictionary Induction [J]. Mairidan Wushouer, Donghui Lin, Toru Ishida. ACM Transactions on Asian Language Information Processing. 2016, No. 1
4. On the Limitations of Unsupervised Bilingual Dictionary Induction [C]. Anders Sogaard, Sebastian Ruder, Ivan Vulic. Annual Meeting of the Association for Computational Linguistics. 2018
5. Automatic extraction of lemma-based bilingual dictionaries for morphologically rich languages [D]. Saleh, Ibrahim Mohamed Hassan. 2009
6. Dictionary learning for unsupervised identification of ischemic territories in CP-BOLD Cardiac MRI at rest [O]. Marco Bevilacqua, Cristian Rusu, Rohan Dharmakumar. 2015
7. A Generalized Constraint Approach to Bilingual Dictionary Induction for Low-Resource Language Families [O]. Arbi Haza Nasution, Yohei Murakami, Toru Ishida. 2018
