Conference: Annual Meeting of the Association for Computational Linguistics

Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces

Abstract

Recent work on bilingual lexicon induction (BLI) has frequently depended either on aligned bilingual lexicons or on distribution matching, often with an assumption about the isometry of the two spaces. We propose a technique to quantitatively estimate how well this isometry assumption holds between two embedding spaces, and empirically show that the assumption weakens as the languages in question become increasingly etymologically distant. We then propose Bilingual Lexicon Induction with Semi-Supervision (BLISS), a semi-supervised approach that relaxes the isometric assumption while leveraging both limited aligned bilingual lexicons and a larger set of unaligned word embeddings, as well as a novel hubness filtering technique. Our proposed method obtains state-of-the-art results on 15 of 18 language pairs on the MUSE dataset, and does particularly well when the embedding spaces do not appear to be isometric. In addition, we show that adding supervision stabilizes the learning procedure and is effective even with minimal supervision.
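
To make the isometry assumption concrete, the following is a minimal sketch of a common baseline check, not the estimation technique proposed in the paper (which the abstract does not specify). It assumes a small seed lexicon given as two row-aligned NumPy matrices, fits the best orthogonal map between them (the orthogonal Procrustes solution), and reports the relative residual. A residual near zero is consistent with the two spaces being isometric on those pairs; a large residual suggests they are not. The function name `procrustes_residual` and the synthetic data are illustrative assumptions.

```python
import numpy as np


def procrustes_residual(X, Y):
    """Fit the orthogonal W minimizing ||X @ W - Y||_F and return (W, residual).

    X, Y: (n, d) arrays where row i of X and row i of Y are assumed to be
    embeddings of a translation pair (hypothetical seed lexicon).
    The residual is relative: ||X @ W - Y||_F / ||Y||_F.
    """
    # Orthogonal Procrustes: with X^T Y = U S V^T, the minimizer is W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    W = U @ Vt
    residual = np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)
    return W, residual


# Synthetic illustration only; no real embeddings are used here.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))

# A random orthogonal matrix (rotation/reflection) via QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))

# Case 1: target space is an exact rotation of the source (isometric).
Y_iso = X @ Q
# Case 2: target space is also stretched per axis (not isometric).
Y_non = X @ (Q * np.linspace(0.5, 2.0, 16))

_, res_iso = procrustes_residual(X, Y_iso)
_, res_non = procrustes_residual(X, Y_non)
print(f"relative residual, isometric case:     {res_iso:.3e}")
print(f"relative residual, non-isometric case: {res_non:.3e}")
```

In this sketch the residual only measures fit on the seed pairs, so it is a rough proxy at best; it says nothing about hubness or about the unaligned vocabulary that a semi-supervised method would also use.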