Conference: Annual Meeting of the Association for Computational Linguistics

On the Role of Seed Lexicons in Learning Bilingual Word Embeddings



Abstract

A shared bilingual word embedding space (SBWES) is an indispensable resource in a variety of cross-language NLP and IR tasks. A common approach to SBWES induction is to learn a mapping function between monolingual semantic spaces, where the mapping critically relies on a seed word lexicon used in the learning process. In this work, we analyze the importance and properties of seed lexicons for SBWES induction across different dimensions (i.e., lexicon source, lexicon size, translation method, and translation pair reliability). On the basis of our analysis, we propose a simple but effective hybrid bilingual word embedding (BWE) model. This model (HYBWE) learns the mapping between two monolingual embedding spaces using only highly reliable symmetric translation pairs from a seed document-level embedding space. We perform bilingual lexicon learning (BLL) with 3 language pairs and show that, by carefully selecting reliable translation pairs, our new HYBWE model outperforms benchmark BWE learning models, all of which use more expensive bilingual signals. In effect, we demonstrate that an SBWES may be induced by leveraging only a very weak bilingual signal (document alignments) along with monolingual data.
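To make the HYBWE pipeline more concrete, here is a minimal Python/NumPy sketch built on stated assumptions. It is not the authors' implementation. It assumes (a) that "symmetric translation pairs" means mutual nearest neighbours by cosine similarity in the shared document-level space, (b) that the mapping between monolingual spaces is a linear map fitted by least squares, a common choice for mapping-based BWEs that the abstract does not specify, and (c) that BLL returns the target word closest to the mapped source vector. The function names `symmetric_pairs`, `learn_mapping`, and `translate` are hypothetical, and any further reliability filtering used in the paper is not modelled here.

```python
import numpy as np


def _unit_rows(m):
    # L2-normalise each row so dot products become cosine similarities
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def symmetric_pairs(src_doc, tgt_doc):
    """Hypothetical helper: (i, j) pairs that are mutual nearest neighbours.

    src_doc / tgt_doc hold source / target word vectors from the SAME shared
    document-level space. "Symmetric" is ASSUMED to mean that j is the top
    match of i AND i is the top match of j (cosine similarity).
    """
    sim = _unit_rows(src_doc) @ _unit_rows(tgt_doc).T
    best_tgt = sim.argmax(axis=1)  # best target index for each source word
    best_src = sim.argmax(axis=0)  # best source index for each target word
    return [(i, int(j)) for i, j in enumerate(best_tgt) if best_src[j] == i]


def learn_mapping(src_mono, tgt_mono, pairs):
    """Hypothetical helper: linear map W minimising ||X W - Y||_F.

    X / Y stack the monolingual vectors of the seed pairs. Least squares is
    an ASSUMED choice of mapping, not one stated in the abstract.
    """
    X = src_mono[[i for i, _ in pairs]]
    Y = tgt_mono[[j for _, j in pairs]]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def translate(src_vec, W, tgt_mono):
    """BLL sketch: index of the target word closest (cosine) to src_vec @ W."""
    q = src_vec @ W
    scores = _unit_rows(tgt_mono) @ (q / np.linalg.norm(q))
    return int(scores.argmax())


if __name__ == "__main__":
    # Synthetic data: target word i translates source word i, and the target
    # monolingual space is a random rotation of the source space.
    rng = np.random.default_rng(0)
    n, d = 50, 10
    src_mono = rng.normal(size=(n, d))
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    tgt_mono = src_mono @ Q
    # Shared document-level space: translations get nearly identical vectors
    src_doc = rng.normal(size=(n, 8))
    tgt_doc = src_doc + 0.001 * rng.normal(size=(n, 8))

    pairs = symmetric_pairs(src_doc, tgt_doc)
    W = learn_mapping(src_mono, tgt_mono, pairs)
    print(len(pairs), translate(src_mono[3], W, tgt_mono))
```

The mutual-nearest-neighbour filter trades seed-lexicon size for reliability: a pair survives only if each word is the other's top match. This reflects the abstract's emphasis on using only highly reliable pairs rather than every candidate translation.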

