Conference paper: International Conference on Neural Information Processing

Bilingual Lexicon Extraction with Forced Correlation from Comparable Corpora

Abstract

Recently, a simple linear transformation of word embeddings has been found to be highly effective for extracting a bilingual lexicon from comparable corpora. However, this approach assumes that the pairs of bilingual word embeddings used to train the transformation satisfy a linear relationship, which cannot be guaranteed in practice. This paper proposes a simple solution based on canonical correlation analysis (CCA) that forces the training pairs of bilingual word embeddings to be maximally linearly correlated in the projection subspaces. After the original word embeddings of both languages are projected into the new correlated subspace, a better transformation matrix is learned from the projected embeddings in the same way as before. Experimental results show that the proposed solution achieves a significant improvement of 62% in Top-1 precision over the baseline approach on the English-to-Chinese bilingual lexicon extraction task.
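The two-stage idea in the abstract (fit a linear map between embedding spaces, or first project both sides with CCA and refit the map in the correlated subspace, then score by Top-1 precision) can be sketched as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the map is fit by closed-form least squares (the abstract does not say how it is trained), CCA is computed by whitening plus SVD with a small hypothetical ridge term `reg`, the number of CCA components `k` is a free choice, all function names are hypothetical, and the data is synthetic.

```python
import numpy as np


def fit_linear_map(src, tgt):
    """Closed-form least-squares W minimising ||src @ W - tgt||^2.

    Assumption: the abstract does not say how the transformation is trained;
    an ordinary least-squares fit is used here as a stand-in.
    """
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W


def _inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(X, Y, k, reg=1e-4):
    """Linear CCA via whitening + SVD (reg is a hypothetical ridge term).

    Returns (mx, A, my, B, corrs): the columns of (X - mx) @ A and
    (Y - my) @ B are pairwise maximally correlated, with canonical
    correlations corrs in descending order.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, S, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return mx, A, my, B, S[:k]


def cca_then_map(X_train, Y_train, k):
    """Project both sides into the CCA subspace, then refit the linear map there."""
    mx, A, my, B, corrs = fit_cca(X_train, Y_train, k)
    W = fit_linear_map((X_train - mx) @ A, (Y_train - my) @ B)
    return mx, A, my, B, W, corrs


def top1_precision(mapped_src, tgt_vocab, gold):
    """Fraction of rows whose cosine nearest neighbour in tgt_vocab is gold[i]."""
    a = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    b = tgt_vocab / np.linalg.norm(tgt_vocab, axis=1, keepdims=True)
    pred = np.argmax(a @ b.T, axis=1)
    return float(np.mean(pred == np.asarray(gold)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for seed-dictionary pairs (row i of X translates to row i of Y).
    X = rng.normal(size=(200, 10))
    Y = X @ rng.normal(size=(10, 8)) + 0.01 * rng.normal(size=(200, 8))
    gold = np.arange(200)

    W0 = fit_linear_map(X, Y)  # baseline: map raw embeddings directly
    print("baseline P@1:", top1_precision(X @ W0, Y, gold))

    mx, A, my, B, W, corrs = cca_then_map(X, Y, k=8)
    print("CCA + map P@1:", top1_precision((X - mx) @ A @ W, (Y - my) @ B, gold))
```

In a real setup the evaluated source words would be held out from the seed dictionary, and the target side would be the whole target vocabulary projected with `B`. The synthetic data here is almost exactly linear, so the demo only exercises the code path; it does not reproduce the gain reported in the abstract.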