Journal: Computer Speech and Language

Learning English-Chinese bilingual word representations from sentence-aligned parallel corpus


Abstract

Word representations in different languages are fundamental to various cross-lingual applications. Previous research has debated whether word alignment should be used when learning bilingual word representations. This paper presents a comprehensive empirical study of how a parallel corpus can be used to learn word representations in the embedding space. Various nonalignment and alignment approaches are explored to formulate the contexts for Skip-gram modeling. In the approaches without word alignment, five strategies are considered: concatenating A and B, concatenating B and A, interleaving A with B, shuffling A and B, and using A and B separately, where A and B denote parallel sentences in two languages. In the approaches with word alignment, three word alignment tools, GIZA++, TsinghuaAligner, and fast_align, are employed to align the words in sentences A and B. The effect of alignment direction, from A to B or from B to A, is also discussed. To deal with unaligned words in the word alignment approach, two alternatives are explored: using the words aligned with their immediate neighbors, and using the words in the interleaving approach. We evaluate the adopted approaches on four tasks: bilingual dictionary induction, cross-lingual information retrieval, cross-lingual analogy reasoning, and cross-lingual word semantic relatedness. These tasks cover the issues of translation, reasoning, and information access. Experimental results show that the word alignment approach with conditional interleaving achieves the best performance in most of the tasks. (C) 2019 Elsevier Ltd. All rights reserved.
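
The nonalignment strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the handling of sentences of unequal length during interleaving, the fixed random seed, and the window size are all assumptions made for illustration. It shows how each strategy turns a sentence pair (A, B) into one token sequence, from which Skip-gram (center, context) pairs are drawn.

```python
import random
from itertools import zip_longest


def concat(a, b):
    """Concatenate two token lists; call concat(b, a) for the B-then-A order."""
    return a + b


def interleave(a, b):
    """Alternate tokens from a and b. Assumption: when one sentence is
    longer, its leftover tokens are appended in their original order."""
    out = []
    for x, y in zip_longest(a, b):
        if x is not None:
            out.append(x)
        if y is not None:
            out.append(y)
    return out


def shuffle_merge(a, b, seed=0):
    """Merge both sentences and shuffle the tokens (seeded for repeatability)."""
    merged = a + b
    random.Random(seed).shuffle(merged)
    return merged


def skipgram_pairs(tokens, window=2):
    """Enumerate (center, context) pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical, pre-tokenized parallel sentence pair.
A = ["the", "cat", "sleeps"]
B = ["猫", "在", "睡觉"]

mixed = interleave(A, B)
pairs = skipgram_pairs(mixed, window=1)
```

With interleaving, each English token sits next to Chinese tokens, so even a small window produces cross-lingual (center, context) pairs. With concatenation, only tokens near the sentence boundary do, unless the window is wide.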
