Journal: Computer Speech and Language
Learning English-Chinese bilingual word representations from sentence-aligned parallel corpus

Abstract

Representing words in different languages is fundamental to various cross-lingual applications. Previous research has debated whether word alignment should be used when learning bilingual word representations. This paper presents a comprehensive empirical study of how a parallel corpus can be used to learn word representations in an embedding space. Various non-alignment and alignment approaches are explored for formulating the contexts for Skip-gram modeling. In the approaches without word alignment, five options are considered: concatenating A and B, concatenating B and A, interleaving A with B, shuffling A and B, and using A and B separately, where A and B denote parallel sentences in the two languages. In the approaches with word alignment, three word alignment tools, GIZA++, TsinghuaAligner, and fast_align, are employed to align the words in sentences A and B. The effect of the alignment direction, from A to B or from B to A, is also discussed. To deal with unaligned words in the word alignment approach, two alternatives are explored: using the words aligned with their immediate neighbors, and using the words as in the interleaving approach. We evaluate the adopted approaches on four tasks: bilingual dictionary induction, cross-lingual information retrieval, cross-lingual analogy reasoning, and cross-lingual word semantic relatedness. These tasks cover the issues of translation, reasoning, and information access. Experimental results show that the word alignment approach with conditional interleaving achieves the best performance in most of the tasks. © 2019 Elsevier Ltd. All rights reserved.
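
The non-alignment context formulations named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the token-by-token alternation used for interleaving, the fixed shuffle seed, and the window size are all assumptions, since the abstract does not specify these details. The point is only to show how the way A and B are merged changes which cross-lingual (center, context) pairs Skip-gram would see.

```python
import random
from itertools import zip_longest


def concat(a, b):
    """Concatenate sentence A followed by sentence B (swap arguments for B then A)."""
    return a + b


def interleave(a, b):
    """Alternate tokens from A and B; leftover tokens of the longer sentence go last.

    Assumption: strict one-by-one alternation. The paper may use another scheme.
    """
    merged = []
    for x, y in zip_longest(a, b):
        if x is not None:
            merged.append(x)
        if y is not None:
            merged.append(y)
    return merged


def shuffle(a, b, seed=0):
    """Merge A and B and randomly permute the tokens (seeded for repeatability)."""
    merged = a + b
    random.Random(seed).shuffle(merged)
    return merged


def skipgram_pairs(tokens, window=2):
    """Enumerate (center, context) pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy sentence pair (English A, Chinese B), already tokenized.
sentence_a = ["the", "cat", "sleeps"]
sentence_b = ["猫", "睡觉"]

for name, merged in [
    ("A+B", concat(sentence_a, sentence_b)),
    ("B+A", concat(sentence_b, sentence_a)),
    ("interleave", interleave(sentence_a, sentence_b)),
    ("shuffle", shuffle(sentence_a, sentence_b)),
]:
    print(name, merged, len(skipgram_pairs(merged, window=1)))
```

With a narrow window, concatenation produces cross-lingual pairs only near the boundary between the two sentences, whereas interleaving places tokens of the other language next to every word. Whether those neighbors are actual translations depends on word order, which is the gap the alignment-based approaches aim to close.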
