Venue: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Bilingual lexicon extraction for a distant language pair using a small parallel corpus

Abstract

This thesis proposal addresses bilingual lexicon extraction for cases in which only small parallel corpora are available and a monolingual corpus is hard to obtain for at least one of the languages. Moreover, the languages are typologically distant and no bilingual seed lexicon is available. We focus on the language pair Spanish-Nahuatl. We propose to work with morpheme-based representations in order to reduce sparseness and to facilitate the task of finding lexical correspondences between a highly agglutinative language and a fusional one. We take contextual information into account, but instead of using a precompiled seed dictionary, we use the distribution and dispersion of the positions of the morphological units as cues to compare the contextual vectors and obtain the translation candidates.
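
The idea of comparing units by where they occur in a parallel corpus, rather than through a seed dictionary, can be illustrated with a minimal sketch. It is not the method of the proposal: it assumes sentence-aligned text that is already segmented into morpheme-like units, it uses only the sentence indices in which a unit occurs (a simple stand-in for the positional distribution; dispersion is not modelled), and it ranks candidates by cosine similarity, which is a choice made here for illustration. The function names and the toy tokens (`a`, `b`, `x`, `y`, ...) are hypothetical placeholders, not Spanish or Nahuatl data.

```python
import math
from collections import defaultdict


def occurrence_profiles(sentences):
    """Map each unit to {sentence index: count of the unit in that sentence}."""
    profiles = defaultdict(dict)
    for idx, units in enumerate(sentences):
        for unit in units:
            profiles[unit][idx] = profiles[unit].get(idx, 0) + 1
    return profiles


def cosine(p, q):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * q[key] for key, value in p.items() if key in q)
    norm_p = math.sqrt(sum(value * value for value in p.values()))
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def translation_candidates(src_sentences, tgt_sentences, unit, top_n=3):
    """Rank target-side units by how similar their sentence-level
    occurrence profile is to that of the given source-side unit."""
    src = occurrence_profiles(src_sentences)
    tgt = occurrence_profiles(tgt_sentences)
    if unit not in src:
        return []
    scored = [(cand, cosine(src[unit], profile)) for cand, profile in tgt.items()]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy aligned corpus: sentence i on the source side pairs with sentence i
# on the target side. Tokens are placeholders, not real morphemes.
src_sentences = [["a", "b"], ["a", "c"], ["b", "c"]]
tgt_sentences = [["x", "y"], ["x", "z"], ["y", "z"]]

candidates = translation_candidates(src_sentences, tgt_sentences, "a")
print(candidates)  # "x" ranks first: it occurs in the same sentences as "a"
```

With a small corpus such profiles are short and noisy, which is one reason segmenting into morphemes (fewer, more frequent units) may help; the sketch leaves segmentation itself out of scope.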
