Conference: Pacific Asia Conference on Language, Information and Computation

Dealing with Out-Of-Vocabulary Problem in Sentence Alignment Using Word Similarity

Abstract

Sentence alignment plays an essential role in building bilingual corpora, which are valuable resources for many applications such as statistical machine translation. Among the various approaches to sentence alignment, length-and-word-based methods, which combine sentence length with word correspondences, have been shown to be the most effective. Nevertheless, a drawback of using bilingual dictionaries trained by IBM Models in length-and-word-based methods is the out-of-vocabulary (OOV) problem. We propose using word similarity learned from monolingual corpora to overcome this problem. Experimental results show that our method reduces the OOV ratio and achieves better performance than some other length-and-word-based methods. This suggests that word similarity learned from monolingual data may help to deal with the OOV problem in sentence alignment.
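
The abstract does not say how the similarity information is combined with the bilingual dictionary, so the following is only a minimal sketch of one plausible reading: when a source word is missing from the dictionary, borrow the translations of similar in-vocabulary words, weighted by their similarity. The names `BILINGUAL_DICT`, `SIMILAR_WORDS`, `lookup`, and `oov_ratio`, as well as all words and scores, are hypothetical and invented for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Toy bilingual dictionary: source word -> {target word: translation probability}.
# All entries are invented for illustration.
BILINGUAL_DICT: Dict[str, Dict[str, float]] = {
    "house": {"maison": 0.9, "domicile": 0.1},
    "big": {"grand": 1.0},
}

# Toy monolingual similarity table: word -> [(similar word, similarity score)].
# In practice this might come from word embeddings or clustering (assumption).
SIMILAR_WORDS: Dict[str, List[Tuple[str, float]]] = {
    "home": [("house", 0.8)],
    "large": [("big", 0.5), ("huge", 0.5)],
}


def lookup(
    word: str,
    bilingual_dict: Dict[str, Dict[str, float]],
    similar_words: Dict[str, List[Tuple[str, float]]],
) -> Dict[str, float]:
    """Return translation scores for `word`, falling back to similar words."""
    if word in bilingual_dict:
        return dict(bilingual_dict[word])
    scores: Dict[str, float] = {}
    for neighbor, similarity in similar_words.get(word, []):
        # Neighbors that are themselves out of vocabulary contribute nothing.
        for target, prob in bilingual_dict.get(neighbor, {}).items():
            scores[target] = scores.get(target, 0.0) + similarity * prob
    return scores


def oov_ratio(
    words: List[str],
    bilingual_dict: Dict[str, Dict[str, float]],
    similar_words: Optional[Dict[str, List[Tuple[str, float]]]] = None,
) -> float:
    """Fraction of `words` with no translation, with or without the fallback."""
    if not words:
        return 0.0
    missing = 0
    for word in words:
        if similar_words is None:
            found = word in bilingual_dict
        else:
            found = bool(lookup(word, bilingual_dict, similar_words))
        if not found:
            missing += 1
    return missing / len(words)


if __name__ == "__main__":
    sentence = ["home", "large", "house", "zebra"]
    print(oov_ratio(sentence, BILINGUAL_DICT))                 # 0.75
    print(oov_ratio(sentence, BILINGUAL_DICT, SIMILAR_WORDS))  # 0.25
```

In this toy setting the fallback recovers "home" and "large" through their neighbors, while "zebra" stays out of vocabulary because it has no similar in-vocabulary word. How the recovered scores would feed into an alignment score is left open here.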