Workshop on natural language processing 2014

Bilingual Sentence Alignment of a Parallel Corpus by Using English as a Pivot Language



Abstract

Statistically training a machine translation model requires a parallel corpus containing a large number of aligned sentence pairs in both languages. However, such a corpus is not easy to obtain when English is neither the source nor the target language. The European Parliament parallel corpus contains sentence alignments only between English and 20 European languages, leaving the other 190 language pairs unaligned. A previous method using sentence length information is not reliable enough to produce alignments for training statistical machine translation models. Hybrid methods combining sentence length and bilingual dictionary information may produce better results, but dictionaries may not be affordable. Thus, we introduce a technique that aligns non-English corpora from the European Parliament by using English as a pivot language, without a bilingual dictionary. Our technique is illustrated with French and Spanish, resulting in performance equivalent to that of the existing alignment in the original English-French and English-Spanish corpora.
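The pivot idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how sentences are matched, so the code below makes its own assumption: two sentences are paired when they are aligned to the same English sentence after simple whitespace and case normalization. The names `pivot_align` and `normalize` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def normalize(sentence):
    """Lowercase and collapse whitespace so near-identical English lines match.

    Assumption: exact match after this normalization is good enough for a demo.
    """
    return " ".join(sentence.lower().split())


def pivot_align(en_fr, en_es):
    """Join two English-anchored alignments on their shared English sentence.

    en_fr: list of (english, french) aligned sentence pairs
    en_es: list of (english, spanish) aligned sentence pairs
    Returns a list of (french, spanish) pairs that share an English pivot.
    """
    # Index the Spanish side by its normalized English sentence.
    es_by_en = defaultdict(list)
    for en, es in en_es:
        es_by_en[normalize(en)].append(es)

    # Look up each French sentence's English pivot in that index.
    pairs = []
    for en, fr in en_fr:
        for es in es_by_en.get(normalize(en), []):
            pairs.append((fr, es))
    return pairs
```

English sentences that appear in only one of the two corpora produce no pair in this sketch. A real system would need a policy for such cases and for English sentences that differ slightly between corpora.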
