Tenth Workshop on Building and Using Comparable Corpora, 2017

BUCC2017: A Hybrid Approach for Identifying Parallel Sentences in Comparable Corpora

Abstract

A Statistical Machine Translation (SMT) system is typically trained on a large parallel corpus in order to produce effective translations. Such corpora are not only scarce but also require considerable manual labor and cost to build. A parallel corpus can instead be prepared from comparable corpora, that is, a pair of corpora in two different languages that cover the same domain. In the present work, we try to build a parallel corpus for the French-English language pair from a given comparable corpus. The data and the problem set are provided as part of the shared task organized by BUCC 2017. We propose a system that first translates the sentences, relying heavily on Moses, and then groups the sentences by sentence-length similarity. Finally, one-to-one sentence selection is performed using cosine similarity.
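The last two stages described in the abstract (length-based grouping, then one-to-one selection by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the French sentences have already been translated into English (for example with Moses), and the whitespace tokenization, the bag-of-words vectors, the `max_ratio` length filter, the `threshold` value, and the greedy selection strategy are all hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased whitespace tokens with counts (an assumed tokenization)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_length(a, b, max_ratio=1.5):
    """Length filter: token counts must be within an assumed ratio."""
    len_a, len_b = sum(a.values()), sum(b.values())
    if len_a == 0 or len_b == 0:
        return False
    return max(len_a, len_b) / min(len_a, len_b) <= max_ratio


def select_pairs(translated_src, target, threshold=0.5, max_ratio=1.5):
    """Greedy one-to-one selection of (src_index, tgt_index, score) triples.

    translated_src: source sentences already translated into the target
    language; target: candidate sentences in the target language.
    """
    src_vecs = [bag_of_words(s) for s in translated_src]
    tgt_vecs = [bag_of_words(t) for t in target]

    # Score only the pairs that pass the length filter.
    candidates = []
    for i, sv in enumerate(src_vecs):
        for j, tv in enumerate(tgt_vecs):
            if not similar_length(sv, tv, max_ratio):
                continue
            score = cosine_similarity(sv, tv)
            if score >= threshold:
                candidates.append((score, i, j))

    # Best score first; each sentence may be used in at most one pair.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_src, used_tgt, pairs = set(), set(), []
    for score, i, j in candidates:
        if i in used_src or j in used_tgt:
            continue
        used_src.add(i)
        used_tgt.add(j)
        pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    translated = [
        "the cat sat on the mat",
        "parliament adopted the resolution today",
    ]
    english = [
        "stock prices fell sharply",
        "the parliament adopted the resolution",
        "the cat sat on a mat",
    ]
    for i, j, score in select_pairs(translated, english):
        print(translated[i], "<->", english[j], round(score, 3))
```

The length filter keeps the number of cosine comparisons down, and the greedy pass enforces the one-to-one constraint by accepting the highest-scoring pairs first. A real system would likely need proper tokenization and a threshold tuned on held-out data.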