Journal: Procedia Computer Science

A Hybrid of Sentence-Level Approach and Fragment-Level Approach of Parallel Text Extraction from Comparable Text

Abstract

Parallel texts are essential resources in linguistics, natural language processing, and multilingual information retrieval. Many studies attempt to extract parallel text from existing resources, particularly from comparable texts. Approaches to extracting parallel text from comparable text can be divided into sentence-level and fragment-level approaches. This paper proposes a hybrid approach that combines the two. The approach was evaluated using statistical machine translation (SMT) and neural machine translation (NMT). The experimental results show marked improvements in the BLEU scores of both SMT and NMT. For SMT, the BLEU scores on the computer science and news domain tests increase from 17.45 and 41.45 to 18.56 and 48.65, respectively. For NMT, the BLEU scores in the computer science and news domains increase from 14.42 and 19.39 to 21.17 and 41.75, respectively.
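To make the size of the reported changes easier to compare across systems and domains, the short sketch below computes the absolute and relative BLEU gains from the four score pairs quoted in the abstract. The names `REPORTED_BLEU` and `bleu_gains` are hypothetical helpers introduced only for this illustration, and the "before"/"after" labels simply mirror the abstract's "from ... to ..." wording; this is not the authors' evaluation code.

```python
# BLEU scores quoted in the abstract, stored as (before, after).
# The dictionary layout and all names here are illustrative assumptions.
REPORTED_BLEU = {
    ("SMT", "computer science"): (17.45, 18.56),
    ("SMT", "news"): (41.45, 48.65),
    ("NMT", "computer science"): (14.42, 21.17),
    ("NMT", "news"): (19.39, 41.75),
}


def bleu_gains(scores):
    """Return {key: (absolute_gain, relative_gain_percent)}, rounded to 2 places."""
    gains = {}
    for key, (before, after) in scores.items():
        absolute = round(after - before, 2)
        relative = round(100.0 * (after - before) / before, 2)
        gains[key] = (absolute, relative)
    return gains


if __name__ == "__main__":
    for (system, domain), (absolute, relative) in bleu_gains(REPORTED_BLEU).items():
        print(f"{system} / {domain}: +{absolute:.2f} BLEU ({relative:.2f}% relative)")
```

Reporting the relative gain alongside the absolute gain helps when the starting scores differ widely, as they do between the two domains here.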