LREC-2012

An English-Portuguese parallel corpus of questions: translation guidelines and application in Statistical Machine Translation



Abstract

Statistical Machine Translation depends on large training corpora. Although several parallel corpora are available, they are typically composed of declarative sentences, which may not be appropriate when the goal is to translate other types of sentences, e.g., interrogatives. There have been efforts to create corpora of questions, especially in the context of evaluating Question-Answering systems. One of those corpora is the UIUC dataset, composed of nearly 6,000 questions and widely used in the task of Question Classification. In this work, we make available the Portuguese version of the UIUC dataset, which we manually translated, as well as the translation guidelines. We show the impact of this corpus on the performance of a state-of-the-art SMT system when translating questions. Finally, we present a taxonomy of translation errors, according to which we analyze the output of the automatic translation before and after using the corpus as training data.
