Conference: Speech Technology and Human-Computer Dialogue, 2009. SpeD '09

Factored phrase-based statistical machine translation

Abstract

We describe the results of a short-term SEE-ERAnet project whose aim was to investigate the feasibility of machine translation (MT) research and development for several South Slavic and Balkan languages. The major tasks of the project were: compiling a multilingual parallel corpus for the languages concerned; XML mark-up of the corpus (tokenization, lemmatization, tagging); sentence and word alignment of the corpus; and building the statistical translation models. Additionally, based on the created resources and models, we conducted preliminary experiments on building prototype MT systems for Romanian <-> English, Greek <-> English and Slovene <-> English. We argue that by investing effort in building accurate language resources (the larger the better) and in fine-tuning the statistical parameters, current machine-learning technologies can be used successfully to quickly develop acceptable MT prototypes, which are valuable starting points for implementing working systems. We substantiate this claim with recent results from a follow-up national project aimed at developing a Romanian <-> English translation system.
