
Tapadoir


Abstract

Tapadoir (from the Irish "tapa" - fast) is a statistical machine translation project which has just completed its pilot phase. The heart of the project is the development of an English-Irish translation system, intended for integration into the workflow of a professional translator at an Irish government department. In practice, this means statistical machine translation from a highly-resourced majority language (English) to an under-resourced minority language (Irish) with significant linguistic differences. A secondary aim is the production of English-Irish parallel corpora suitable for future developers of translation tools and NLP applications. There is high demand for Irish-language translated texts within Irish government departments, and this MT integration aims to increase the speed of translation to meet this demand. Tapadoir currently out-performs Google Translate (based on BLEU score) on data from our use case domain (official government documents and reports). The official European Commission machine translation service, MT@EC, rate their English-to-Irish MT system as suitable for gist translation, but below useful editable quality, which is the standard required by the client. While MT@EC also build custom pilot projects based on existing user data, the client's data is limited. Therefore, further data collection constitutes a large proportion of this project's remit. English-to-Irish translation poses a number of challenges. From an NLP perspective, Irish is very much under-resourced, and much of the project so far has focused on corpus development. The target language is also morphologically much richer than the source (e.g. initial mutations, synthetic verb forms, case), and the resulting data sparsity further compounds these translation challenges. Linguistically, the word order of the language pair is divergent (Subject-Verb-Object vs. Verb-Subject-Object), with other word order differences at lower levels, such as adjectives following nouns, and the genitive noun following its possessed object in Irish. To cope with this, we are currently developing source-side reordering rules to address word-order divergence, and we are exploring ways to overcome the morphological discrepancies. Our aim is to use various methods to provide useful machine translation output for an unusual and challenging language pair. Rather than aiming to investigate the general effectiveness of particular methods, we are attempting to find the best practical combination for this resource-poor and linguistically challenging use case. We expect that our work will be of use to developers of MT systems for other under-resourced languages. The Tapadoir MT engine will be deployed for in-house use by the Irish Department of Arts, Heritage and the Gaeltacht. However, we hope to make freely available the resources gathered or created over the course of its development, for the sake of future Irish-language projects.
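
The source-side reordering mentioned above can be illustrated with a minimal sketch. It is not the project's actual rule set, which the abstract does not describe. It assumes the English input is already tokenised and tagged with a coarse, hypothetical tag set (`DET`, `NOUN`, `VERB`, `ADJ`), and applies two toy rules: adjectives move after the noun they directly precede, and the first verb moves to the front of the clause (SVO to VSO). The function names `swap_adj_noun`, `front_verb`, and `reorder` are invented for illustration.

```python
from typing import List, Tuple

# (word, coarse POS tag); the tag set is an assumption, not the project's.
Token = Tuple[str, str]


def swap_adj_noun(tokens: List[Token]) -> List[Token]:
    """Swap each adjacent ADJ NOUN pair so the adjective follows the noun."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


def front_verb(tokens: List[Token]) -> List[Token]:
    """Move the first verb to the start of the clause (SVO -> VSO)."""
    for i, (_, tag) in enumerate(tokens):
        if tag == "VERB":
            return [tokens[i]] + tokens[:i] + tokens[i + 1:]
    return list(tokens)  # no verb found: leave the clause unchanged


def reorder(tokens: List[Token]) -> List[Token]:
    """Apply both toy rules: adjective placement first, then verb fronting."""
    return front_verb(swap_adj_noun(tokens))


sentence = [
    ("the", "DET"), ("minister", "NOUN"), ("published", "VERB"),
    ("the", "DET"), ("new", "ADJ"), ("report", "NOUN"),
]
print(" ".join(word for word, _ in reorder(sentence)))
# prints: published the minister the report new
```

Reordering the English side before training and decoding is meant to make the source word order resemble Irish, so that the statistical model has fewer long-distance movements to learn. A practical system would most likely operate on parse trees rather than flat tag sequences, and would need to handle auxiliaries, subordinate clauses, and genitive constructions.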
