Enriching Character-Based Neural Machine Translation with Modern Documents for Achieving an Orthography Consistency in Historical Documents

Abstract

The nature of human language and the lack of a spelling convention make historical documents hard for natural language processing to handle. Spelling normalization tackles this problem by adapting their spelling to modern standards in order to achieve orthographic consistency. In this work, we compare several character-based machine translation approaches and propose a method that uses modern documents to enrich neural machine translation models. We tested our proposal on four different data sets and observed that the enriched models improved the normalization quality of the neural models. Statistical models, however, yielded better results.
