Venue: International Conference on Recent Advances in Natural Language Processing

From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction


Abstract

Many historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used during digitization. Correcting these errors manually is time-consuming, and many automatic approaches rely on rules or supervised machine learning. We present a fully automatic, unsupervised way of extracting parallel data for training a character-based sequence-to-sequence NMT (neural machine translation) model to perform OCR error correction.
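The abstract does not spell out how the parallel data is extracted, so the following is only a minimal sketch under stated assumptions: that word embeddings trained on the noisy corpus place OCR variants close to their correct forms, and that a small edit distance separates a genuine OCR variant from an unrelated neighbour. The names `neighbours`, `lexicon`, `max_dist`, `extract_pairs`, and `to_char_sequence` are hypothetical and are not taken from the paper; the neighbour lists are supplied by hand here instead of being queried from a trained embedding model.

```python
from typing import Dict, List, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def extract_pairs(neighbours: Dict[str, List[str]],
                  lexicon: Set[str],
                  max_dist: int = 2) -> List[Tuple[str, str]]:
    """Return (noisy, correct) word pairs.

    Assumption: `neighbours` maps a word to its nearest neighbours in an
    embedding space trained on the OCRed corpus, and `lexicon` is a set of
    known correct word forms. Both are hypothetical inputs for this sketch.
    """
    pairs = []
    for correct, candidates in neighbours.items():
        if correct not in lexicon:
            continue  # only use known-good words as correction targets
        for cand in candidates:
            if cand in lexicon:
                continue  # a real word is a semantic neighbour, not an OCR error
            if 0 < levenshtein(cand, correct) <= max_dist:
                pairs.append((cand, correct))
    return pairs


def to_char_sequence(word: str) -> str:
    """Space-separate characters, a common input format for character-level NMT."""
    return " ".join(word)


if __name__ == "__main__":
    # Hand-written toy data standing in for embedding lookups.
    neighbours = {
        "future": ["fiiture", "past", "futurc", "tomorrow"],
        "past": ["paft", "future"],
    }
    lexicon = {"future", "past", "tomorrow"}
    for noisy, correct in extract_pairs(neighbours, lexicon):
        print(to_char_sequence(noisy), "\t", to_char_sequence(correct))
```

Each resulting pair could serve as one source/target line for a character-level sequence-to-sequence model. The edit-distance threshold is a tunable guess; the value used here is illustrative only.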
