From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction

Abstract

A great deal of historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used in the digitization process. Correcting these errors manually is a time-consuming process and a great part of the automatic approaches have been relying on rules or supervised machine learning. We present a fully automatic unsupervised way of extracting parallel data for training a character-based sequence-to-sequence NMT (neural machine translation) model to conduct OCR error correction.
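
The abstract does not spell out how the parallel data are extracted, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a similarity source (the title suggests word embeddings) has already produced a list of neighbouring tokens for each word, and it keeps a neighbour as an OCR-error candidate when it is missing from a lexicon but close in edit distance to a lexicon word. The names `neighbours`, `lexicon`, `max_distance` and `extract_pairs` are hypothetical and chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def extract_pairs(neighbours, lexicon, max_distance=3):
    """Build (noisy, correct) training pairs without manual annotation.

    neighbours: hypothetical mapping from a word to tokens judged similar to it
                (for example, nearest neighbours in an embedding space).
    lexicon:    set of words assumed to be correctly spelled.
    """
    pairs = []
    for correct, candidates in neighbours.items():
        if correct not in lexicon:
            continue  # only trust lexicon words as correction targets
        for cand in candidates:
            if cand in lexicon:
                continue  # a real word, so not treated as an OCR error
            if edit_distance(cand, correct) <= max_distance:
                # Space-separate characters, as a character-based
                # sequence-to-sequence model would consume them.
                pairs.append((" ".join(cand), " ".join(correct)))
    return pairs


# Toy input, invented for illustration only.
neighbours = {
    "future": ["fiiture", "past", "futurc", "tomorrow"],
    "past": ["paft", "future"],
}
lexicon = {"future", "past", "tomorrow"}
pairs = extract_pairs(neighbours, lexicon)
```

Each resulting pair is a character-level source and target sequence, which is the kind of input a character-based NMT model would be trained on. The distance threshold is an arbitrary illustrative value.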

Bibliographic details

Source: International Conference on Recent Advances in Natural Language Processing | 2019 | pp. 431-436 | 6 pages
Authors: Mika Hämäläinen; Simon Hengchen
Original format: PDF

