Two Semi-Supervised Training Approaches for Automated Text Recognition

Abstract

Automated text recognition is a fundamental problem in Document Image Analysis. Optical models are used for modeling characters while language models are used for composing sentences. Since the scripts and linguistic context differ widely, it is mandatory to specialize the models by training on task-dependent ground-truth. However, to create a sufficient amount of ground-truth, at least for historical handwritten scripts, well-qualified persons have to mark and transcribe text lines, which is very time-consuming. On the other hand, in many cases unassigned transcripts are already available on page-level from another process chain, or at least transcripts from similar linguistic context are available. In this work we present two approaches that make use of such transcripts: whereas the first one creates training data by automatically assigning page-dependent transcripts to text lines, the second one uses a task-specific language model to generate highly confident training data. Both approaches are successfully applied on a very challenging historical handwritten collection.
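The first approach in the abstract, assigning page-level transcripts to text lines, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the page transcript is already split into lines, that a baseline recognizer has produced one hypothesis per detected text line, and that a greedy best-match pairing with a similarity threshold is good enough. The names `similarity`, `assign_transcripts` and `min_score` are hypothetical and chosen for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib ratio, not a true CER)."""
    return SequenceMatcher(None, a, b).ratio()


def assign_transcripts(hypotheses, transcript_lines, min_score=0.6):
    """Greedily pair each recognizer hypothesis with its best unused transcript line.

    Returns a sorted list of (line_index, transcript_text, score) for pairs
    whose score reaches min_score; all other lines stay unassigned.
    """
    # Score every (hypothesis, transcript line) combination.
    candidates = [
        (similarity(hyp, ref), i, j)
        for i, hyp in enumerate(hypotheses)
        for j, ref in enumerate(transcript_lines)
    ]
    # Best-scoring combinations first.
    candidates.sort(reverse=True)

    used_lines, used_refs, pairs = set(), set(), []
    for score, i, j in candidates:
        if score < min_score:
            break  # everything that follows scores even lower
        if i in used_lines or j in used_refs:
            continue  # each text line and each transcript line is used once
        used_lines.add(i)
        used_refs.add(j)
        pairs.append((i, transcript_lines[j], score))
    return sorted(pairs)
```

Calling `assign_transcripts(hypotheses, transcript_lines)` returns only the pairs above the threshold, which could then serve as additional line-level training data. Raising `min_score` trades the amount of generated training data for its reliability.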

Bibliographic details

Source
International Conference on Frontiers in Handwriting Recognition | 2020 | pp. 145-150 | 6 pages
Authors
Gundram Leifert; Roger Labahn; Joan Andreu Sánchez
Original format: PDF
Keywords
Training; Training data; Text recognition; Layout; Task analysis; Production; Linguistics;

Similar literature

1. Drug name recognition and classification in biomedical texts: A case study outlining approaches underpinning automated systems [J]. Segura-Bedmar I, Martinez P, Segura-Bedmar M. Drug Discovery Today. 2008, No. 17-18
2. TwiSNER: Semi-supervised Method for Named Entity Recognition from Text Streams on Twitter [J]. Van Cuong Tran, Dosam Hwang, Jason J. Jung. Journal of Universal Computer Science. 2016, No. 6
3. Semi-supervised emotion recognition in textual conversation via a context-augmented auxiliary training task [J]. Liangyi Kang, Jie Liu, Lingqiao Liu. Information Processing & Management. 2021, No. 6
4. Comparison of Deep Co-Training and Mean-Teacher Approaches for Semi-Supervised Audio Tagging [C]. Léo Cances, Thomas Pellegrini. IEEE International Conference on Acoustics, Speech and Signal Processing. 2021
5. Semi-Supervised Training for Automatic Speech Recognition [D]. Manohar, Vimal. 2019
6. Robust Semi-Supervised Traffic Sign Recognition via Self-Training and Weakly-Supervised Learning [O]. Obed Tettey Nartey, Guowu Yang, Sarpong Kwadwo Asare. 2020
7. Human or Computer Assisted Interactive Transcription: Automated Text Recognition, Text Annotation, and Scholarly Edition in the Twenty-First Century [O]. Castro-Bleda M. J. 2014
