Conference: IAPR International Conference on Document Analysis and Recognition

Segmentation-Free Speech Text Recognition for Comic Books

Abstract

Speech text in comic books is written by the scriptwriter in a particular manner, which raises unusual challenges for text recognition. We first detail these challenges and present different approaches to solve them. We then compare the performance of a pre-trained OCR and a segmentation-free approach on the speech text of comic books written in Latin script. We demonstrate that a few good-quality output samples from a pre-trained OCR, combined with other unlabeled data in the same writing style, can be used to train a segmentation-free OCR and improve text recognition. This is made possible by a lexicality measure that automatically accepts or rejects the pre-trained OCR output as pseudo ground truth for the subsequent segmentation-free OCR training and recognition.
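The accept/reject step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that "lexicality" is the fraction of word tokens in an OCR transcription that appear in a lexicon, and that a transcription is kept as pseudo ground truth when this score reaches a threshold. The names `lexicality` and `select_pseudo_ground_truth`, the tokenization rule and the 0.8 threshold are all hypothetical; the paper's actual measure may be defined differently.

```python
import re


def lexicality(text: str, lexicon: set[str]) -> float:
    """Hypothetical lexicality score: share of word tokens found in the lexicon.

    Comic speech text is often all caps, so tokens are lower-cased before lookup.
    This definition is an illustrative assumption, not the paper's measure.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)


def select_pseudo_ground_truth(ocr_outputs, lexicon, threshold=0.8):
    """Split (image_id, text) pairs into accepted and rejected lists.

    Accepted pairs would serve as pseudo ground truth for training a
    segmentation-free OCR; the threshold value is a hypothetical choice.
    """
    accepted, rejected = [], []
    for image_id, text in ocr_outputs:
        if lexicality(text, lexicon) >= threshold:
            accepted.append((image_id, text))
        else:
            rejected.append((image_id, text))
    return accepted, rejected
```

For example, with the lexicon `{"where", "are", "you", "going", "help", "me"}`, the output `"WHERE ARE YOU GOING?"` scores 1.0 and is accepted, while a garbled `"WH3RE ARF Y0U"` scores 0.0 and is rejected. A simple dictionary ratio like this is cheap to compute over large amounts of unlabeled data, which is why it is used here as a stand-in.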