Conference: International Conference on Document Analysis and Recognition

A New Framework for Recognition of Heavily Degraded Characters in Historical Typewritten Documents Based on Semi-Supervised Clustering

Abstract

This paper presents a new semi-supervised clustering framework for the recognition of heavily degraded characters in historical typewritten documents, where off-the-shelf OCR typically fails. The constraints are generated using typographical (collection-independent) domain knowledge and are used to guide both sample (glyph set) partitioning and metric learning. Experimental results using simple features provide encouraging evidence that this approach can lead to significantly improved clustering results compared to simple K-Means clustering, as well as to clustering using a state-of-the-art OCR engine.
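The abstract does not say which constrained-clustering algorithm or which typographical constraints the framework uses. As a rough illustration only, the sketch below implements COP-KMeans, a standard constrained variant of K-Means. This is a swapped-in technique, not necessarily the authors' method. Each glyph is assigned to the nearest centroid that does not violate a must-link or cannot-link constraint. The 2-D glyph features and the constraint pairs are hypothetical. In the paper's setting they would instead be derived from typographical knowledge; for instance, one could hypothetically cannot-link a glyph with a descender to one without. The metric-learning component is not modeled here.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[int, int]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two glyph feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partner(i: int, pair: Pair) -> Optional[int]:
    """Return the other index in `pair` if sample i is part of it, else None."""
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def violates(i: int, cluster: int, labels: List[Optional[int]],
             must: List[Pair], cannot: List[Pair]) -> bool:
    """True if assigning sample i to `cluster` breaks a constraint with a
    sample that was already assigned earlier in the current pass."""
    for pair in must:
        j = partner(i, pair)
        # must-link: an already-assigned partner has to be in the same cluster
        if j is not None and labels[j] is not None and labels[j] != cluster:
            return True
    for pair in cannot:
        j = partner(i, pair)
        # cannot-link: an already-assigned partner must not be in this cluster
        if j is not None and labels[j] == cluster:
            return True
    return False


def cop_kmeans(X: List[Vector], k: int, must: List[Pair], cannot: List[Pair],
               max_iter: int = 50, seed: int = 0) -> List[int]:
    """COP-KMeans: K-Means whose assignment step respects hard constraints."""
    rng = random.Random(seed)
    centers = [list(X[j]) for j in rng.sample(range(len(X)), k)]
    labels: List[Optional[int]] = [None] * len(X)
    for _ in range(max_iter):
        labels = [None] * len(X)
        for i, x in enumerate(X):
            # Try clusters from nearest to farthest; keep the first feasible one.
            for c in sorted(range(k), key=lambda idx: sq_dist(x, centers[idx])):
                if not violates(i, c, labels, must, cannot):
                    labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for sample {i}")
        # Update step: move each centroid to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centroid
        if new_centers == centers:  # converged: assignments no longer move centroids
            break
        centers = new_centers
    return [int(label) for label in labels]


# Hypothetical toy data: two "characters", each printed three times with noise.
glyphs = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
must = [(0, 1)]    # hypothetical: glyphs known to be the same character
cannot = [(0, 3)]  # hypothetical: typographically incompatible glyphs
labels = cop_kmeans(glyphs, k=2, must=must, cannot=cannot)
print(labels)
```

Compared with plain K-Means, only the assignment step changes: candidate clusters are visited in order of distance and infeasible ones are skipped. This sketch treats constraints as hard and raises an error when no cluster is feasible. Soft-constraint variants such as PCK-Means penalize violations instead.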