Journal: Knowledge-Based Systems

Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition



Abstract

A large number of labeled samples is required as training data to construct an efficient recognition mechanism for an optical character recognition system. Although character samples can easily be collected from available manuscripts, they often lack class labels, especially for ancient and local alphabets. Creating a training dataset therefore requires experts to annotate a great number of characters manually, which is costly and time-consuming. To considerably reduce the human effort required to construct training datasets, this work proposes a novel semi-automatic labeling method under the assumption that there are no initial labeled samples. The proposed method performs an iterative procedure on a nearest neighbor graph that views samples in multiple feature spaces. In each iteration, an expert is first called upon to label a relevant unlabeled sample that is automatically selected from the highest-density area of unlabeled samples. Then, the manually annotated label is propagated to neighboring samples under safe conditions based on sample density and multiple views. The procedure is repeated until all unlabeled samples are labeled. The labeling procedure is evaluated on the MNIST, Devanagari, Thai, and Lanna Dhamma datasets. The results show that the proposed method outperforms state-of-the-art labeling methods, achieving the highest labeling accuracy. In addition, it can handle outlier samples and deal with alphabets that include visually similar characters. Moreover, the recognition performance of a classifier trained on the semi-automatically generated training dataset is comparable to that of a classifier trained on the actual ground truth. (c) 2021 Elsevier B.V. All rights reserved.
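
The iterative procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the density estimate or the "safe conditions", so both are assumptions here. Density is taken as the inverse mean distance to the k nearest neighbors, summed over views, and a label is propagated only to samples that are k-nearest neighbors of the queried sample in every view. The names `semi_automatic_labeling`, `oracle`, and `k` are hypothetical.

```python
import numpy as np


def knn(X, k):
    """Indices and distances of the k nearest neighbors of each row (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)


def semi_automatic_labeling(views, oracle, k=5):
    """Label all samples by alternating expert queries and label propagation.

    views  : list of (n, d_v) arrays, one feature representation per view
    oracle : callable taking a sample index and returning its class label
             (stands in for the human expert)
    Returns the label array and the number of expert queries used.
    """
    n = views[0].shape[0]
    neighbors = []           # per view: list of neighbor-index sets
    density = np.zeros(n)    # assumed density: inverse mean k-NN distance
    for X in views:
        idx, dist = knn(X, k)
        neighbors.append([set(row.tolist()) for row in idx])
        density += 1.0 / (dist.mean(axis=1) + 1e-12)

    labels = np.full(n, -1)  # -1 marks "unlabeled"
    queries = 0
    while (labels == -1).any():
        unlabeled = np.flatnonzero(labels == -1)
        # Ask the expert about the densest unlabeled sample.
        seed = int(unlabeled[np.argmax(density[unlabeled])])
        labels[seed] = oracle(seed)
        queries += 1
        # Assumed safe condition: neighbor of the seed in every view.
        shared = set.intersection(*(nb[seed] for nb in neighbors))
        for j in shared:
            if labels[j] == -1:
                labels[j] = labels[seed]
    return labels, queries
```

Because every loop iteration labels at least the queried sample, the loop always terminates, and the number of expert queries is at most the number of samples. A stricter propagation rule lowers the risk of wrong labels at the cost of more queries.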
