Venue: International Conference on Document Analysis and Recognition

Character N-Gram Spotting on Handwritten Documents Using Weakly-Supervised Segmentation

Abstract

In this paper, we present a solution towards building a retrieval system over handwritten document images that (i) is recognition-free, (ii) allows text querying, (iii) can retrieve at the sub-word level, and (iv) can search for out-of-vocabulary words. Unlike previous approaches that operate at either the character or the word level, we use character n-gram images (CNG-img) as the retrieval primitive. CNG-img are sequences of character segments that are represented and matched in the image space. Word images are thus treated as a bag of CNG-img, which can be indexed and matched in the feature space. This allows recognition-free search (query-by-example) that can retrieve morphologically similar words sharing matching sub-words. Further, to enable query-by-keyword, we build an automated scheme that generates labeled exemplars for characters and character n-grams from unconstrained handwritten documents. We pose this as a weakly-supervised learning problem, in which character and n-gram labels are obtained automatically from the word labels. The resulting retrieval system can answer queries from an unlimited vocabulary. The approach is demonstrated on the George Washington collection; results show a major improvement in retrieval performance compared to word-recognition and word-spotting methods.
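To make the bag-of-n-grams idea concrete, here is a minimal sketch of how a query-by-keyword lookup over an n-gram index could work. It is not the authors' implementation: it assumes each word image has already been decomposed into labeled character n-gram segments (the weakly-supervised labeling and image-space matching are not shown), and the names `char_ngrams` and `CNGIndex` are hypothetical. A keyword is split into its character n-grams, and word images are ranked by how many of those n-grams they contain, so a query word never seen as a whole can still retrieve images with matching sub-words.

```python
from collections import defaultdict


def char_ngrams(word, n):
    """Return the character n-grams of `word` (empty if shorter than n)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


class CNGIndex:
    """Hypothetical inverted index from n-gram labels to word-image ids.

    Assumes each word image has already been decomposed into labeled
    character n-gram segments; producing those labels is out of scope.
    """

    def __init__(self, n=2):
        self.n = n
        # n-gram label -> set of word-image ids that contain it
        self.postings = defaultdict(set)

    def add(self, image_id, ngram_labels):
        """Register the n-gram labels found in one word image."""
        for gram in ngram_labels:
            self.postings[gram].add(image_id)

    def query(self, keyword):
        """Rank word images by how many distinct keyword n-grams they contain."""
        scores = defaultdict(int)
        for gram in set(char_ngrams(keyword, self.n)):
            for image_id in self.postings.get(gram, ()):
                scores[image_id] += 1
        # Highest score first; ties broken by image id for determinism.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = CNGIndex(n=2)
    # Stand-in labels: in a real system these would come from segmentation.
    index.add("img1", char_ngrams("washington", 2))
    index.add("img2", char_ngrams("washing", 2))
    index.add("img3", char_ngrams("general", 2))
    # "washer" is not in the indexed vocabulary, yet shares sub-words.
    print(index.query("washer"))
```

Running the example prints `[('img1', 3), ('img2', 3), ('img3', 1)]`: both "washington" and "washing" share the bigrams `wa`, `as`, `sh` with "washer", while "general" shares only `er`. Counting shared n-grams is the simplest possible scoring rule and was chosen here only for clarity; the paper matches CNG-img in a feature space rather than comparing text labels.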