Character N-Gram Spotting on Handwritten Documents Using Weakly-Supervised Segmentation

Abstract

In this paper, we present a solution towards building a retrieval system over handwritten document images that i) is recognition-free, ii) allows text querying, iii) can retrieve at the sub-word level, and iv) can search for out-of-vocabulary words. Unlike previous approaches, which operate at either the character or the word level, we use character n-gram images (CNG-img) as the retrieval primitive. CNG-img are sequences of character segments, which are represented and matched in image space. Each word image is then treated as a bag of CNG-img, which can be indexed and matched in feature space. This allows recognition-free search (query-by-example) that can retrieve morphologically similar words with matching sub-words. Further, to enable query-by-keyword, we build an automated scheme that generates labeled exemplars of characters and character n-grams from unconstrained handwritten documents. We pose this as a weakly-supervised learning problem, in which character and n-gram labels are obtained automatically from the word labels. The resulting retrieval system can answer queries from an unlimited vocabulary. The approach is demonstrated on the George Washington collection, where results show a major improvement in retrieval performance compared with word-recognition and word-spotting methods.
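To make the bag-of-n-grams idea more concrete, the sketch below shows one way a text query could be split into character n-grams and looked up in an inverted index of word images. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each word image has already been tagged with text labels for the character n-grams it contains (for instance, by a labeling step like the weakly-supervised one described in the abstract), whereas the paper represents and matches CNG-img in image feature space. The class `NgramIndex`, the helper `char_ngrams`, the bigram size, and the overlap-count ranking are all hypothetical choices.

```python
from collections import Counter, defaultdict


def char_ngrams(word: str, n: int) -> list[str]:
    """Return the overlapping character n-grams of a word (n=2: 'was' -> ['wa', 'as'])."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


class NgramIndex:
    """Hypothetical inverted index from character n-gram labels to word-image IDs.

    Assumption: each word image is already tagged with the text labels of the
    n-grams found in it. The paper itself matches CNG-img in image feature space.
    """

    def __init__(self, n: int = 2):
        self.n = n
        # Posting lists: n-gram label -> set of word-image IDs containing it.
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, image_id: str, ngram_labels: list[str]) -> None:
        """Register a word image under every n-gram label it was tagged with."""
        for label in ngram_labels:
            self.postings[label].add(image_id)

    def query(self, keyword: str) -> list[tuple[str, int]]:
        """Rank word images by how many of the keyword's distinct n-grams they contain."""
        scores: Counter = Counter()
        for gram in set(char_ngrams(keyword, self.n)):
            # .get() avoids creating empty posting lists for unseen n-grams.
            for image_id in self.postings.get(gram, ()):
                scores[image_id] += 1
        # Highest overlap first; ties broken by image ID for a stable order.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    index = NgramIndex(n=2)
    index.add("img_001", ["wa", "as", "sh"])                   # image of "wash"
    index.add("img_002", ["wa", "ar"])                         # image of "war"
    index.add("img_003", ["ge", "en", "ne", "er", "ra", "al"])  # image of "general"
    print(index.query("washington"))
```

In this toy run, "washington" was never indexed as a whole word, yet the image of "wash" ranks first because it shares three bigrams with the query ("wa", "as", "sh"), and the image of "war" follows with one. This mirrors, in simplified form, how sub-word primitives can let a system answer queries for words it has not seen whole.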

Bibliographic Record

Source: International Conference on Document Analysis and Recognition, 2013, 5 pages
Authors: Roy Udit; Sankaran Naveen; Sankar K. Pramod; Jawahar C.V.
Original format: PDF
Chinese Library Classification: TP391.41

Similar Documents

1. A Novel Approach for Character Segmentation of Offline Handwritten Marathi Documents written in MODI Script [J]. Parag A. Tamhankar, Krishnat D. Masalkar, Satish R. Kolhe. Procedia Computer Science, 2020, No. 5.

2. Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images [J]. Ram Sarkar, Samir Malakar, Nibaran Das. Journal of Intelligent Systems, 2011, No. 3.

3. Character confidence based on N-best list for keyword spotting in online Chinese handwritten documents [J]. Heng Zhang, Da-Han Wang, Cheng-Lin Liu. Pattern Recognition: The Journal of the Pattern Recognition Society, 2014, No. 5.

4. Character N-Gram Spotting on Handwritten Documents Using Weakly-Supervised Segmentation [C]. Roy Udit, Sankaran Naveen, Sankar K. Pramod. International Conference on Document Analysis and Recognition, 2013.

5. Document image analysis techniques for handwritten text segmentation, document image rectification and digital collation [D]. Salvi, Dhaval. 2014.

6. Weakly-supervised convolutional neural networks of renal tumor segmentation in abdominal CTA images [O]. Guanyu Yang, Chuanxia Wang, Jian Yang. 2020.

7. Character N-Gram Spotting on Handwritten Documents using Weakly-Supervised Segmentation [O]. Udit Roy, Naveen Sankaran, Pramod Sankar K. 2015.
