Published in: Digital Libraries: Universal and Ubiquitous Access to Information

Language Independent Word Spotting in Scanned Documents


Abstract

Large quantities of scanned handwritten and printed documents are rapidly becoming available to information storage and retrieval systems, such as those used by libraries. We present the design and performance of a language-independent system for spotting handwritten and printed words in scanned document images. The technique is evaluated on three scripts: Devanagari (Sanskrit/Hindi), Arabic (Arabic/Urdu) and Latin (English). The system has three main components: a word segmenter, a shape-based word matcher, and a search interface. The user supplies a query that is either (i) a word image, to spot similar words in a collection of documents written in that script, or (ii) text, to look for the equivalent word images in the script. Candidate words found in the documents are retrieved and ranked by a similarity score between the query and each candidate, computed from global word shape features. For handwritten English, a precision of 60% was obtained at a recall of 50%. An alternative approach, comprising prototype selection and word matching, that yields better performance for handwritten documents is also discussed. For printed Sanskrit documents, a precision as high as 90% was obtained at a recall of 50%.
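
The retrieve-and-rank step and the precision-at-recall figures quoted in the abstract can be illustrated with a minimal sketch. The abstract does not specify the shape features or the similarity measure, so everything below is an assumption made for illustration: the features are short hypothetical binary vectors, `shape_similarity` is a plain matching coefficient (fraction of agreeing positions), and the names `rank_candidates` and `precision_at_recall` are invented here rather than taken from the paper.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def shape_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where two equal-length binary vectors agree.

    Assumed stand-in for the paper's similarity score, which the
    abstract does not define.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def rank_candidates(
    query: Sequence[int], candidates: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Score every candidate word against the query and sort best-first."""
    scored = [(word_id, shape_similarity(query, feats))
              for word_id, feats in candidates.items()]
    # Highest similarity first; ties are broken by id for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def precision_at_recall(
    ranked_ids: Iterable[str], relevant_ids: Iterable[str], recall_level: float
) -> float:
    """Precision at the first rank where recall reaches recall_level."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant item is required")
    hits = 0
    for rank, word_id in enumerate(ranked_ids, start=1):
        if word_id in relevant:
            hits += 1
            if hits / len(relevant) >= recall_level:
                return hits / rank
    return 0.0  # the requested recall level is never reached


# Hypothetical 8-bit shape features for a query and four candidate words.
query = [1, 0, 1, 1, 0, 0, 1, 0]
candidates = {
    "w1": [1, 0, 1, 1, 0, 0, 1, 0],
    "w2": [1, 0, 1, 0, 0, 0, 1, 0],
    "w3": [0, 1, 0, 0, 1, 1, 0, 1],
    "w4": [1, 1, 1, 1, 0, 0, 0, 0],
}
relevant = {"w1", "w4"}  # assumed ground truth: true instances of the query word

ranking = rank_candidates(query, candidates)
ranked_ids = [word_id for word_id, _ in ranking]
print(ranking)
print(precision_at_recall(ranked_ids, relevant, 0.5))
```

Read this way, a figure such as "a precision of 60% at a recall of 50%" means that by the point in the ranked list where half of the true instances of the query word have been retrieved, 60% of the words retrieved so far are correct.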

Bibliographic details

  • Conference location: Bali (ID)
  • Author affiliation:

    Center of Excellence for Document Analysis and Recognition (CEDAR), Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, New York 14228, USA

  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Computer networks
