
Spotting Words in Handwritten Arabic Documents



Abstract

The design and performance of a system for spotting handwritten Arabic words in scanned document images are presented. The system has three main components: a word segmenter, a shape-based word matcher, and a search interface. The user types an English query into a search window; the system finds the equivalent Arabic word (e.g., by dictionary look-up) and then locates word images in an indexed (segmented) set of documents. The search is performed in two steps: (1) prototype selection, in which the query is used to obtain a set of handwritten samples of that word (the prototypes) from a known set of writers; and (2) word matching, in which the prototypes are used to spot each occurrence of the word in the indexed document database. The entire set of test word images is ranked, with the ranking criterion being a similarity score, based on global word-shape features, between each prototype word and each candidate word. Retrieval performance was studied on a database of 20,000 word images contained in 100 scanned handwritten Arabic documents written by 10 different writers. With five writers providing prototypes and the other five used for testing, on manually segmented documents, the system obtains 55% precision at 50% recall. Performance improves as more writers are used for training.
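The ranking step and the "precision at 50% recall" figure can be illustrated with a small sketch. The abstract does not specify the global word-shape features or the exact similarity measure, so this example assumes each word image has already been reduced to a fixed-length feature vector, uses cosine similarity as a stand-in measure, and scores each candidate by its best match over all prototypes. The function names, word IDs, and toy vectors are hypothetical and are not taken from the paper.

```python
from math import sqrt
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in similarity between two word-shape feature vectors (assumption, not the paper's measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(
    prototypes: List[Vector], candidates: Dict[str, Vector]
) -> List[Tuple[str, float]]:
    """Score every candidate word image by its best match against any prototype,
    then sort from most to least similar (max-over-prototypes is an assumption)."""
    scored = [
        (word_id, max(cosine_similarity(p, feats) for p in prototypes))
        for word_id, feats in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def precision_at_recall(
    ranking: List[Tuple[str, float]], relevant: Set[str], target_recall: float
) -> float:
    """Precision measured at the first rank where recall reaches target_recall."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = 0
    for rank, (word_id, _) in enumerate(ranking, start=1):
        if word_id in relevant:
            hits += 1
        if hits / len(relevant) >= target_recall:
            return hits / rank
    return 0.0  # target recall never reached


# Hypothetical toy data: two prototypes of the query word, four candidate word images.
prototypes = [[1.0, 0.0, 1.0], [0.9, 0.1, 1.0]]
candidates = {
    "w1": [1.0, 0.0, 0.9],
    "w2": [0.0, 1.0, 0.0],
    "w3": [0.8, 0.2, 1.0],
    "w4": [0.1, 0.9, 0.2],
}

ranking = rank_candidates(prototypes, candidates)
print([word_id for word_id, _ in ranking])  # → ['w1', 'w3', 'w4', 'w2']
print(precision_at_recall(ranking, {"w1", "w3"}, target_recall=0.5))  # → 1.0
```

Taking the maximum over prototypes lets a candidate match whichever training writer's style it most resembles; averaging the scores over prototypes would be an equally plausible alternative, and the abstract does not say which aggregation the system uses.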
