Spotting Words in Handwritten Arabic Documents

Abstract

The design and performance of a system for spotting handwritten Arabic words in scanned document images are presented. The system has three main components: a word segmenter, a shape-based matcher for words, and a search interface. The user types a query in English in a search window; the system finds the equivalent Arabic word, e.g., by dictionary look-up, and locates matching word images in an indexed (segmented) set of documents. A two-step approach is employed in performing the search: (1) prototype selection: the query is used to obtain a set of handwritten samples of that word from a known set of writers (these are the prototypes), and (2) word matching: the prototypes are used to spot each occurrence of that word in the indexed document database. A ranking is performed on the entire set of test word images, where the ranking criterion is a similarity score between each prototype word and the candidate words based on global word shape features. A database of 20,000 word images contained in 100 scanned handwritten Arabic documents written by 10 different writers was used to study retrieval performance. With manually segmented documents, using five writers to provide prototypes and the other five for testing, 55% precision is obtained at 50% recall. Performance increases as more writers are used for training.
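
The matching and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the binary feature vectors, the bit-agreement `similarity` function, the choice to score each candidate by its best match over all prototypes, and every function name are assumptions made only for illustration, since the abstract does not specify the features or how prototype scores are combined.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical representation: a word image reduced to a binary shape vector.
Features = Sequence[int]


def similarity(a: Features, b: Features) -> float:
    """Fraction of positions where two equal-length binary vectors agree."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_candidates(
    prototypes: List[Features], candidates: List[Features]
) -> List[Tuple[int, float]]:
    """Rank candidate word images by their best similarity to any prototype.

    Taking the maximum over prototypes is an assumed combination rule.
    """
    scored = [
        (i, max(similarity(p, c) for p in prototypes))
        for i, c in enumerate(candidates)
    ]
    # sorted() is stable, so tied candidates keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def precision_at_recall(
    ranked_ids: List[int], relevant: Set[int], recall_level: float
) -> float:
    """Precision at the first rank where recall reaches recall_level."""
    hits = 0
    for rank, idx in enumerate(ranked_ids, start=1):
        if idx in relevant:
            hits += 1
            if hits / len(relevant) >= recall_level:
                return hits / rank
    return 0.0


# Toy data: two prototypes of the query word and four candidate word images.
prototypes = [[1, 0, 1, 1], [1, 1, 1, 0]]
candidates = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1]]

ranking = rank_candidates(prototypes, candidates)
ranked_ids = [idx for idx, _ in ranking]
relevant = {0, 1}  # assumed ground truth: candidates 0 and 1 are the query word

p_half = precision_at_recall(ranked_ids, relevant, 0.5)
p_full = precision_at_recall(ranked_ids, relevant, 1.0)
```

With this toy data, half of the relevant words are found at rank 1, and all of them by rank 3, which is how a figure such as "55% precision at 50% recall" is read: precision is measured at the point in the ranked list where the stated fraction of true occurrences has been retrieved.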

Bibliographic record

Source
Document Recognition and Retrieval XIII; Electronic Imaging Science and Technology | 2006 | pp. 606702.1-606702.12 | 12 pages
Conference location: San Jose, CA (US)
Authors
Sargur Srihari; Harish Srinivasan; Pavithra Babu; Chetan Bhole;
Author affiliation

Center of Excellence for Document Analysis and Recognition (CEDAR), University at Buffalo, State University of New York, Amherst, New York 14228

Original format: PDF
Language: English
Chinese Library Classification: Pattern recognition and devices