Towards Searchable Digital Urdu Libraries - A Word Spotting Based Retrieval Approach

Abstract

Libraries in South Asia hold huge collections of valuable printed documents in Urdu, and it is of interest to digitize these collections to make them more accessible. The unavailability of an OCR for Urdu, however, limits the concept of a digital Urdu library to the scanning of documents only, offering a very limited search facility based on manually assigned tags. We address this issue by proposing a word spotting based keyword search method for information retrieval in digitized collections of printed Urdu documents. The proposed method is based on segmenting Urdu text into partial words and representing each partial word by a set of features. To search for a specific word (or phrase), the user provides a query in the form of an image. By comparing the features of the partial words in the query image with those already indexed, the system provides the user with a list of documents containing occurrences of the queried word. Evaluated on 50 Urdu documents, the system exhibited a recall of 95.17% and a precision of 94.3%.
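The abstract does not say how partial-word features are compared, but the keywords list Dynamic Time Warping. The following is a minimal sketch of DTW-based matching under stated assumptions: each partial word is represented as a sequence of equal-length numeric feature vectors (for example, one per pixel column), the local cost is Euclidean distance, and the index is a plain list of `(doc_id, features)` pairs. The names `dtw_distance` and `rank_matches` are hypothetical and are not taken from the paper.

```python
from math import inf


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between two feature vectors (assumed).
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_matches(query, index):
    """Return (doc_id, distance) pairs sorted by increasing DTW distance."""
    scored = [(doc_id, dtw_distance(query, feats)) for doc_id, feats in index]
    return sorted(scored, key=lambda pair: pair[1])


# Toy data: one-dimensional features, purely illustrative.
index = [
    ("doc1", [[0.0], [1.0], [2.0]]),
    ("doc2", [[5.0], [5.0], [5.0]]),
]
query = [[0.0], [1.0], [1.0], [2.0]]
ranking = rank_matches(query, index)
```

In this toy example the query is a time-stretched copy of the `doc1` sequence, so DTW aligns them at zero cost and `doc1` ranks first. A practical system would also need a distance threshold to decide which candidates count as occurrences of the queried word.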

Bibliographic details

Source
International Conference on Document Analysis and Recognition | 2011 | 5 pages
Authors
Abidi Ali; Siddiqi Imran; Khurshid Khurram
Original format
PDF
CLC classification
TP391.41-53
Keywords
Dynamic Time Warping; Urdu digital libraries; Word Spotting

Similar literature

1. Spotting words in silent speech videos: a retrieval-based approach [J]. Jha Abhishek, Namboodiri Vinay P., Jawahar C. V. Machine Vision and Applications. 2019, No. 2
2. Content Based Text Information Search and Retrieval in Document Images for Digital Library [J]. A. Sakila, S. Vijayarani. Journal of Digital Information Management. 2018, No. 3
3. Ontology-based search and document retrieval in a digital library with folk songs [J]. Nisheva-Pavlova M., Pavlov P. Information Services & Use. 2011, No. 3a4
4. Towards Searchable Digital Urdu Libraries - A Word Spotting Based Retrieval Approach [C]. Abidi Ali, Siddiqi Imran, Khurshid Khurram. 2011 International Conference on Document Analysis and Recognition. 2011
5. Event-based retrieval from digital libraries containing data streams [D]. Kholief, Mohamed Hamed. 2003
6. Towards semantic search and inference in electronic medical records: An approach using concept-based information retrieval [O]. Bevan Koopman, Peter Bruza, Laurianne Sitbon. 2012
7. Ontology-Based Search and Document Retrieval in a Digital Library with Folk Songs [O]. M. Nisheva-Pavlova, P. Pavlov. 2012
