Statistical Approach to Retrieving Historical Manuscript Images without Recognition

Abstract

Handwritten historical document collections in libraries and other areas are often of interest to researchers, students, or the general public. Convenient access to such corpora generally requires an index, which allows one to locate individual text units (pages, sentences, lines) that are relevant to a given query (usually provided as text). Several solutions are possible: manual annotation (very expensive), handwriting recognition (poor results), and word spotting -- an image matching approach (computationally expensive). In this work, the authors present a novel retrieval approach for historical document collections that does not require recognition. They assume that word images can be described using a vocabulary of discretized word features. From a training set of labeled word images, they extract discrete feature vectors, and estimate the joint probability distribution of features and word labels. For a given feature vector (i.e., a word image), they can then calculate conditional probabilities for all labels in the training vocabulary. Experiments show that this relevance-based language model works very well with a mean average precision of 89% for 4-word queries on a subset of George Washington's manuscripts.
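The estimation step described in the abstract can be illustrated with a minimal sketch. It assumes one common form of relevance-style model: the joint probability of a label and a set of discrete features is an average over training images of smoothed per-image estimates. The function names (`train`, `joint`, `posterior`), the interpolation weight `lam`, the feature tokens, and the toy data are all hypothetical and are not taken from the report.

```python
from collections import Counter
from math import prod


def train(samples, lam=0.9):
    """samples: list of (word_label, [discrete feature tokens]) pairs."""
    return {
        "samples": [(label, Counter(feats)) for label, feats in samples],
        "label_counts": Counter(label for label, _ in samples),
        "feat_counts": Counter(f for _, feats in samples for f in feats),
        "lam": lam,  # interpolation weight: an assumed value, not from the report
    }


def joint(model, label, query_feats):
    """Estimate P(label, f_1..f_k) as an average over training images."""
    lam = model["lam"]
    n_labels = sum(model["label_counts"].values())
    n_feats = sum(model["feat_counts"].values())
    total = 0.0
    for train_label, feats in model["samples"]:
        size = sum(feats.values()) or 1  # avoid dividing by zero on empty images
        # Per-image label probability, smoothed with the collection-wide frequency.
        p_label = (lam * (train_label == label)
                   + (1 - lam) * model["label_counts"][label] / n_labels)
        # Per-image feature probabilities, smoothed the same way.
        p_feats = prod(
            lam * feats[f] / size + (1 - lam) * model["feat_counts"][f] / n_feats
            for f in query_feats
        )
        total += p_label * p_feats
    return total / len(model["samples"])


def posterior(model, query_feats):
    """Conditional probability of every training label given the features."""
    scores = {lbl: joint(model, lbl, query_feats) for lbl in model["label_counts"]}
    total = sum(scores.values())
    if total == 0:  # every query feature was unseen in training
        return {lbl: 1 / len(scores) for lbl in scores}
    return {lbl: s / total for lbl, s in scores.items()}


# Toy training set with made-up feature tokens.
training = [
    ("the", ["width_small", "ascenders_1"]),
    ("the", ["width_small", "ascenders_1"]),
    ("Washington", ["width_large", "ascenders_2"]),
]
model = train(training)
post = posterior(model, ["width_large", "ascenders_2"])
best = max(post, key=post.get)
```

With such posteriors available for every word image, a text query could be answered by ranking images according to the probability they assign to the query words; no hard recognition decision is needed.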

Bibliographic record

Authors
Rath, T. M.; Lavrenko, V.; Manmatha, R.
Year: 2003
Pages: 1-10
Total pages: 10
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords
Images; Documents; Information retrieval; Matching; Handwriting; Words(Language); History; Automation; Feature extraction; Labels; Archives; Probability distribution functions; Precision; Vocabulary; Shape;


Similar documents

1. Erratum: A spatially adaptive statistical method for the binarization of historical manuscripts and degraded document images (Pattern Recognition (2011) 44 (2184-2196) DOI: 10.1016/j.patcog.2011.02.021) [J]. Hedjam R., Farrahi Moghaddam R., Cheriet M. Pattern Recognition: The Journal of the Pattern Recognition Society. 2012, No. 8
2. A spatially adaptive statistical method for the binarization of historical manuscripts and degraded document images [J]. Hedjam R., Moghaddam R.F., Cheriet M. Pattern Recognition: The Journal of the Pattern Recognition Society. 2011, No. 9
3. Facial Image Recognition Based on a Statistical Uncorrelated Near Class Discriminant Approach [J]. Sheng LI, Xiao-Yuan JING, Lu-Sha BIAN. IEICE Transactions on Information and Systems. 2010, No. 4
4. A binarization algorithm for historical Arabic manuscript images using a neutrosophic approach [C]. Amin K.M., Abd Elfattah M., Hassanien A.E. International Conference on Computer Engineering Systems. 2014
5. A statistical approach to binary and multiple-class pattern recognition of motor imagery by non-invasive EEG for brain computer interface (BCI) applications [D]. Sarkar, Angikar. 2012
6. Retrieving challenging vessel connections in retinal images by line co-occurrence statistics [O]. Samaneh Abbasi-Sureshjani, Jiong Zhang, Remco Duits
7. A Binarization Algorithm for Historical Arabic Manuscript Images using a Neutrosophic Approach [O]. Khalid M. Amin, Mohamed Abd Elfattah, Aboul Ella Hassanien. 2014
