A search engine for historical manuscript images

Abstract

Many museum and library archives are digitizing their large collections of handwritten historical manuscripts to enable public access to them. These collections are only available in image formats and require expensive manual annotation work for access to them. Current handwriting recognizers have word error rates in excess of 50% and therefore cannot be used for such material. We describe two statistical models for retrieval in large collections of handwritten manuscripts given a text query. Both use a set of transcribed page images to learn a joint probability distribution between features computed from word images and their transcriptions. The models can then be used to retrieve unlabeled images of handwritten documents given a text query. We show experiments with a training set of 100 transcribed pages and a test set of 987 handwritten page images from the George Washington collection. Experiments show that the precision at 20 documents is about 0.4 to 0.5 depending on the model. To the best of our knowledge, this is the first automatic retrieval system for historical manuscripts using text queries, without manual transcription of the original corpus.
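The abstract does not spell out the two models, so the following is only a minimal sketch of the general idea it describes: estimate a joint distribution over transcription words and word-image features from transcribed training images, then rank unlabeled images for a one-word text query and score the ranking by precision at k. The discrete feature tokens, the interpolation weight `lam`, and every function name here are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def smoothed(count, total, background, lam=0.5):
    """Interpolate a per-image relative frequency with a collection-wide one."""
    rel = count / total if total else 0.0
    return (1 - lam) * rel + lam * background


def word_posterior(features, training, lam=0.5):
    """Return P(word | features) for one unlabeled word image.

    training: list of (word, feature_tokens) pairs from transcribed pages.
    Feature tokens are assumed to be discrete (hypothetical simplification).
    """
    n = len(training)
    word_bg = Counter(w for w, _ in training)
    feat_bg = Counter(f for _, fs in training for f in fs)
    feat_total = sum(feat_bg.values())
    joint = Counter()
    for word_j, feats_j in training:
        counts_j = Counter(feats_j)
        # P(f_1..f_k | J): product of smoothed per-feature probabilities
        p_feats = 1.0
        for f in features:
            p_feats *= smoothed(counts_j[f], len(feats_j),
                                feat_bg[f] / feat_total, lam)
        # P(w | J): smoothed indicator of J's transcription,
        # averaged over training images with a uniform prior 1/n
        for w in word_bg:
            p_w = smoothed(1 if w == word_j else 0, 1, word_bg[w] / n, lam)
            joint[w] += p_w * p_feats / n
    z = sum(joint.values())
    return {w: p / z for w, p in joint.items()} if z else {}


def rank(query, test_images, training):
    """Order unlabeled image ids by P(query | features), best first."""
    scores = {img_id: word_posterior(feats, training).get(query, 0.0)
              for img_id, feats in test_images.items()}
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top k ranked items that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


# Toy data (invented for illustration only)
training = [("letter", ["a", "b"]), ("orders", ["c", "d"]), ("letter", ["a", "e"])]
test_images = {"x": ["a", "b"], "y": ["c", "d"]}
ranking = rank("letter", test_images, training)
```

In this toy setup the image whose features co-occur with "letter" in the training pairs is ranked first. A precision at 20 of 0.4 to 0.5, as reported in the abstract, would correspond to roughly 8 to 10 relevant items among the top 20 returned.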

Bibliographic record

Source
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004, pp. 369-376 (8 pages)
Authors
Toni M. Rath; R. Manmatha; Victor Lavrenko
Original format: PDF
CLC classification: various special-purpose databases
Keywords
relevance models

Similar literature

1. Design considerations for a large-scale image-based text search engine in historical manuscript collections [J]. Lambert Schomaker. Information Technology, 2016, No. 2.
2. Learning-free pattern detection for manuscript research: An efficient approach toward making manuscript images searchable [J]. Mohammed Hussein, Maergner Volker, Ciotti Giovanni. International Journal on Document Analysis and Recognition, 2021, No. 3.
3. Content Based Search Engine for Historical Calligraphy Images [J]. Xiafen Zhang, Vijayan Sugumaran. International Journal of Intelligent Information Technologies, 2014, No. 3.
4. A search engine for historical manuscript images [C]. Toni M. Rath, R. Manmatha, Victor Lavrenko. Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004.
5. Generating Latinas: Online Images and the Mechanisms of the Google Search Engine [D]. Cortez, Elizabeth B. 2012.
6. Historical Typhoon Search Engine Based on Track Similarity [O]. Meng-Han Tsai, Hao-Yung Chan, Chun-Mo Hsieh. 2019.
7. A Search Engine for Historical Manuscript Images [O]. Rath, Toni; Manmatha, R.; Lavrenko, Victor. 2004.
