Making Two Vast Historical Manuscript Collections Searchable and Extracting Meaningful Textual Features Through Large-Scale Probabilistic Indexing

Abstract

Textual access to large collections of digitized images remains infeasible because such collections usually lack transcripts, and transcribing them is in turn typically unaffordable. Probabilistic indices, however, can provide textual access with only moderate resource demands. Besides supporting straightforward information retrieval, it will be shown that probabilistic indices can also be used to estimate textual features of indexed but otherwise untranscribed collections, such as the number of running words and Zipf curves. Complete probabilistic indices have recently been produced for two iconic large collections: "Bentham" (90K images) and "Spanish Golden Age Theater" (40K images). To show the impact of making these collections searchable, we report access statistics gathered through their corresponding search interfaces. To the best of our knowledge, these are the first large collections of untranscribed manuscripts to be made publicly available for effective and efficient textual access.
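
The abstract does not spell out how running words and Zipf curves are estimated from a probabilistic index. The following is a minimal sketch of one plausible approach, assuming each index entry pairs a candidate word with a relevance probability, so that summing probabilities gives an expected count. The entry layout, the function names, and the sample values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical index entries: (page_id, word, relevance_probability).
# The values are invented for illustration only.
entries = [
    ("p1", "the", 0.9),
    ("p1", "law", 0.7),
    ("p1", "lax", 0.2),
    ("p2", "the", 0.8),
    ("p2", "the", 0.6),
    ("p2", "law", 0.5),
]


def expected_running_words(entries):
    """Estimate the number of running words as the sum of all probabilities."""
    return sum(p for _, _, p in entries)


def expected_word_counts(entries):
    """Estimate each word's frequency as the sum of its probabilities."""
    counts = defaultdict(float)
    for _, word, p in entries:
        counts[word] += p
    return dict(counts)


def zipf_points(counts):
    """Return (rank, word, expected_frequency), most frequent word first."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    print("expected running words:", round(expected_running_words(entries), 3))
    for rank, word, freq in zipf_points(expected_word_counts(entries)):
        print(rank, word, round(freq, 3))
```

Plotting expected frequency against rank on log-log axes would give an estimated Zipf curve. Under this assumption, low-probability entries such as a spurious "lax" contribute only a small fraction to the totals and are not counted as full words.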


Bibliographic record

Source
International Conference on Document Analysis and Recognition | 2019 | 1 v. | 6 pages
Authors
Alejandro Héctor Toselli; Verónica Romero; Joan Andreu Sánchez; Enrique Vidal
Original format: PDF
Chinese Library Classification: Automation technology and equipment
Keywords
Probabilistic logic; Computational modeling; Training; Indexing; Optical imaging; Integrated optics


