Creating Digital Resources from Legacy Documents: An Experience Report from the Biosystematics Domain

Abstract

Digitized legacy documents marked up with XML can be used in many ways, e.g., to generate RDF statements about the world they describe. A prerequisite for doing so is that the document markup is of sufficient quality. Since fully automated markup-generation methods cannot ensure this, manual correction and cleaning are indispensable. In this paper, we report on our experiences from a digitization and markup project for a large corpus of legacy documents from the biosystematics domain, with a focus on the use of modern tools. The markup created covers both document structure and semantic details. In contrast to previous markup projects reported in the literature, our corpus consists of large publications that comprise many different semantic units, and the documents contain OCR noise and layout artifacts. A core insight is that digitization and automated markup on the one hand and manual cleaning and correction on the other hand should be tightly interleaved, and that tools supporting this integration yield a significant improvement.
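
As a rough illustration of the RDF use case mentioned in the abstract, the following minimal sketch turns a small piece of semantic XML markup into N-Triples-style statements. The element names (`treatment`, `taxonName`, `location`), the `http://example.org/` namespace, and the predicate names are assumptions made up for this example; they are not the markup schema or the tooling used in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only.
SAMPLE = """<document>
  <treatment id="t1">
    <taxonName rank="species">Atta cephalotes</taxonName>
    <location country="Brazil">Manaus</location>
  </treatment>
</document>"""

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary

# Assumed mapping from markup element names to predicate names.
PREDICATES = {
    "taxonName": "describesTaxon",
    "location": "mentionsLocation",
}


def literal(text):
    """Format a string as a quoted N-Triples literal with basic escaping."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def treatment_triples(xml_text):
    """Return one statement per mapped semantic element in each treatment."""
    root = ET.fromstring(xml_text)
    triples = []
    for treatment in root.iter("treatment"):
        subject = f"<{BASE}treatment/{treatment.get('id')}>"
        for element in treatment.iter():
            predicate = PREDICATES.get(element.tag)
            if predicate is None:
                continue  # skip elements without a mapped predicate
            value = "".join(element.itertext()).strip()
            triples.append(f"{subject} <{BASE}{predicate}> {literal(value)} .")
    return triples


if __name__ == "__main__":
    for triple in treatment_triples(SAMPLE):
        print(triple)
```

The sketch also hints at why markup quality matters: a mis-tagged or OCR-damaged `taxonName` would pass straight through into the generated statements, so errors in the markup become errors in the RDF.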

Bibliographic details

Source
The Semantic Web: Research and Applications | 2009 | pp. 738-752 | 15 pages
Conference location: Heraklion (GR)
Authors
Guido Sautter; Klemens Boehm; Donat Agosti; Christiana Klingenberg
Author affiliations

Universitaet Karlsruhe (TH), Am Fasanengarten 5, 76128 Karlsruhe;

Universitaet Karlsruhe (TH), Am Fasanengarten 5, 76128 Karlsruhe;

Am. Mus. of Nat. Hist., Central Park West at 79th, New York, NY 10024-5192;

Staatliches Museum fuer Naturkunde, Erbprinzenstr. 13, 76133 Karlsruhe

Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Keywords
semantic XML markup; digital resources; RDF generation

Similar documents

1. Comparison of patient-reported outcomes measurement information system and legacy instruments in multiple domains among older veterans with chronic back pain [J]. Rabih Nayfe, Matthieu Chansard, Linda S. Hynan. BMC Musculoskeletal Disorders, 2020, No. 1.
2. Patients' Experience of Myositis and Further Validation of a Myositis-specific Patient Reported Outcome Measure - Establishing Core Domains and Expanding Patient Input on Clinical Assessment in Myositis. Report from OMERACT 12 [J]. Regardt Malin, Basharat Pari, Christopher-Stine Lisa. The Journal of Rheumatology, 2015, No. 12.
3. Digital twin technology - external data resources in creating the model and classification of different digital twin types in manufacturing [J]. Csaba Ruzsa. Procedia Manufacturing, 2021, No. a.
4. Creating Digital Resources from Legacy Documents: An Experience Report from the Biosystematics Domain [C]. Guido Sautter, Klemens Bohm, Donat Agosti. European Semantic Web Conference, 2009.
5. Preserving long-term access to United States government documents in legacy digital formats [D]. Woods, Kam A. 2010.
6. Development of a Lived Experience-Based Digital Resource for a Digitally-Assisted Peer Support Program for Young People Experiencing Psychosis [O]. Claire E. Peck, Michelle H. Lim, Melanie Purkiss. 2020.
7. Creating Digital Resources from Legacy Documents: An Experience Report from the Biosystematics Domain [O]. Sautter Guido, Böhm Klemens, Agosti Donat. 2009.
