Conference: SYNAT Workshop

On Digitalizing the Geographical Dictionary of Polish Kingdom Published in 1880


Abstract

Printed encyclopedic texts provide high-quality, well-organized textual data, but they also pose several problems. Most modern data extractors are indexers that build fairly simple databases. Such databases offer keyword search and simple links between subjects based on co-occurrence. The most advanced extractors are supported by predefined ontologies, which help build relations between concepts. In this paper we study the problem of digitalizing the Geographical Dictionary of Polish Kingdom published in 1880. We address two kinds of problems: the technical problem of converting scanned pages of the dictionary into reliable textual data, and the theoretical challenge of organizing that data into knowledge. Our solution for organizing data into knowledge is based on defining an appropriate ontology and rules for converting textual data into ontological knowledge. We describe methods for extracting simple information, such as keywords, and bootstrapping it into higher-level relations, and we discuss their possible uses.
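The abstract does not spell out the ontology or the conversion rules. The following is a minimal sketch under stated assumptions: each entry is taken to start with a headword followed by comma-separated abbreviations (assumed here to be "wś" for village, "pow." for county, "gm." for commune). Keyword rules map those fragments to (subject, predicate, object) triples, and a higher-level relation is then bootstrapped from the extracted facts. The rule table, the relation names (hasType, inCounty, inCommune, sameCountyAs) and the sample entries are all hypothetical and do not reproduce the authors' schema or data.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import List, Optional, Pattern, Tuple

Triple = Tuple[str, str, str]

# Hypothetical keyword rules: (pattern, predicate, constant object or None).
# When the constant is None, the object is the text captured by the pattern.
RULES: List[Tuple[Pattern[str], str, Optional[str]]] = [
    (re.compile(r"\bwś\b"), "hasType", "Village"),           # assumed: wś = wieś (village)
    (re.compile(r"\bpow\.\s+([^,;.]+)"), "inCounty", None),   # assumed: pow. = powiat (county)
    (re.compile(r"\bgm\.\s+([^,;.]+)"), "inCommune", None),   # assumed: gm. = gmina (commune)
]


def entry_to_triples(entry: str) -> List[Triple]:
    """Convert one 'Headword, body...' entry into triples using the keyword rules."""
    headword, _, body = entry.partition(",")
    subject = headword.strip()
    triples: List[Triple] = []
    for pattern, predicate, constant in RULES:
        match = pattern.search(body)
        if match is None:
            continue  # this keyword does not occur in the entry
        obj = constant if constant is not None else match.group(1).strip()
        triples.append((subject, predicate, obj))
    return triples


def infer_same_county(triples: List[Triple]) -> List[Triple]:
    """Bootstrap a derived relation: pairs of headwords that share an inCounty value."""
    by_county = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "inCounty":
            by_county[obj].append(subject)
    derived: List[Triple] = []
    for places in by_county.values():
        # Sort so that each unordered pair is emitted once, in a stable order.
        for a, b in combinations(sorted(places), 2):
            derived.append((a, "sameCountyAs", b))
    return derived


if __name__ == "__main__":
    # Made-up sample entries, not taken from the dictionary.
    entries = [
        "Wzorowo, wś, pow. przykładowy, gm. Testowo",
        "Próbnik, wś, pow. przykładowy, gm. Testowo",
    ]
    facts = [t for e in entries for t in entry_to_triples(e)]
    for fact in facts + infer_same_county(facts):
        print(fact)
```

Keeping the rules as data rather than code means the rule set can grow alongside the ontology without changing the extraction loop. The derived relation only illustrates the idea of bootstrapping from simple keyword facts; it is not presented as a relation the paper defines.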
