Conference: IEEE International Conference on Semantic Computing

Ontology-Based Semantic Search for Open Government Data


Abstract

Open data are increasingly available, but their descriptions are often imprecise or incomplete, which makes discovering relevant datasets difficult and time-consuming. Current open data catalogues mostly provide keyword-based search and cannot interpret the user's intent or the contextual meaning of the datasets. Ontology-based semantic search has been well explored in the Semantic Web as a way to improve the quality of search for relevant documents and web pages. This paper applies semantic and machine learning technologies to open data. It presents an approach to searching open government datasets, a relatively underexplored domain in which the semantics of the data rely on the metadata that describe them. The idea is to link published datasets to concepts from a well-defined ontology and to support search based on hybrid indexing. A simplified ontology for the transport domain is constructed to demonstrate and test the idea. A prototype search engine has been implemented that supports both manual and automatic linking to concepts in the ontology and exploits hybrid indexing based on these linking methods. Natural language processing (NLP) techniques are applied to dataset linking and indexing, making the approach independent of the natural language used to describe the datasets. Manual linking of datasets to ontology concepts is intended for domain experts and data publishers, while automatic linking is based on the provided dataset descriptions. Automatic linking reduces the overhead of manual concept linking and the dependency on domain experts. Preliminary results indicate that ontology-based semantic search is a promising approach to increasing the quality and efficiency of open data search. The success of the automatic mechanism does, however, depend on the quality and comprehensiveness of the dataset descriptions.
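To make the idea of concept linking with hybrid indexing more concrete, the following is a minimal sketch in Python. It is not the prototype described in the paper: the toy transport ontology, the `HybridIndex` class, and the `auto_link` helper are all hypothetical names invented for illustration. Automatic linking is reduced here to plain label matching on the dataset description, standing in for the NLP techniques the abstract mentions.

```python
from collections import defaultdict

# Hypothetical toy ontology for the transport domain (an assumption, not the
# paper's ontology): each concept has a set of labels and an optional parent.
ONTOLOGY = {
    "Transport": {"labels": {"transport", "traffic"}, "parent": None},
    "PublicTransport": {"labels": {"bus", "tram", "metro"}, "parent": "Transport"},
    "RoadTraffic": {"labels": {"road", "car", "congestion"}, "parent": "Transport"},
}


def tokenize(text):
    """Lowercase the text and split it into punctuation-stripped tokens."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def auto_link(text):
    """Automatic linking, simplified: concepts whose labels occur in the text."""
    tokens = set(tokenize(text))
    return {name for name, c in ONTOLOGY.items() if c["labels"] & tokens}


def with_descendants(concept):
    """Return the concept together with all of its sub-concepts."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for name, c in ONTOLOGY.items():
            if c["parent"] in found and name not in found:
                found.add(name)
                changed = True
    return found


class HybridIndex:
    """Keeps a keyword index and a concept index side by side."""

    def __init__(self):
        self.keyword_index = defaultdict(set)  # token -> dataset ids
        self.concept_index = defaultdict(set)  # concept -> dataset ids

    def add(self, dataset_id, description, manual_concepts=()):
        # Concept links come from the publisher (manual) and the description (automatic).
        for concept in set(manual_concepts) | auto_link(description):
            self.concept_index[concept].add(dataset_id)
        for token in tokenize(description):
            self.keyword_index[token].add(dataset_id)

    def search(self, query):
        hits = set()
        # Keyword part: exact token matches against the descriptions.
        for token in tokenize(query):
            hits |= self.keyword_index.get(token, set())
        # Semantic part: map the query to concepts and include sub-concepts.
        for concept in auto_link(query):
            for name in with_descendants(concept):
                hits |= self.concept_index.get(name, set())
        return hits


index = HybridIndex()
index.add("d1", "Hourly bus departures in Oslo")
index.add("d2", "Counts of vehicles per station", manual_concepts=["RoadTraffic"])
index.add("d3", "School enrolment figures")
results = index.search("traffic")
```

In this sketch a query for "traffic" matches no description by keyword, yet it retrieves `d1` (linked automatically through the label "bus") and `d2` (linked manually to `RoadTraffic`), because both concepts sit under `Transport` in the toy ontology. The dataset `d2` also illustrates the abstract's closing caveat: its description contains no ontology label, so only the manual link makes it discoverable.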
