Data Engineering Workshops (ICDEW), 2010

Profiling linked open data with ProLOD


Abstract

Linked open data (LOD), as provided by a quickly growing number of sources, constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of (RDF) triples, often in the form of enormous files covering many domains. Moreover, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome, because it may be incomplete, poorly formatted, inconsistent, etc. To understand and profile such linked open data, traditional data profiling methods do not suffice. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of the source for the problem at hand (e.g., some integration task), and then focus on and explore the relevant subset.
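The data-level steps named in the abstract (data type detection, pattern detection, value distribution) can be illustrated with a small sketch. This is not ProLOD's implementation. The sample triples, the `ex:` prefixes, the heuristics in `detect_type` and `value_pattern`, and the `profile` function are all hypothetical. They were chosen only to show how per-predicate statistics can expose the kind of formatting inconsistency the abstract describes.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample triples (subject, predicate, object literal); not taken from ProLOD.
TRIPLES = [
    ("ex:Berlin", "ex:population", "3644826"),
    ("ex:Paris", "ex:population", "2,165,423"),
    ("ex:Berlin", "ex:founded", "1237"),
    ("ex:Paris", "ex:founded", "3rd century BC"),
    ("ex:Berlin", "ex:area", "891.8"),
    ("ex:Paris", "ex:area", "105.4"),
]


def detect_type(value: str) -> str:
    """Classify a literal with simple regex heuristics (assumed rules, for illustration)."""
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"
    if re.fullmatch(r"[+-]?\d*\.\d+", value):
        return "decimal"
    return "string"


def value_pattern(value: str) -> str:
    """Abstract a literal into a shape: digits -> 9, lowercase -> a, uppercase -> A."""
    shape = re.sub(r"\d", "9", value)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[A-Z]", "A", shape)
    # Collapse runs of the same character so "9999" and "99" share one pattern.
    return re.sub(r"(.)\1+", r"\1", shape)


def profile(triples):
    """Group object literals by predicate and compute type, pattern and value counts."""
    by_predicate = defaultdict(list)
    for _subject, predicate, obj in triples:
        by_predicate[predicate].append(obj)

    report = {}
    for predicate, values in by_predicate.items():
        report[predicate] = {
            "types": Counter(detect_type(v) for v in values),
            "patterns": Counter(value_pattern(v) for v in values),
            "values": Counter(values),  # raw value distribution
            "distinct": len(set(values)),
        }
    return report


if __name__ == "__main__":
    for predicate, stats in profile(TRIPLES).items():
        print(predicate, dict(stats["types"]), dict(stats["patterns"]))
```

In this toy input, `ex:population` mixes a plain integer with a comma-formatted string, and both the type counter and the pattern counter surface that mismatch at once. A real LOD profiler would also need to respect typed RDF literals and scale to far larger inputs.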
