Tabular Data Cleaning and Linked Data Generation with Grafterizer

Abstract

The volume of data published on the Web and made available as Open Data has increased significantly over the last several years. However, data published by independent publishers are sliced and fragmented. Creating descriptive connections across datasets may considerably enrich the data and extend their value. One way to standardize, describe, and interconnect information from heterogeneous data sources is to use Linked Data as a publishing technology. The majority of published open datasets are in tabular format, and generating valid Linked Data from them requires powerful and flexible methods for data cleaning, preparation, and transformation. Data workers and data developers spend most of their time and effort on data cleaning. Despite the number of available platforms for tabular data cleaning and preparation, none focuses on Linked Data generation. This thesis explores approaches to data cleaning and transformation in the context of Linked Data generation and identifies their challenges. This includes reviewing and categorizing typical tabular data quality issues found in the literature and in practical use cases, in order to derive the requirements for a solution in the form of a set of data cleaning and transformation operations. Furthermore, the thesis introduces the Grafterizer software framework, developed to assist data workers and data developers in preparing raw tabular data and converting it to Linked Data by simplifying and partially automating this process. The Grafterizer framework is evaluated against existing relevant tools and systems for data cleaning. The contribution of the thesis also includes extending and evaluating a reference software system to implement the needed data cleaning and transformation operations. The result is a powerful framework that addresses typical data quality issues and supports a wide range of data cleaning and transformation operations.
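
To make the idea of cleaning tabular data and converting it to Linked Data concrete, the following is a minimal Python sketch. It is not Grafterizer's API or implementation; the namespace `http://example.org/`, the sample CSV, the column names, and the helper functions `clean_rows`, `escape_literal`, and `to_ntriples` are all assumptions made for illustration. It trims and normalises cell values, drops rows without a name, and emits one N-Triples statement per remaining value.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and sample data, chosen only for illustration.
BASE = "http://example.org/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
RAW_CSV = "name,population\n Oslo ,634293\nbergen,\n,12345\n"


def clean_rows(text):
    """Trim whitespace, normalise capitalisation, drop rows with no name."""
    for row in csv.DictReader(io.StringIO(text)):
        name = (row.get("name") or "").strip().title()
        population = (row.get("population") or "").strip()
        if not name:
            continue  # a row without a name cannot be given a subject URI
        yield {"name": name, "population": population}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(rows):
    """Map each cleaned row to N-Triples statements about one subject."""
    for row in rows:
        name = row["name"]
        population = row["population"]
        subject = "<" + BASE + "city/" + quote(name) + ">"
        yield subject + " <" + BASE + 'name> "' + escape_literal(name) + '" .'
        # Emit the population only when the cell holds digits and nothing else.
        if population.isdigit():
            yield (subject + " <" + BASE + 'population> "' + population
                   + '"^^<' + XSD_INTEGER + "> .")


if __name__ == "__main__":
    for triple in to_ntriples(clean_rows(RAW_CSV)):
        print(triple)
```

Keeping the cleaning step separate from the mapping step mirrors the two stages the abstract describes: the same cleaned rows could be mapped to a different vocabulary without touching the cleaning logic.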
