Journal of Universal Computer Science

User-Oriented Approach to Data Quality Evaluation



Abstract

The paper proposes a new data object-driven approach to data quality evaluation. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. Because data quality is relative in nature, the data object and the quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of the data quality model, making it executable and enabling the scanning of data objects and the detection of data quality defects and anomalies. The proposed approach was applied to open data sets in order to analyse their quality. At least three advantages were highlighted: (1) a graphical data quality model allows data quality to be defined by people who are neither IT experts nor data quality experts, as the presented diagrams are easy to read, create and modify; (2) the data quality model allows "third-party" data to be analysed without deeper knowledge of how the data were accrued and processed; (3) the quality of the data can be described at at least two levels of abstraction: informally, using natural language, or formally, by including executable artefacts such as SQL statements.
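The two levels of abstraction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses graphical DSLs, whereas this example uses plain Python with the standard `sqlite3` module. The `company` table, its sample rows, the `QualityRequirement` class and the `evaluate` function are all hypothetical names introduced only for illustration. Each requirement pairs an informal natural-language description (roughly the PIM level) with an executable SQL statement that returns the violating rows (roughly the PSM level).

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class QualityRequirement:
    """Hypothetical pairing of an informal rule with an executable check."""
    description: str   # informal, natural-language specification
    defect_query: str  # SQL statement that returns the rows violating the rule


def evaluate(conn, requirements):
    """Run every requirement's query and collect the violating rows."""
    report = {}
    for req in requirements:
        report[req.description] = conn.execute(req.defect_query).fetchall()
    return report


# Hypothetical data object: a small table standing in for an open data set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (reg_no TEXT, name TEXT, founded TEXT)")
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?)",
    [
        ("1001", "Alpha", "2001-05-17"),
        ("1002", None, "1999-01-01"),     # missing name
        (None, "Gamma", "2010-13-40"),    # missing reg_no, impossible date
    ],
)

# User-defined, use-case dependent quality requirements (assumed examples).
requirements = [
    QualityRequirement(
        "registration number must be present",
        "SELECT rowid FROM company WHERE reg_no IS NULL",
    ),
    QualityRequirement(
        "name must be present",
        "SELECT rowid FROM company WHERE name IS NULL OR TRIM(name) = ''",
    ),
    QualityRequirement(
        "founded must be a valid date",
        "SELECT rowid FROM company WHERE date(founded) IS NULL",
    ),
]

report = evaluate(conn, requirements)
for description, defects in report.items():
    print(f"{description}: {len(defects)} defect(s), rowids {defects}")
```

Keeping the description and the query side by side means a non-expert can read or adjust the informal rule while the SQL remains the part that is actually executed against the data object.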
