
A DSL for Automated Data Quality Monitoring


Abstract

Data is becoming ever more ubiquitous, and its importance is rising with it. The quality and outcome of business decisions depend directly on the accuracy of the data used in predictions, so high data quality in the database systems that support those decisions is essential. Poor data quality risks commercial loss or even legal consequences. In this paper we focus on automating advanced data quality monitoring, and in particular on expressing and evaluating data quality rules. We present RADAR, a domain-specific language (DSL) for data quality rules that fulfills our main requirements: reusability of check logic, separation of concerns for different user groups, and support for heterogeneous data sources as well as advanced data quality rules such as time series rules. It can also automatically suggest potential rules based on an analysis of historic data. Furthermore, we show initial optimization approaches for executing rules on large data sets and evaluate our language based on these optimizations. Overall, the language offers a novel approach to flexible and powerful data quality management in practical applications, while meeting data quality managers' need for a pragmatic and efficient tool.
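The abstract does not show RADAR's syntax, so the following is only a rough Python sketch of two of the stated requirements: check logic defined once and reused, and a rule that binds a check to a named data source, illustrated with a simple time series rule. Every name here (`Check`, `Rule`, `latest_within_sigma`, `evaluate`, the `daily_row_count` source) is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Mapping, Sequence

# All names below are illustrative assumptions; this is not RADAR syntax.


@dataclass(frozen=True)
class Check:
    """Reusable check logic, written once (e.g. by a technical user)."""
    name: str
    predicate: Callable[[Sequence[float], float], bool]


@dataclass(frozen=True)
class Rule:
    """Binds a check to a data source and a parameter (e.g. by a data quality manager)."""
    check: Check
    source: str
    parameter: float


def latest_within_sigma(values: Sequence[float], k: float) -> bool:
    """Time series check: the newest value must lie within k sample
    standard deviations of the mean of all earlier values."""
    history, latest = values[:-1], values[-1]
    if len(history) < 2:
        return True  # too little history to compute a spread
    return abs(latest - mean(history)) <= k * stdev(history)


def evaluate(rule: Rule, sources: Mapping[str, Sequence[float]]) -> bool:
    """Look up the rule's source by name and apply the check to its values."""
    return rule.check.predicate(sources[rule.source], rule.parameter)


# One check definition, reusable across any number of rules and sources.
SIGMA_CHECK = Check("latest_within_sigma", latest_within_sigma)
ROW_COUNT_RULE = Rule(SIGMA_CHECK, source="daily_row_count", parameter=3.0)
```

Here the mapping from source names to value sequences stands in for heterogeneous data sources; in a real system each entry would be backed by its own connector. Calling `evaluate(ROW_COUNT_RULE, {"daily_row_count": [...]})` returns `True` when the latest value fits the recent history and `False` when it is an outlier.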
