Conference: IEEE International Conference on Big Data

An Interactive Data Quality Test Approach for Constraint Discovery and Fault Detection


Abstract

Data quality tests validate heterogeneous data to detect violations of syntactic and semantic constraints. The specification of these constraints can be incomplete because domain experts typically specify them in an ad hoc manner. Existing automated test approaches can generate false alarms and do not explain the constraint violations when reporting faulty data records. In previous work, we proposed ADQuaTe, an automated data quality test approach that uses an unsupervised deep learning technique (1) to discover constraints from big datasets that may have been missed by experts, and (2) to label as suspicious those records that violate the constraints. These records are grouped, and explanations for the constraint violations are presented to domain experts, who determine whether or not the groups are actually faulty. This paper presents ADQuaTe2, which extends ADQuaTe with an interactive learning technique that incorporates expert feedback to retrain the learning model and improve the accuracy of constraint discovery and fault detection. We evaluate the effectiveness of the approach on real-world datasets from a health data warehouse and a plant diagnosis database. We also use datasets with known faults from the UCI repository to evaluate the improvement in the accuracy of the approach after incorporating ground truth knowledge.
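
The feedback loop described in the abstract can be illustrated with a small sketch. This is not the ADQuaTe2 implementation: the abstract does not name the learning model, so a PCA-style linear reconstruction model stands in for the unsupervised deep learning technique, and every name here (`fit_linear_model`, `reconstruction_error`, `interactive_round`, the simulated expert) is hypothetical. The data is synthetic, with one implicit constraint (the second column is about twice the first) and three injected faults.

```python
import numpy as np

def fit_linear_model(X, n_components):
    """Fit a PCA-style linear encoder/decoder (assumed stand-in for a deep model)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered training data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(X, model):
    """Per-record squared reconstruction error; higher means more suspicious."""
    mean, components = model
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return ((centered - reconstructed) ** 2).sum(axis=1)

def interactive_round(X, train_mask, n_components, n_flag, expert_says_faulty):
    """Fit on trusted records, flag the highest-scoring ones, apply expert feedback."""
    model = fit_linear_model(X[train_mask], n_components)
    scores = reconstruction_error(X, model)
    flagged = np.argsort(scores)[::-1][:n_flag]
    new_mask = train_mask.copy()
    for idx in flagged:
        if expert_says_faulty(int(idx)):
            # Confirmed fault: exclude the record from the next retraining.
            new_mask[idx] = False
    return flagged, new_mask

# Synthetic records obeying the constraint col1 ~ 2 * col0, plus injected faults.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + 0.01 * rng.normal(size=200)])
fault_idx = [5, 50, 120]
X[fault_idx, 1] += 5.0

# Simulated expert who knows the ground truth (an assumption for this demo).
known_faults = set(fault_idx)
flagged, new_mask = interactive_round(
    X,
    train_mask=np.ones(len(X), dtype=bool),
    n_components=1,
    n_flag=5,
    expert_says_faulty=lambda i: i in known_faults,
)
print("flagged for review:", sorted(flagged.tolist()))
print("records kept for retraining:", int(new_mask.sum()))
```

In this sketch the expert reviews five flagged records per round. Records confirmed as faulty are dropped from the training set, while flagged records judged valid stay in it, so the next fit is less distorted by faults and less likely to raise the same false alarms.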
