Conference: IEEE International Congress on Big Data

BayesWipe: A multimodal system for data cleaning and consistent query answering on structured bigdata


Abstract

Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this paper, we provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. We thus avoid the necessity for a domain expert or clean master data. We also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. We evaluate our methods over both synthetic and real data.
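
To make the idea in the abstract concrete, here is a minimal sketch of noisy-channel style correction: each candidate clean tuple is scored by a prior learned from the noisy data times the likelihood of observing the dirty tuple given that candidate. Everything in it is an assumption made for illustration, not the paper's implementation: the prior is a plain tuple frequency count, the error model is a string-similarity product, and the names `learn_prior`, `error_likelihood`, `clean`, and `answer_query` are hypothetical. The last function shows one simple way to answer a selection query over cleaned views of the rows without writing to the stored data.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import prod


def learn_prior(rows):
    """Estimate P(candidate) from relative tuple frequencies in the noisy data.

    Assumption: a frequency count stands in for the generative model.
    """
    counts = Counter(rows)
    total = sum(counts.values())
    return {row: n / total for row, n in counts.items()}


def error_likelihood(observed, candidate):
    """Toy error model: P(observed | candidate) as a product of
    per-attribute string similarities in [0, 1] (an assumption)."""
    return prod(
        SequenceMatcher(None, o, c).ratio() for o, c in zip(observed, candidate)
    )


def clean(observed, prior):
    """Return the candidate maximizing P(candidate) * P(observed | candidate)."""
    return max(prior, key=lambda cand: prior[cand] * error_likelihood(observed, cand))


def answer_query(rows, prior, predicate):
    """Answer a selection query over cleaned views of the rows.

    The stored rows are never modified, which mirrors the read-only setting.
    """
    cleaned = (clean(row, prior) for row in rows)
    return [row for row in cleaned if predicate(row)]


# Hypothetical noisy table of (make, model) tuples with one typo.
rows = (
    [("Honda", "Civic")] * 5
    + [("Honda", "Accord")] * 3
    + [("Hondo", "Civic")]
)
prior = learn_prior(rows)

print(clean(("Hondo", "Civic"), prior))
print(len(answer_query(rows, prior, lambda r: r[0] == "Honda")))
```

In this toy table the misspelled tuple is rare and close to a frequent one, so the frequent tuple wins the product of prior and likelihood. Using every distinct observed tuple as a candidate is a simplification that only works for small tables.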
