Conference: IEEE International Conference on Data Engineering

CurrentClean: Spatio-Temporal Cleaning of Stale Data

Abstract

Data currency is essential for up-to-date and accurate data analysis. Data is considered current if changes in real-world entities are reflected in the database; when they are not, stale data arises. Identifying and repairing stale data goes beyond simply having timestamps: individual entities each have their own update patterns in both space and time, and these patterns can be learned and predicted from available query logs. In this paper, we present CurrentClean, a probabilistic system for identifying and cleaning stale values. We introduce a spatio-temporal probabilistic model that captures database update patterns to infer stale values, and propose a set of inference rules that model spatio-temporal update patterns commonly seen in real data. We recommend repairs for stale values by learning from past update values over cells. Our evaluation on real data shows that CurrentClean identifies stale values effectively and achieves improved error detection and repair accuracy over state-of-the-art techniques.
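As a rough illustration of learning a cell's update pattern and inferring staleness from it, the sketch below is one minimal possibility. It is not CurrentClean's model: it assumes, purely for illustration, that updates to a single cell arrive as a Poisson process, ignores the spatial dimension and the inference rules entirely, and proposes the most frequent past value as a repair. The function names `staleness_probability` and `suggest_repair` are hypothetical.

```python
import math
from collections import Counter


def staleness_probability(update_times, now):
    """Probability that at least one unrecorded update has happened since
    the last recorded one, ASSUMING Poisson-distributed updates
    (an illustrative assumption, not the paper's model)."""
    if len(update_times) < 2:
        raise ValueError("need at least two recorded updates to estimate a rate")
    times = sorted(update_times)
    span = times[-1] - times[0]
    if span <= 0:
        raise ValueError("update times must not all be identical")
    # Estimated update rate: observed intervals per unit of time.
    rate = (len(times) - 1) / span
    # Time elapsed since the last recorded update (never negative).
    elapsed = max(0.0, now - times[-1])
    return 1.0 - math.exp(-rate * elapsed)


def suggest_repair(past_values):
    """Hypothetical repair heuristic: the most frequent past value of the cell."""
    return Counter(past_values).most_common(1)[0][0]
```

For a cell updated at times 0, 10, 20 and 30, the estimated rate is 0.1 updates per time unit, so the staleness probability at time 40 is 1 - e^(-1), roughly 0.63. A real system would condition on far more than one cell's history.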
