首页> 外文期刊>Computers & operations research >A branch-and-cut algorithm for the continuous error localization problem in data cleaning
【24h】

A branch-and-cut algorithm for the continuous error localization problem in data cleaning

机译:一种用于数据清理中连续错误定位问题的分支切割算法

获取原文
获取原文并翻译 | 示例
           

摘要

Data collected by statistical agencies may contain mistakes made during the acquisition, transcription and coding process. Thus, before using all this information to infer statistical properties, the statistical agencies must check the consistency of their information. To this end, each record has to be tested on a set of consistency rules. Therefore, if one of these records does not meet all the consistency rules, it must be established which fields have to be modified in order to make the new record valid with respect to that set of consistency rules. Among all the possible solutions, statistical agencies are interested in finding one in which the number of fields that should be modified is minimum, thus leading to a combinatorial optimization problem known as the Error Localization Problem. This article approaches the optimization problem of finding the smallest set of fields whose values must be changed in order to satisfy a given set of consistency rules. With this purpose in mind an Integer Linear Programming model is proposed for the particular case in which the fields are continuous values and the consistency rules are given by linear inequalities. This model is solved through a branch-and-cut approach based on a Benders' decomposition. The new proposal is compared to others previously published in the literature and tested on benchmark instances. The overall performance of these new algorithms succeeded in solving to optimality difficult instances with up to 100 fields in a record in about 1 min of a personal computer.
机译:统计机构收集的数据可能包含在获取,转录和编码过程中犯的错误。因此,在使用所有这些信息来推断统计属性之前,统计机构必须检查其信息的一致性。为此,必须在一组一致性规则上测试每条记录。因此,如果这些记录之一不满足所有一致性规则,则必须确定必须修改哪些字段,以使新记录相对于该一致性规则集有效。在所有可能的解决方案中,统计机构都希望找到一种应该修改的字段数量最少的解决方案,从而导致称为错误定位问题的组合优化问题。本文探讨了一种优化问题,即找到最小的字段集,这些字段的值必须更改才能满足给定的一致性规则集。考虑到这一目的,针对字段是连续值且一致性规则由线性不等式给出的特殊情况,提出了整数线性规划模型。该模型通过基于Benders分解的分支切割方法进行求解。将该新提案与先前在文献中发表的其他提案进行了比较,并在基准实例上进行了测试。这些新算法的整体性能成功地解决了最困难的情况,在大约一分钟的个人计算机中,记录中有多达100个字段。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号