A branch-and-cut algorithm for the continuous error localization problem in data cleaning

Jorge Riera-Ledesma; Juan-Jose Salazar-Gonzalez

首页> 外文期刊>Computers & operations research >A branch-and-cut algorithm for the continuous error localization problem in data cleaning

【24h】

A branch-and-cut algorithm for the continuous error localization problem in data cleaning

机译：一种用于数据清理中连续错误定位问题的分支切割算法

获取原文

获取原文并翻译 | 示例

掌桥外文数据库（机构版） >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

Data collected by statistical agencies may contain mistakes made during the acquisition, transcription and coding process. Thus, before using all this information to infer statistical properties, the statistical agencies must check the consistency of their information. To this end, each record has to be tested on a set of consistency rules. Therefore, if one of these records does not meet all the consistency rules, it must be established which fields have to be modified in order to make the new record valid with respect to that set of consistency rules. Among all the possible solutions, statistical agencies are interested in finding one in which the number of fields that should be modified is minimum, thus leading to a combinatorial optimization problem known as the Error Localization Problem. This article approaches the optimization problem of finding the smallest set of fields whose values must be changed in order to satisfy a given set of consistency rules. With this purpose in mind an Integer Linear Programming model is proposed for the particular case in which the fields are continuous values and the consistency rules are given by linear inequalities. This model is solved through a branch-and-cut approach based on a Benders' decomposition. The new proposal is compared to others previously published in the literature and tested on benchmark instances. The overall performance of these new algorithms succeeded in solving to optimality difficult instances with up to 100 fields in a record in about 1 min of a personal computer.

机译：统计机构收集的数据可能包含在获取，转录和编码过程中犯的错误。因此，在使用所有这些信息来推断统计属性之前，统计机构必须检查其信息的一致性。为此，必须在一组一致性规则上测试每条记录。因此，如果这些记录之一不满足所有一致性规则，则必须确定必须修改哪些字段，以使新记录相对于该一致性规则集有效。在所有可能的解决方案中，统计机构都希望找到一种应该修改的字段数量最少的解决方案，从而导致称为错误定位问题的组合优化问题。本文探讨了一种优化问题，即找到最小的字段集，这些字段的值必须更改才能满足给定的一致性规则集。考虑到这一目的，针对字段是连续值且一致性规则由线性不等式给出的特殊情况，提出了整数线性规划模型。该模型通过基于Benders分解的分支切割方法进行求解。将该新提案与先前在文献中发表的其他提案进行了比较，并在基准实例上进行了测试。这些新算法的整体性能成功地解决了最困难的情况，在大约一分钟的个人计算机中，记录中有多达100个字段。

著录项

来源
《Computers & operations research》 |2007年第9期|2790-2804|共15页
作者
Jorge Riera-Ledesma; Juan-Jose Salazar-Gonzalez;
展开▼
作者单位

DEIOC, Universidad de La Laguna, 38271 La Laguna, Spain;

展开▼
收录信息
原文格式 PDF
正文语种 eng
中图分类计算技术、计算机技术;
关键词
editing and imputation; integer linear programming; benders' decomposition;

机译：编辑和归责;整数线性规划;弯头分解;

相似文献

外文文献
中文文献
专利

1. A heuristic approach for the continuous error localization problem in data cleaning [J] . Jorge Riera-Ledesma, Juan-Jose Salazar-Gonzalez Computers & operations research . 2007,第8期

机译：数据清理中连续错误定位问题的启发式方法
2. A branch-and-cut algorithm for two-stage stochastic mixed-binary programs with continuous first-stage variables [J] . Lewis Ntaimo, Suvrajeet Sen International Journal of Computational Science and Engineering . 2007,第3期

机译：具有连续第一阶段变量的两阶段随机混合二元程序的分支剪切算法
3. Collaborative Localization Algorithms for Wireless Sensor Networks with Reduced Localization Error [J] . I-Shyan Hwang, Prasan Kumar Sahoo Sensors . 2011,第10期

机译：减少定位误差的无线传感器网络协同定位算法
4. Messing Up with Bart: Error Generation for Evaluating Data-Cleaning Algorithms [C] . Patricia C. Arocena, Renee J. Miller, Boris Glavic, International conference on very large data bases . 2016

机译：迷上Bart：错误生成，用于评估数据清理算法
5. A novel characterization of signal space distortion and its application to the quantification of localization algorithm error. [D] . Francisco, John-Austen. 2015

机译：信号空间失真的新颖表征及其在定位算法误差量化中的应用。
6. Sensors Algorithms for Continuous Glucose Monitoring: Glucose Prediction Algorithms from Continuous Monitoring Data: Assessment of Accuracy via Continuous Glucose Error-Grid Analysis [O] . Francesca Zanderigo, Giovanni Sparacino, Boris Kovatchev, 2007

机译：连续葡萄糖监测的传感器和算法：连续监测数据中的葡萄糖预测算法：通过连续葡萄糖误差网格分析的准确性评估
7. Glucose Prediction Algorithms from Continuous Monitoring Data: Assessment of Accuracy via Continuous Glucose Error-Grid Analysis [O] . Zanderigo, Francesca, Sparacino, Giovanni, Kovatchev, Boris, 2007

机译：连续监测数据中的葡萄糖预测算法：通过连续葡萄糖误差网格分析评估准确性
8. Algorithms for Error Localization of Discrete Data [R] . Liepins, G. E. 1984

机译：离散数据误差定位算法

A branch-and-cut algorithm for the continuous error localization problem in data cleaning

摘要

著录项

相似文献

相关主题

期刊订阅