AAAI Conference on Artificial Intelligence

Accelerating Column Generation via Flexible Dual Optimal Inequalities with Application to Entity Resolution

Abstract

In this paper, we introduce a new optimization approach to entity resolution. Traditional approaches tackle entity resolution with hierarchical clustering, which does not benefit from a formal optimization formulation. In contrast, we model entity resolution as correlation clustering, which we treat as a weighted set-packing problem and write as an integer linear program (ILP). In this formulation, sources in the input data correspond to elements, and entities in the output data correspond to sets/clusters. We tackle optimization of weighted set packing by relaxing integrality in our ILP formulation. The set of potential sets/clusters cannot be explicitly enumerated, which motivates optimization via column generation. In addition to the novel formulation, we introduce new dual optimal inequalities (DOI), which we call flexible dual optimal inequalities (F-DOI), that tightly lower-bound dual variables during optimization and accelerate column generation. We apply our formulation to entity resolution (also called de-duplication of records) and achieve state-of-the-art accuracy on two popular benchmark datasets. Our F-DOI can be extended to other weighted set-packing problems.
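
To make the weighted set-packing view concrete, here is a minimal sketch on a toy instance. It is not the authors' method: it solves the packing problem by brute-force enumeration rather than by LP relaxation and column generation. The record names, the pairwise costs, and the helper names (`cluster_cost`, `all_clusters`, `best_packing`) are all hypothetical, and scoring a cluster as the sum of its internal pairwise costs is an assumption in the spirit of correlation clustering.

```python
from itertools import combinations

# Hypothetical toy data: pairwise costs between four records.
# A negative cost is evidence that two records refer to the same entity.
PAIR_COST = {
    ("a", "b"): -2.0,
    ("a", "c"): -1.0,
    ("b", "c"): -1.5,
    ("a", "d"): 3.0,
    ("b", "d"): 2.0,
    ("c", "d"): 0.5,
}
RECORDS = ["a", "b", "c", "d"]


def cluster_cost(cluster):
    """Cost of one candidate cluster: sum of the pairwise costs inside it."""
    return sum(PAIR_COST[pair] for pair in combinations(sorted(cluster), 2))


def all_clusters(records):
    """Enumerate every non-empty subset (only feasible for tiny inputs)."""
    for size in range(1, len(records) + 1):
        for subset in combinations(records, size):
            yield frozenset(subset)


def best_packing(records):
    """Exhaustively search for the minimum-cost family of disjoint clusters."""
    clusters = [(c, cluster_cost(c)) for c in all_clusters(records)]
    best = (0.0, [])  # the empty packing is always feasible

    def search(start, used, cost, chosen):
        nonlocal best
        if cost < best[0]:
            best = (cost, list(chosen))
        for i in range(start, len(clusters)):
            cluster, c = clusters[i]
            if cluster & used:  # packing constraint: each record at most once
                continue
            chosen.append(cluster)
            search(i + 1, used | cluster, cost + c, chosen)
            chosen.pop()

    search(0, frozenset(), 0.0, [])
    return best


if __name__ == "__main__":
    cost, chosen = best_packing(RECORDS)
    print(cost, [sorted(c) for c in chosen])
```

On this toy instance the search groups records a, b, and c into one entity and leaves d apart. The number of candidate clusters grows exponentially with the number of records, which is the reason the abstract gives for generating columns on demand instead of enumerating them.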
