首页> 外文会议>International conference on discovery science >Mining a Maximum Weighted Set of Disjoint Submatrices
【24h】

Mining a Maximum Weighted Set of Disjoint Submatrices

机译:挖掘最大加权不相交子矩阵集

获取原文

摘要

The objective of the maximum weighted set of disjoint sub-matrices problem is to discover K disjoint submatrices that together cover the largest sum of entries of an input matrix. It has many practical data-mining applications, as the related biclustering problem, such as gene module discovery in bioinformatics. It differs from the maximum-weighted submatrix coverage problem introduced in [6] by the explicit formulation of disjunction constraints: submatrices must not overlap. In other words, all matrix entries must be covered by at most one submatrix. The particular case of K = 1, called the maximal-sum submatrix problem, was successfully tackled with constraint programming in [5]. Unfortunately, the case of K > 1 is more challenging to solve as the selection of rows cannot be decided in polynomial time solely from the selection of K sets of columns. It can be proved to be NP-hard. We introduce a hybrid column generation approach using constraint programming to generate columns. It is compared to a standard mixed integer linear programming (MILP) through experiments on synthetic datasets. Overall, fast and valuable solutions are found by column generation while the MILP approach cannot handle a large number of variables and constraints.
机译:不相交子矩阵最大加权集问题的目的是发现K个不相交子矩阵,它们一起覆盖了输入矩阵的所有条目的最大和。它具有许多实用的数据挖掘应用程序,例如相关的双重集群问题,例如生物信息学中的基因模块发现。它与[6]中引入的最大加权子矩阵覆盖问题的区别在于显式提出了析取约束:子矩阵不得重叠。换句话说,所有矩阵条目都必须被一个子矩阵最多覆盖。 K = 1的特殊情况称为最大和子矩阵问题,已在[5]中通过约束编程成功解决。不幸的是,要解决K> 1的情况更具挑战性,因为不能仅通过选择K组列来确定多项式时间内的行选择。可以证明它是NP难的。我们介绍一种使用约束编程生成列的混合列生成方法。通过对合成数据集进行实验,将其与标准混合整数线性规划(MILP)进行了比较。总体而言,通过列生成可以找到快速而有价值的解决方案,而MILP方法无法处理大量的变量和约束。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号