...
首页> 外文期刊>Mathematical Programming >Guaranteed clustering and biclustering via semidefinite programming
【24h】

Guaranteed clustering and biclustering via semidefinite programming

机译:通过半定编程保证聚类和双聚类

获取原文
获取原文并翻译 | 示例
           

摘要

Identifying clusters of similar objects in data plays a significant role in a wide range of applications. As a model problem for clustering, we consider the densest k-disjoint-clique problem, whose goal is to identify the collection of k disjoint cliques of a given weighted complete graph maximizing the sum of the densities of the complete subgraphs induced by these cliques. In this paper, we establish conditions ensuring exact recovery of the densest k cliques of a given graph from the optimal solution of a particular semidefinite program. In particular, the semidefinite relaxation is exact for input graphs corresponding to data consisting of k large, distinct clusters and a smaller number of outliers. This approach also yields a semidefinite relaxation with similar recovery guarantees for the biclustering problem. Given a set of objects and a set of features exhibited by these objects, biclustering seeks to simultaneously group the objects and features according to their expression levels. This problem may be posed as that of partitioning the nodes of a weighted bipartite complete graph such that the sumof the densities of the resulting bipartite complete subgraphs is maximized. As in our analysis of the densest k-disjoint-clique problem, we show that the correct partition of the objects and features can be recovered from the optimal solution of a semidefinite program in the case that the given data consists of several disjoint sets of objects exhibiting similar features. Empirical evidence from numerical experiments supporting these theoretical guarantees is also provided.
机译:识别数据中相似对象的簇在广泛的应用中起着重要的作用。作为聚类的模型问题,我们考虑最密集的k-不相交-clique问题,其目标是确定给定加权完整图的k个不相交的派系的集合,以最大化这些派系所引起的完整子图的密度之和。在本文中,我们建立了确保从特定半确定程序的最佳解中准确恢复给定图的最密集k团的条件。特别地,半确定松弛对于与包含k个巨大的不同簇和较少数量的异常值的数据相对应的输入图而言是精确的。这种方法还产生了半确定的松弛,并为双簇问题提供了类似的恢复保证。给定一组对象和这些对象展示的一组特征,双聚类寻求根据对象和特征的表达水平同时对它们进行分组。可能会出现这样的问题,即对加权二部完全图的节点进行划分,以使生成的二部完全子图的密度之和最大。正如我们对最密集的k-disjoint-clique问题的分析一样,我们表明,在给定数据包含多个不相交的对象集的情况下,可以从半定程序的最优解中恢复对象和特征的正确分区表现出相似的特征。还提供了来自支持这些理论保证的数值实验的经验证据。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号