首页> 外文OA文献 >Bounded Size Graph Clustering with Applications to Stream Processing
【2h】

Bounded Size Graph Clustering with Applications to Stream Processing

机译:有界大小图聚类及其在流处理中的应用

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

We introduce a graph clustering problem motivated by a stream processing application. Input to our problem is an undirected graph with vertex and edge weights. A cluster is a subset of the vertices. The {em size} of a cluster is defined as the total vertex weight in the subset plus the total edge weight at the boundary of the cluster. The bounded size graph clustering problem ($GC$) is to partition the vertices into clusters of size at most a given budget and minimize the total edge-weight across the clusters. In the {em multiway cut} version of the problem, we are also given a subset of vertices called {em terminals}. No cluster is allowed to contain more than one terminal. Our problem differs from most of the previously studied clustering problems in that the number of clusters is not specified. We first show that the feasibility version of the multiway cut $GC$ problem, i.e., determining if there exists a clustering with bounded-size clusters satisfying the multiway cut constraint, can be solved in polynomial time. Our algorithm is based on the min-cut subroutine and an uncrossing argument. This result is in contrast with the NP-hardness of the min-max multiway cut problem, considered by Svitkina and Tardos (2004), in which the number of clusters must equal the number of terminals. Our results for the feasibility version also generalize to any symmetric submodular function. We next show that the optimization version of $GC$ is NP-hard by showing an approximation-preserving reduction from the $frac 13$-balanced cut problem. Our main result is an $O(log^2 n)$-approximation to the optimization version of the multiway cut $GC$ problem violating the budget by an $O(log n)$ factor, where $n$ denotes the number of vertices. Our algorithm is based on a set-cover-like greedy approach which iteratively computes bounded-size clusters to maximize the number of new vertices covered.
机译:我们介绍了由流处理应用程序引起的图聚类问题。输入到我们的问题的是具有顶点和边缘权重的无向图。簇是顶点的子集。群集的{em size}定义为子集中的总顶点权重加上群集边界处的总边缘权重。有界尺寸图聚类问题($ GC $)是将顶点划分为大小最大为给定预算的聚类,并最小化聚类中的总边权重。在问题的{em多向切割}版本中,我们还得到了称为{em terminal}的一组顶点。不允许任何群集包含多个终端。我们的问题与大多数先前研究的聚类问题不同,因为未指定聚类的数量。我们首先表明,可以在多项式时间内求解多径切割$ GC $问题的可行性版本,即确定是否存在具有满足多径切割约束的有界簇的聚类。我们的算法基于min-cut子例程和一个非交叉参数。该结果与Svitkina和Tardos(2004)考虑的最小-最大多路切割问题的NP硬度相反,在该问题中,簇数必须等于终端数。我们针对可行性版本的结果也可以推广到任何对称的次模函数。接下来,我们通过显示$ frac 13 $平衡割问题的近似保全减少,来证明$ GC $的优化版本是NP-hard。我们的主要结果是对优化版本的多路削减$ GC $问题的优化版本获得了$ O(log ^ 2 n)$近似值,这违反了预算,并超出了$ O(log n)$因素,其中$ n $表示顶点。我们的算法基于类似集合覆盖的贪婪方法,该方法迭代地计算有界大小的簇,以最大程度地覆盖新顶点的数量。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号