首页> 外文期刊>Algorithmica >A Unified Framework for Clustering Constrained Data Without Locality Property
【24h】

A Unified Framework for Clustering Constrained Data Without Locality Property

机译:统一的群框架,用于群集无限制的受限数据

获取原文
获取原文并翻译 | 示例
           

摘要

Abstract In this paper, we consider a class of constrained clustering problems of points in Rddocumentclass[12pt]{minimal} usepackage{amsmath} usepackage{wasysym} usepackage{amsfonts} usepackage{amssymb} usepackage{amsbsy} usepackage{mathrsfs} usepackage{upgreek} setlength{oddsidemargin}{-69pt} egin{document}$$mathbb {R}^{d}$$end{document}, where d could be rather high. A common feature of these problems is that their optimal clusterings no longer have the locality property (due to the additional constraints), which is a key property required by many algorithms for their unconstrained counterparts. To overcome the difficulty caused by the loss of locality, we present in this paper a unified framework, called Peeling-and-Enclosing, to iteratively solve two variants of the constrained clustering problems, constrained k-means clustering (k-CMeans) and constrained k-median clustering (k-CMedian). Our framework generalizes Kumar et al.’s (J ACM 57(2):5, 2010) elegant k-means clustering approach from unconstrained data to constrained data, and is based on two standalone geometric techniques, called Simplex Lemma and Weaker Simplex Lemma, for k-CMeans and k-CMedian, respectively. The simplex lemma (or weaker simplex lemma) enables us to efficiently approximate the mean (or median) point of an unknown set of points by searching a small-size grid, independent of the dimensionality of the space, in a simplex (or the surrounding region of a simplex), and thus can be used to handle high dimensional data. If k and 1ϵdocumentclass[12pt]{minimal} usepackage{amsmath} usepackage{wasysym} usepackage{amsfonts} usepackage{amssymb} usepackage{amsbsy} usepackage{mathrsfs} usepackage{upgreek} setlength{oddsidemargin}{-69pt} egin{document}$$rac{1}{epsilon }$$end{document} are fixed numbers, our framework generates, in nearly linear time (i.e., O(n(logn)k+1d)documentclass[12pt]{minimal} usepackage{amsmath} usepackage{wasysym} usepackage{amsfonts} usepackage{amssymb} usepackage{amsbsy} usepackage{mathrsfs} usepackage{upgreek} setlength{oddsidemargin}{-69pt} egin{document}$$O(n(log n)^{k+1}d)$$end{document}), O((logn)k)documentclass[12pt]{minimal} usepackage{amsmath} usepackage{wasysym} usepackage{amsfonts} usepackage{amssymb} usepackage{amsbsy} usepackage{mathrsfs} usepackage{upgreek} setlength{oddsidemargin}{-69pt} egin{document}$$O((log n)^{k})$$end{document}k-tuple candidates for the k mean or median points, and one of them induces a (1+ϵ)documentclass[12pt]{minimal} usepackage{amsmath} usepackage{wasysym} usepackage{amsfonts} usepackage{amssymb} usepackage{amsbsy} usepackage{mathrsfs} usepackage{upgreek} setlength{oddsidemargin}{-69pt} egin{document}$$(1+epsilon )$$end{document}-approximation for k-CMeans or k-CMedian, where n is the number of points. Combining this unified framework with a problem-specific selection algorithm (which determines the best k-tuple candidate), we obtain a (1+ϵ)documentclass[12pt]{minimal} usepackage{amsmath} usepackage{wasysym} usepackage{amsfonts} usepackage{amssymb} usepackage{amsbsy} usepackage{mathrsfs} usepackage{upgreek} setlength{oddsidemargin}{-69pt} egin{document}$$(1+epsilon )$$end{document}-approximation for each of the constrained clustering problems. Our framework improves considerably the best known results for these problems. We expect that our technique will be applicable to other variants of k-means and k-median clustering problems without locality.
机译:摘要在本文中,我们考虑了一类rd documentClass [12pt] {minimal} usepackage {ammath} usepackage {keysym} usepackage {amssymb} usepackage {amssyb} usepackage {amssyb} usepackage {mathrsfs} usepackage {supmeez} setLength { oddsidemargin} { - 69pt} begin {document} $$$ mathbb {r} ^ {d} $$ ne $$ need {document},其中d可能相当高。这些问题的一个共同特征是它们的最佳群集不再具有局部性属性(由于附加约束而导致的),这是许多算法所需的关键属性,用于其无约束对方。为了克服所在地丧失造成的困难,我们在本文中展示了一个统一的框架,称为剥离和封闭,以迭代地解决约束聚类问题的两个变体,约束k-merse聚类(k-cmeans)和约束K-Median集群(K-Cmedian)。我们的框架推出了Kumar等人。的(J ACM 57(2):5,1010)优雅的K-means聚类方法从无约束数据到受限数据,基于两个独立的几何技术,称为单纯x引理和弱Simplex Lemma ,对于K-cmeans和K-cmedian分别。单纯x的引理(或弱Simplex Lemma)使我们能够通过搜索小尺寸网格,独立于Simplex(或周围的空间的维度,有效地近似于未知点的平均值(或中位数)点。 Simplex的区域),因此可用于处理高维数据。如果k和1ε documentClass [12pt] {minimal} usepackage {ammath} usepackage {isysym} usepackage {amsfonts} usepackage {amssymb} usepackage {amsbsy} usepackage {mathrsfs} usepackage {supmeek} setLength { ODDSIDEMARGIN} { - 69pt} begin {document} $$$ frac {1} { epsilon} $$ end {document}是固定数字,我们的框架在几乎线性时间(即o(n(logn) )k + 1d) documentClass [12pt] {minimal} usepackage {ammath} usepackage {isysym} usepackage {amsfonts} usepackage {amssys} usepackage {amsbsy} usepackage {mathrsfs} usepackage {supmeek} setLength { oddsidemargin} { - 69pt} begin {document} $$ o(n( log n)^ {k + 1} d)$$ end {document}),o((logn)k) documentclass [ 12pt] {minimal} usepackage {ammath} usepackage {isysym} usepackage {amsfonts} usepackage {amssymb} usepackage {amsbsy} usepackage {mathrsfs} usepackage {supmeek} setLength { oddsidemargin} { - 69pt} begin {document} $$ o(( log n)^ {k})$$ end {document} k-cuple候选k均值或中位数,其中一个引起了一个(1 +ε) DocumentClass. [12pt] {minimal} usepackage {ammath} usepackage {isysym} usepackage {amsfonts} usepackage {amssymb} usepackage {amsbsy} usepackage {mathrsfs} usepackage {supmeek} setLength { oddsidemargin} { - 69pt } begin {document} $$(1+ epsilon)$$ end {document} - 对于k-cmeans或k-cmedian的批准,其中n是点数。将此统一的框架与特定于问题的选择算法(确定最佳的K-Tuple候选)组合,获取(1 +ε) documentClass [12pt] {minimal} usepackage {ammath} usepackage {isysym} usepackage {amsfonts} usepackage {amssymb} usepackage {amsbsy} usepackage {mathrsfs} usepackage {supmeek} setLength { oddsideDemargin} { - 69pt} begin {document} $$(1+ epsilon)$$ End {文档} - 每个约束群集问题的批准。我们的框架可以提高这些问题的最佳结果。我们预计我们的技术将适用于k-means的其他变体和没有地方的k-median集群问题。

著录项

  • 来源
    《Algorithmica》 |2020年第4期|808-852|共45页
  • 作者

    Hu Ding; Jinhui Xu;

  • 作者单位

    School of Computer Science and Technology University of Science and Technology of China Hefei China;

    Department of Computer Science and Engineering State University of New York at Buffalo Buffalo USA;

  • 收录信息 美国《科学引文索引》(SCI);美国《工程索引》(EI);
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    Constrained clustering; k-means/median; Approximation algorithms; High dimensions;

    机译:受限聚类;K-means /中值;近似算法;高尺寸;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号