Published in: Uncertainty in Artificial Intelligence

Optimal Time Bounds for Approximate Clustering



Abstract

Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect to the k-median objective function, a natural formulation of clustering in which we attempt to minimize the average distance to cluster centers. One of the main contributions of this paper is a simple but powerful sampling technique, which we call successive sampling, that could be of independent interest. We show that our sampling procedure can rapidly identify a small set of points (of size just O(k log(n/k))) that summarize the input points for the purpose of clustering. Using successive sampling, we develop an algorithm for the k-median problem that runs in O(nk) time for a wide range of values of k and is guaranteed, with high probability, to return a solution whose cost is at most a constant factor times optimal. We also establish a lower bound of Ω(nk) on the running time of any randomized constant-factor approximation algorithm for the k-median problem that succeeds with even a negligible (say, 1/100) probability. The best previous upper bound for the problem was Õ(nk), where the Õ-notation hides polylogarithmic factors in n and k. The best previous lower bound of Ω(nk) applied only to deterministic k-median algorithms. While we focus our presentation on the k-median objective, all our upper bounds are valid for the k-means objective as well. In this context our algorithm compares favorably to the widely used k-means heuristic, which requires O(nk) time for just one iteration and provides no useful approximation guarantees.
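As a loose illustration of the ideas in the abstract, the sketch below shows the k-median objective and a successive-sampling-flavored summary step: repeatedly draw a small uniform sample, then discard the half of the remaining points closest to that sample, so that roughly O(k log(n/k)) points survive as a summary. This is only a sketch of the flavor of the technique, not the paper's algorithm; the sample size `3k` and the discard-half rule are illustrative assumptions.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def kmedian_cost(points, centers):
    """k-median objective: total distance from each point to its nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def successive_sample(points, k, sample_factor=3):
    """Successive-sampling-style summary (illustrative constants, not the paper's).

    Each round draws a uniform sample of size ~3k, keeps it in the summary,
    and discards the half of the remaining points nearest to the sample.
    The number of rounds is about log(n/k), so the summary has roughly
    O(k log(n/k)) points.
    """
    target = max(1, sample_factor * k)
    remaining = list(points)
    summary = []
    while len(remaining) > target:
        sample = random.sample(remaining, target)
        summary.extend(sample)
        # Sort remaining points by distance to the nearest sampled point.
        by_dist = sorted(remaining, key=lambda p: min(dist(p, s) for s in sample))
        # Keep only the farther half for the next round.
        remaining = by_dist[len(by_dist) // 2:]
    summary.extend(remaining)
    return summary
```

The summary set can then be fed to any (slower) constant-factor k-median subroutine in place of the full input; discarding the near half each round is what keeps the total work linear in n.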


