首页> 外文期刊>Pattern Recognition: The Journal of the Pattern Recognition Society >New and efficient DCA based algorithms for minimum sum-of-squares clustering
【24h】

New and efficient DCA based algorithms for minimum sum-of-squares clustering

机译:基于新型高效DCA的最小平方和聚类算法

获取原文
获取原文并翻译 | 示例
           

摘要

The purpose of this paper is to develop new efficient approaches based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) to perform clustering via minimum sum-of-squares Euclidean distance. We consider the two most widely used models for the so-called Minimum Sum-of-Squares Clustering (MSSC in short) that are a bilevel programming problem and a mixed integer program. Firstly, the mixed integer formulation of MSSC is carefully studied and is reformulated as a continuous optimization problem via a new result on exact penalty technique in DC programming. DCA is then investigated to the resulting problem. Secondly, we introduce a Gaussian kernel version of the bilevel programming formulation of MSSC, named GKMSSC. The GKMSSC problem is formulated as a DC program for which a simple and efficient DCA scheme is developed. A regularization technique is investigated for exploiting the nice effect of DC decomposition and a simple procedure for finding good starting points of DCA is developed. The proposed DCA schemes are original and very inexpensive because they amount to computing, at each iteration, the projection of points onto a simplex and/or onto a ball, and/or onto a box, which are all determined in the explicit form. Numerical results on real word datasets show the efficiency, the scalability of DCA and its great superiority with respect to k-means and kernel k-means, standard methods for clustering.
机译:本文的目的是开发基于DC(凸函数的差异)编程和DCA(DC算法)的新有效方法,以通过最小平方和欧氏距离执行聚类。我们考虑所谓的最小平方和聚类(简称MSSC)的两个最广泛使用的模型,它们是双层编程问题和混合整数程序。首先,仔细研究了MSSC的混合整数公式,并通过DC编程中精确罚分技术的新结果将其重新公式化为连续优化问题。然后对DCA进行研究以解决由此产生的问题。其次,我们介绍了MSSC双层编程公式的高斯内核版本,称为GKMSSC。 GKMSSC问题被表述为DC程序,为此为其开发了一种简单有效的DCA方案。研究了一种利用DC分解效果的正则化技术,并开发了一种简单的方法来找到DCA的良好起点。所提出的DCA方案是原始的并且非常便宜,因为它们等于在每次迭代中计算点在单形和/或在球上和/或在盒子上的投影,这些都以显式形式确定。实词数据集上的数值结果表明,DCA的效率,可扩展性及其相对于k均值和核k均值(聚类的标准方法)的巨大优势。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号