Method for refining the initial conditions for clustering with applications to small and large database clustering

Abstract

As an optimization problem, clustering data (unsupervised learning) is known to be difficult. Most practical approaches use a heuristic algorithm, typically gradient descent, to search for a solution in the huge space of possible solutions. Such methods are by definition sensitive to their starting points, and clustering algorithms are well known to be extremely sensitive to initial conditions. Most methods for choosing an initial solution simply guess at random. In this paper we present a method that takes an initial condition and efficiently produces a refined starting condition. The method applies to a wide class of clustering algorithms for discrete and continuous data. We demonstrate how it is applied to the popular K-means clustering algorithm and show that refined starting points indeed lead to improved solutions. The technique can also be used as an initializer for other clustering solutions. The method is based on an efficient technique for estimating the modes of a distribution and runs in time guaranteed to be less than the overall clustering time for large data sets. It is also scalable and hence can be used efficiently on huge databases to refine starting points for scalable clustering algorithms in data mining applications.
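
The abstract does not spell out the refinement procedure, so the following is only a minimal sketch of one way such a refiner could work, under stated assumptions: K-means is run on several small random subsamples from the same initial points, the resulting centers are pooled, and the pooled set is clustered again to pick a smoothed starting condition. The subsample-and-pool scheme, the function names (`kmeans`, `distortion`, `refine_start`), and the parameters (`n_subsamples`, `sample_size`) are illustrative assumptions, not details taken from this record.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, iters=50):
    """Plain Lloyd-style K-means from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def distortion(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def refine_start(data, start, n_subsamples=10, sample_size=50, seed=0):
    """Assumed refinement scheme (hypothetical, see lead-in):
    1. cluster several small random subsamples from the same start,
    2. pool the centers found on all subsamples,
    3. cluster the pooled centers from each subsample solution and
       return the candidate that fits the pooled set best.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    solutions = [kmeans(rng.sample(data, size), start)
                 for _ in range(n_subsamples)]
    pooled = [c for sol in solutions for c in sol]
    candidates = [kmeans(pooled, sol) for sol in solutions]
    return min(candidates, key=lambda c: distortion(pooled, c))
```

A caller would pass the refined points to the full clustering run, for example `kmeans(data, refine_start(data, start))`. Because each subsample is much smaller than the data set, the refinement step in this sketch touches far fewer points than one full K-means pass over a large data set.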

Bibliographic data

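  • Publication number: US6115708A

  • Publication date: 2000-09-05

  • Original format: PDF

  • Applicant/Assignee: MICROSOFT CORPORATION

  • Application number: US19980034834

  • Inventors: PAUL S. BRADLEY; USAMA FAYYAD

  • Filing date: 1998-03-04

  • Classification: G06F17/30

  • Country: US

  • Date indexed: 2022-08-22 01:36:12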
