Conference on Image and Signal Processing for Remote Sensing VIII, Sep 24-27, 2002, Agia Pelagia, Crete, Greece

Cluster Structure Evaluation of a Dyadic k-means Algorithm for Mining Large Image Archives

Abstract

For many applications in data mining and knowledge discovery in databases, clustering methods are used for data reduction. When the amount of data grows, as in image information mining, where gigabytes of data must be processed, many existing clustering algorithms cannot be applied because of their high computational complexity. To overcome this limitation, we developed an efficient clustering algorithm called dyadic k-means, a modified and enhanced version of traditional k-means. Whereas k-means has a computational complexity of O(nk) for n samples and k clusters, dyadic k-means has a complexity of O(n log k). Our algorithm is therefore particularly efficient for grouping very large data sets into a high number of clusters. In this article we present statistically based methods for the objective evaluation of the clusters obtained by dyadic k-means. The main focus is on how well the clusters describe the distribution of data points in a multidimensional feature space and how much information can be obtained from them. We consider both how the samples fill the feature space and how well the clusters produced by dyadic k-means characterize this configuration. We use the well-established scatter matrices to measure the compactness and separability of the clustered groups in the feature space. We also calculate the probability of error for each point, which is another indicator of how well the clusters characterize the samples in the feature space. This probability expresses the relationship of each point to its cluster and can therefore be regarded as a measure of cluster reliability. We test the evaluation methods on both a synthetic and a real-world data set.
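The abstract states the complexity bound but does not describe the mechanism behind it. One plausible reading of "dyadic" is a binary tree of centroids built by repeated 2-means splits, as in bisecting k-means. This is an assumption and is not taken from the paper. With k = 2^D leaves, a point reaches its cluster by being compared with only two centroids at each of the D = log2 k levels. Assigning n points therefore costs O(n log k) distance evaluations, instead of the O(nk) needed to compare every point with every centroid.

The minimal NumPy sketch below illustrates this reading. Several choices in it are hypothetical: the function names `two_means`, `build_tree` and `assign`, the farthest-point initialisation, and the fixed iteration count.

```python
import numpy as np


def two_means(X, iters=10, rng=None):
    """Split X into two groups with Lloyd's algorithm for k = 2.

    Hypothetical initialisation: one random sample plus the sample farthest from it.
    Returns an array of labels in {0, 1}.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    i0 = rng.integers(len(X))
    i1 = np.argmax(((X - X[i0]) ** 2).sum(axis=1))
    C = X[[i0, i1]].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Squared distances of all samples to the two centroids: shape (n, 2).
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels


def build_tree(X, depth, rng=None, counter=None):
    """Recursively bisect X; a full tree of this depth has 2**depth leaf clusters."""
    rng = np.random.default_rng(0) if rng is None else rng
    counter = [0] if counter is None else counter
    node = {"centroid": X.mean(axis=0), "children": None, "label": None}
    if depth > 0 and len(X) >= 2:
        labels = two_means(X, rng=rng)
        if 0 < labels.sum() < len(X):  # split only if both halves are non-empty
            node["children"] = [
                build_tree(X[labels == j], depth - 1, rng, counter) for j in (0, 1)
            ]
            return node
    # Leaf: give it the next cluster index.
    node["label"] = counter[0]
    counter[0] += 1
    return node


def assign(tree, x):
    """Descend the tree, comparing x with only two centroids per level."""
    node = tree
    while node["children"] is not None:
        left, right = node["children"]
        d_left = np.sum((x - left["centroid"]) ** 2)
        d_right = np.sum((x - right["centroid"]) ** 2)
        node = left if d_left <= d_right else right
    return node["label"]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Four synthetic, well-separated 2-D blobs (illustrative data only).
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [40.0, 0.0], [50.0, 0.0]])
    X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
    tree = build_tree(X, depth=2)  # log2(k) = 2 levels -> up to k = 4 clusters
    labels = np.array([assign(tree, x) for x in X])
    print(np.bincount(labels))  # -> [50 50 50 50]
```

The tree search is greedy. A point is not guaranteed to reach the globally nearest leaf centroid, and in this sketch that is the price paid for the logarithmic assignment cost.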
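The scatter-matrix measures of compactness and separability can be sketched using their standard definitions:

- The within-cluster scatter S_W measures compactness.
- The between-cluster scatter S_B measures separability.
- Together they satisfy S_W + S_B = total scatter.

The abstract does not say which scalar criterion the authors derive from these matrices. The sketch therefore assumes the common choice tr(S_W^{-1} S_B). The function names `scatter_matrices` and `separability` are hypothetical.

```python
import numpy as np


def scatter_matrices(X, labels):
    """Return (S_W, S_B) for data X of shape (n, d) and integer cluster labels.

    S_W = sum_k sum_{x in C_k} (x - m_k)(x - m_k)^T   (compactness)
    S_B = sum_k n_k (m_k - m)(m_k - m)^T              (separability)
    """
    m = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        diff = Xk - mk
        S_W += diff.T @ diff
        dm = (mk - m)[:, None]
        S_B += len(Xk) * (dm @ dm.T)
    return S_W, S_B


def separability(S_W, S_B):
    """Assumed criterion tr(S_W^{-1} S_B): larger means compact, well-separated clusters."""
    return np.trace(np.linalg.solve(S_W, S_B))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic 2-D blobs centred at (0, 0) and (5, 5).
    X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 2)) for c in (0.0, 5.0)])
    good = np.repeat([0, 1], 100)       # labels that match the two blobs
    shuffled = rng.permutation(good)    # same cluster sizes, random membership
    for name, lab in (("good", good), ("shuffled", shuffled)):
        S_W, S_B = scatter_matrices(X, lab)
        print(name, round(float(np.trace(S_W)), 1), round(float(separability(S_W, S_B)), 3))
```

On two well-separated blobs, the correct labelling should give a much smaller tr(S_W) and a much larger criterion value than a random relabelling with the same cluster sizes.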

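The abstract does not give the formula for the per-point probability of error. This minimal sketch models each cluster as a Gaussian with its sample mean and covariance, and gives it a prior proportional to its size. It then takes the error probability of a point x as 1 - P(own cluster | x), which is the posterior mass of all other clusters.

Under this assumption:

- A value near 0 marks a point that its cluster describes reliably.
- A large value marks a point near a cluster boundary, or one better explained by another cluster.

The Gaussian model, the covariance regulariser `reg` and the function names are all assumptions of this sketch, not the paper's definitions.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of a multivariate normal, evaluated row-wise on X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)   # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)          # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (maha + log_det + d * np.log(2.0 * np.pi))


def pointwise_error_probability(X, labels, reg=1e-6):
    """Assumed measure: 1 - P(own cluster | x) under per-cluster Gaussian models.

    Each cluster needs at least 2 points so that its covariance can be estimated.
    """
    ks = np.unique(labels)
    n, d = X.shape
    log_post = np.empty((n, len(ks)))
    for j, k in enumerate(ks):
        Xk = X[labels == k]
        cov = np.cov(Xk, rowvar=False) + reg * np.eye(d)  # regularised covariance
        log_prior = np.log(len(Xk) / n)
        log_post[:, j] = log_prior + gaussian_logpdf(X, Xk.mean(axis=0), cov)
    # Normalise in a numerically stable way to obtain posteriors P(k | x).
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    own = np.searchsorted(ks, labels)  # column index of each point's own cluster
    return 1.0 - post[np.arange(n), own]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two overlapping synthetic blobs centred at (0, 0) and (3, 3).
    X = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in (0.0, 3.0)])
    labels = np.repeat([0, 1], 100)
    p_err = pointwise_error_probability(X, labels)
    # Points between the blobs receive the highest error probabilities.
    print(round(float(p_err.mean()), 3), round(float(p_err.max()), 3))
```

Averaging the per-point values within each cluster gives one reliability score per cluster under the same assumptions.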
