Cluster Structure Evaluation of a Dyadic k-means Algorithm for Mining Large Image Archives


Abstract

In many data mining and knowledge discovery applications, clustering is used for data reduction. When the data volume grows, as in image information mining, where gigabytes of data must be processed, many existing clustering algorithms cannot be applied because of their high computational complexity. To overcome this limitation, we developed an efficient clustering algorithm called dyadic k-means, a modified and enhanced version of traditional k-means. Whereas k-means has a computational complexity of O(nk) for n samples and k clusters, dyadic k-means has a complexity of O(n log k). The algorithm is therefore particularly efficient for grouping very large data sets into a large number of clusters. In this article we present statistically based methods for the objective evaluation of the clusters obtained by dyadic k-means. The main focus is on how well the clusters describe the distribution of data points in a multidimensional feature space and how much information can be obtained from the clusters. We consider both how the samples fill the feature space and how well the clusters produced by dyadic k-means characterize this configuration. We use the well-established scatter matrices to measure the compactness and separability of the clustered groups in the feature space. The probability of error, another indicator of how well the clusters characterize the samples in the feature space, is also calculated for each point. This probability expresses the relationship of each point to its cluster and can therefore be regarded as a measure of cluster reliability. We test the evaluation methods on both a synthetic and a real-world data set.
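
The scatter-matrix evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual textbook definitions of the within-cluster scatter S_w and the between-cluster scatter S_b, and it uses the ratio trace(S_b) / trace(S_w) as one common separability criterion. The abstract does not state which criterion the authors use, so the function names and the choice of ratio below are assumptions for illustration only. The cluster labels are taken as given and may come from any clustering method.

```python
from collections import defaultdict

def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[a * b for b in v] for a in u]

def add_into(A, B, scale=1.0):
    """In-place A += scale * B for nested-list matrices."""
    for i, row in enumerate(B):
        for j, val in enumerate(row):
            A[i][j] += scale * val

def scatter_matrices(points, labels):
    """Return (S_w, S_b), the within- and between-cluster scatter matrices."""
    d = len(points[0])
    groups = defaultdict(list)
    for p, c in zip(points, labels):
        groups[c].append(p)
    m = mean(points)  # overall mean of all samples
    S_w = [[0.0] * d for _ in range(d)]
    S_b = [[0.0] * d for _ in range(d)]
    for members in groups.values():
        m_c = mean(members)  # cluster mean
        for p in members:
            diff = [x - y for x, y in zip(p, m_c)]
            add_into(S_w, outer(diff, diff))
        shift = [x - y for x, y in zip(m_c, m)]
        add_into(S_b, outer(shift, shift), scale=len(members))
    return S_w, S_b

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def separability(points, labels):
    """Assumed criterion: trace(S_b) / trace(S_w); larger means more
    compact and better separated clusters."""
    S_w, S_b = scatter_matrices(points, labels)
    return trace(S_b) / trace(S_w)
```

For example, `separability([[0, 0], [0, 2], [10, 0], [10, 2]], [0, 0, 1, 1])` compares two tight clusters that lie far apart, so the between-cluster term dominates the within-cluster term.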


Bibliographic record

Source
Conference on Image and Signal Processing for Remote Sensing VIII, Sep 24-27, 2002, Agia Pelagia, Crete, Greece | 2002 | pp. 120-130 | 11 pages
Conference location: Agia Pelagia (GR)
Authors
Herbert Daschiel; Mihai Datcu
Author affiliation

German Aerospace Center (DLR), Remote Sensing Technology Institute (IMF), Oberpfaffenhofen, D-82234 Wessling, Germany

Original format: PDF
Language: English
Chinese Library Classification: Radio electronics; telecommunication technology
Keywords
clustering; unsupervised classification; evaluation

Similar literature


1. Evaluation Of Fuzzy K-Means And K-Means Clustering Algorithms In Intrusion Detection Systems [J]. Farhad Soleimanian Gharehchopogh, Neda Jabbari, Zeinab Ghaffari Azar. International Journal of Scientific & Technology Research, 2012, No. 11.
2. CBMIR: SHAPE-BASED IMAGE RETRIEVAL USING CANNY EDGE DETECTION AND K-MEANS CLUSTERING ALGORITHMS FOR MEDICAL IMAGES [J]. B. Ramamurthy, K. R. Chandran. International Journal of Engineering Science and Technology, 2011, No. 3.
3. An Improved Clustering Algorithm for Text Mining: Multi-Cluster Spherical K-Means [J]. Tunali Volkan, Bilgin Turgay, Camurcu Ah. The International Arab Journal of Information Technology, 2016, No. 1.
4. Cluster Structure Evaluation of a Dyadic k-means Algorithm for Mining Large Image Archives [C]. Herbert Daschiel, Mihai Datcu. Conference on Image and Signal Processing for Remote Sensing, 2003.
5. Efficient genetic k-means clustering algorithm and its application to data mining on different domains [D]. Alsayat, Ahmed Mosa. 2016.
6. Evaluating performance of health care facilities at meeting HIV-indicator reporting requirements in Kenya: an application of K-means clustering algorithm [O]. Milka Bochere Gesicho, Martin Chieng Were, Ankica Babic. 2021.
7. An Overview of Expectation Maximization and K-Means family Clustering Algorithms in Data Mining Applications [O]. 2018.
8. Contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation [R]. Theiler, J., Gisler, G. 1997.
