Clustering is an important research direction in data mining. To address shortcomings of the similarity measures used in existing clustering algorithms, we first propose a new similarity measure. We then use the discernibility power of rough set theory to measure the importance of attributes, and on this basis propose a new weighted rough clustering algorithm that can handle categorical data. Finally, we compare our algorithm with other algorithms on UCI datasets. The experimental results show that the proposed algorithm handles categorical data, is insensitive to the order in which the data are input, does not require the number of clusters to be specified in advance, and improves clustering quality.
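To make the attribute-weighting idea concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract does not give the new similarity measure or the exact weighting formula. The sketch assumes that an attribute's importance is the share of object pairs it discerns (assigns different values to), and it substitutes a plain weighted matching similarity for the proposed measure. The names `discernibility_weights` and `weighted_similarity` and the toy data are hypothetical.

```python
from itertools import combinations


def discernibility_weights(rows):
    """Assumed weighting: count, per attribute, the object pairs that
    differ on it, then normalise the counts so the weights sum to 1."""
    n_attrs = len(rows[0])
    counts = [0] * n_attrs
    for a, b in combinations(rows, 2):
        for j in range(n_attrs):
            if a[j] != b[j]:
                counts[j] += 1
    total = sum(counts)
    if total == 0:
        # No attribute discerns any pair: fall back to uniform weights.
        return [1.0 / n_attrs] * n_attrs
    return [c / total for c in counts]


def weighted_similarity(a, b, weights):
    """Stand-in measure (not the paper's): sum of the weights of the
    attributes on which the two categorical objects agree."""
    return sum(w for x, y, w in zip(a, b, weights) if x == y)


# Toy categorical data (hypothetical): colour, size, shape.
rows = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
    ("blue", "large", "round"),
]
weights = discernibility_weights(rows)
sim_01 = weighted_similarity(rows[0], rows[1], weights)
sim_03 = weighted_similarity(rows[0], rows[3], weights)
```

In this toy table the shape attribute takes the same value for every object, so it discerns no pair and receives zero weight. Agreement on it therefore contributes nothing to the similarity, which is the intuition behind weighting attributes by their discernibility.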