IEICE Transactions on Information and Systems

A Support Vector and K-Means Based Hybrid Intelligent Data Clustering Algorithm

Abstract

Support vector clustering (SVC), a recently developed unsupervised learning algorithm, has been successfully applied to many real-life data clustering problems. However, its effectiveness and advantages deteriorate on complex real-world problems, e.g., those with a large proportion of noise points or with connecting clusters. This paper proposes a hybrid algorithm based on support vectors and K-Means to improve the performance of SVC. A new SVC training method is developed from an analysis of the Gaussian kernel radius function, and an empirical study is conducted to guide the selection of the standard deviation of the Gaussian kernel. In the proposed algorithm, the outliers that increase problem complexity are first identified and removed by training a global SVC. The refined data set is then clustered by a kernel-based K-Means algorithm. Finally, several local SVCs are trained for the resulting clusters, and each removed data point is labeled according to its distance to the local SVCs. Because it exploits the advantages of both SVC and K-Means, the proposed algorithm can cluster both compact and arbitrarily organized data sets and is more robust to outliers and connecting clusters. Experiments are conducted on 2-D data sets generated by mixture models and on benchmark data sets from the UCI machine learning repository. The cluster error rate is below 3.0% for all selected data sets, and the results show that the proposed algorithm compares favorably with existing SVC algorithms.
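The abstract names a kernel-based K-Means as the second stage, applied to the data set after the global SVC has removed outliers, but does not give its exact form. As an illustration only, the following is a minimal sketch of standard kernel K-Means with a Gaussian kernel, not the authors' implementation. The width parameter `q` (taken here as 1/(2σ²)), the farthest-first seeding, and the function names `gaussian_kernel` and `kernel_kmeans` are assumptions made for this example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def gaussian_kernel(a: Point, b: Point, q: float) -> float:
    """Gaussian kernel exp(-q * ||a - b||^2); q is an assumed width parameter, q = 1 / (2 * sigma^2)."""
    return math.exp(-q * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_kmeans(points: List[Point], k: int, q: float, max_iter: int = 100) -> List[int]:
    """Standard kernel K-Means (illustrative sketch, not the paper's exact variant).

    Returns one cluster label in range(k) per input point.
    """
    n = len(points)
    # Gram matrix: every feature-space distance below is expressed through it.
    K = [[gaussian_kernel(points[i], points[j], q) for j in range(n)] for i in range(n)]

    # Assumed seeding: farthest-first traversal in feature space. Since
    # ||phi(a) - phi(b)||^2 = 2 - 2 K(a, b), the farthest point from the current
    # seeds is the one whose largest kernel value to any seed is smallest.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(n), key=lambda i: max(K[i][s] for s in seeds)))
    labels = [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(n)]

    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # (1/|c|^2) * sum_{j,l in c} K[j][l] depends only on the cluster, so compute it once.
        self_sim = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                    for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster has no center; skip it
                # ||phi(x_i) - m_c||^2 = K_ii - (2/|c|) * sum_{j in c} K_ij + self_sim[c]
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + self_sim[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    print(kernel_kmeans(pts, k=2, q=0.5))  # [0, 0, 0, 1, 1, 1]
```

The cluster centers are never formed explicitly: the distance from a point to a center in feature space is computed from the Gram matrix alone, which is what lets K-Means separate clusters that are not linearly separable in the input space.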