
Clustering and feature selection via PSO algorithm



Abstract

Clustering is a popular technique for data analysis. In this paper, we propose a new method for simultaneous clustering and feature selection using multi-objective particle swarm optimization (PSO). Different features may have different importance in different contexts: some may be irrelevant to the clustering, and some may be misleading. We therefore assign weights to the features and omit those whose weight falls below a threshold value that the algorithm itself produces automatically. Evolutionary algorithms are among the best-known techniques for clustering, but clustering algorithms based on them have two main problems. First, they are slow; second, they depend on the shape of the clusters and mostly work well only on specific datasets. To address the first problem and speed up the algorithm, we use two local searches, which improve the cluster centers and estimate the threshold value. To address the second problem, we evaluate the clustering with a multi-objective fitness function that combines two validation criteria: a newly proposed KMPBM criterion and the Conn criterion. Because these two criteria are based on compactness and connectedness, they work independently of the shape of the clusters. Experiments on three synthetic datasets and three real datasets show that the proposed algorithm clusters independently of cluster shape and can achieve good accuracy on datasets with clusters of any shape.
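The feature-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the distance measure, the particle encoding, or the way the threshold is estimated, so the weighted Euclidean distance, the fixed `weights` and `threshold` values, and all function names below are assumptions made for illustration. The PSO search, the two local searches, and the KMPBM and Conn criteria are not shown.

```python
import math


def select_features(weights, threshold):
    """Return the indices of features whose weight reaches the threshold."""
    return [i for i, w in enumerate(weights) if w >= threshold]


def weighted_distance(point, center, weights, kept):
    """Weighted Euclidean distance over the kept features only (assumed metric)."""
    return math.sqrt(sum(weights[i] * (point[i] - center[i]) ** 2 for i in kept))


def assign_clusters(points, centers, weights, threshold):
    """Drop low-weight features, then assign each point to its nearest center."""
    kept = select_features(weights, threshold)
    labels = []
    for p in points:
        dists = [weighted_distance(p, c, weights, kept) for c in centers]
        labels.append(dists.index(min(dists)))
    return kept, labels


# Toy data: the third feature is noise and carries a low weight.
points = [(0.0, 0.1, 9.0), (0.2, 0.0, -7.0), (5.0, 5.1, 8.0), (5.2, 4.9, -6.0)]
centers = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
weights = [0.6, 0.35, 0.05]   # hypothetical values; in the paper a particle would carry these
threshold = 0.1               # hypothetical value; the paper estimates it automatically

kept, labels = assign_clusters(points, centers, weights, threshold)
print(kept, labels)
```

In this toy case the noisy third feature falls below the threshold and is ignored, so the assignment depends only on the first two features.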
