Journal: Soft computing: A fusion of foundations, methodologies and applications
Gravitational search algorithm and K-means for simultaneous feature selection and data clustering: a multi-objective approach

Abstract

Clustering is an unsupervised classification method used to group the objects of an unlabeled data set. High-dimensional data sets generally contain irrelevant and redundant features alongside the relevant ones, and these degrade the clustering result. Feature selection is therefore necessary: selecting a subset of relevant features improves the discriminative ability of the feature set, which in turn improves the clustering result. Although many metaheuristics have been proposed to select a subset of relevant features in a wrapper framework based on some criteria, most of them are marred by three key issues. First, they require the class information of the objects a priori, which is unavailable in unsupervised feature selection. Second, feature subset selection is driven by a single validity measure; hence, it produces a single best solution that is biased toward the cardinality of the feature subset. Third, they have difficulty avoiding local optima owing to a poor balance between exploration and exploitation in the feature search space. To deal with the first issue, we use an unsupervised feature selection method in which no class information is required. To address the second issue, we follow a Pareto-based approach to obtain diverse trade-off solutions by optimizing two conceptually conflicting validity measures, the silhouette index (Sil) and the feature cardinality (d). For the third issue, we introduce a genetic crossover operator to improve diversity in the binary gravitational search algorithm (BGSA), a recent metaheuristic based on Newton's law of gravity, in a multi-objective optimization setting; the resulting method is named improved multi-objective BGSA for feature selection (IMBGSAFS). We use ten real-world data sets to compare the IMBGSAFS results with three multi-objective methods in a wrapper framework, MBGSA, MOPSO, and NSGA-II, and with Pearson's linear correlation coefficient (FM-CC) as a multi-objective filter method. We employ four multi-objective quality measures conver
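
The Pareto-based selection over the two objectives named in the abstract, silhouette index (Sil, to be maximized) and feature cardinality (d, to be minimized), can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the example scores, and the choice of uniform crossover are assumptions made for illustration only, since the abstract does not say which genetic crossover operator IMBGSAFS uses or how the Sil values are computed.

```python
import random
from typing import List, Sequence, Tuple

# (Sil, d): silhouette index to maximize, number of selected features to minimize.
Score = Tuple[float, int]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(scores: Sequence[Score]) -> List[int]:
    """Indices of the candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def uniform_crossover(p1: Sequence[int], p2: Sequence[int],
                      rng: random.Random) -> List[int]:
    """Assumed operator: each bit of the child feature mask is copied
    from one of the two parent masks with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


# Hypothetical (Sil, d) scores for five candidate feature subsets.
scores = [(0.62, 5), (0.58, 3), (0.40, 4), (0.62, 7), (0.30, 1)]
front = pareto_front(scores)  # candidates 2 and 3 are dominated

# Recombine two hypothetical binary feature masks (1 = feature selected).
child = uniform_crossover([1, 0, 1, 1], [0, 0, 1, 0], random.Random(0))
```

In a wrapper setting, each (Sil, d) pair would come from clustering the data restricted to one binary feature mask, for example with K-means, and scoring the resulting partition. The non-dominated set then holds the trade-off solutions rather than a single best subset.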