Conference: IEEE International Conference on Data Science and Advanced Analytics

An Unsupervised Attribute Clustering Algorithm for Unsupervised Feature Selection



Abstract

The curse of dimensionality refers to the problems one faces when analyzing datasets with thousands or hundreds of thousands of attributes. It is usually tackled with feature selection methods, which have been shown to reduce computation time, improve prediction performance, and facilitate a better understanding of datasets in various application areas. These methods can be classified into filter methods, wrapper methods, and embedded methods. All of these feature selection methods require class label information to perform their tasks. Hence, when such information is unavailable, the feature selection problem can be very challenging. To overcome this challenge, we propose an unsupervised feature selection method called the Unsupervised Attribute Clustering Algorithm (UACA), which involves the following steps: i) calculate the Maximal Information Coefficient for each pair of attributes to construct an attribute distance matrix; ii) cluster all attributes using an optimal k-mode clustering method to find k mode attributes, one per cluster, which serve as the selected features. To evaluate the performance of the proposed algorithm, classification problems were run with different classifiers to validate the method and compare it with other methods. The experimental results show that the proposed unsupervised algorithm is comparable with classical feature selection methods and even outperforms some supervised learning algorithms.
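The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that the abstract does not confirm: absolute Pearson correlation stands in for the Maximal Information Coefficient (a real MIC function could be passed through the hypothetical `score` parameter), distance is taken as one minus the dependence score, the "k-mode" step is approximated by a k-medoids-style loop over the distance matrix, and k is supplied by the caller rather than chosen optimally. All function and parameter names are illustrative.

```python
import math


def abs_pearson(x, y):
    """Stand-in dependence score in [0, 1]; the abstract uses MIC here."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant attribute shows no linear dependence
    return min(1.0, abs(sxy) / math.sqrt(sxx * syy))


def attribute_distance_matrix(columns, score=abs_pearson):
    """Step i: pairwise attribute distances, assumed to be 1 - dependence."""
    m = len(columns)
    dist = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            dist[i][j] = dist[j][i] = 1.0 - score(columns[i], columns[j])
    return dist


def select_mode_attributes(dist, k, max_iter=100):
    """Step ii (assumed k-medoids-style): return k representative attribute indices."""
    m = len(dist)
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of attributes")
    # Farthest-first initialisation keeps the sketch deterministic.
    modes = [0]
    while len(modes) < k:
        candidates = (a for a in range(m) if a not in modes)
        modes.append(max(candidates, key=lambda a: min(dist[a][c] for c in modes)))
    for _ in range(max_iter):
        # Each mode anchors its own cluster; other attributes join the nearest mode.
        clusters = {c: [c] for c in modes}
        for a in range(m):
            if a not in clusters:
                nearest = min(modes, key=lambda c: dist[a][c])
                clusters[nearest].append(a)
        # The new mode of a cluster is the member with the smallest total distance.
        new_modes = [
            min(members, key=lambda c: sum(dist[c][a] for a in members))
            for members in clusters.values()
        ]
        if set(new_modes) == set(modes):
            break
        modes = new_modes
    return sorted(modes)


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6]
    b = [1, -1, 1, -1, 1, -1]
    # Attributes 0-2 are functions of a; attributes 3-5 are functions of b.
    columns = [a, [2 * v + 1 for v in a], [-v for v in a],
               b, [3 * v for v in b], [v - 5 for v in b]]
    print(select_mode_attributes(attribute_distance_matrix(columns), k=2))
```

In the toy data, one selected index comes from each group of redundant attributes, which is the behaviour the abstract describes: one representative attribute is kept per cluster and no class labels are used.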
