International Conference on Contemporary Computing

Feature selection using Markov clustering and maximum spanning tree in high dimensional data

Abstract

Feature selection is the most important preprocessing step for classifying high dimensional data. By selecting only the salient features of the data set for learning, it reduces the computational cost and prediction time of the classification algorithm. The main challenges in applying feature selection to high dimensional data (HDD) are handling the relevance of features, the redundancy among them, and the correlation between them. The proposed algorithm addresses these issues in three main steps. It uses a filter strategy, which is effective for data sets that are both large and high dimensional. First, to measure the relevance of each feature with respect to the class, the Fisher score is calculated for each feature independently. Next, only the relevant features are passed to the clustering algorithm, which checks them for redundancy. Finally, the correlation between features is calculated using a maximum spanning tree, and the most appropriate features are selected. The classification accuracy of the approach is validated with the C4.5, IB1 and Naive Bayes classifiers. With these three classifiers, the proposed algorithm gives higher classification accuracy than both the data sets containing only the features selected by the Fisher score method and the full-featured data sets containing all features.
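
The first and third steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the exact Fisher score formula, the edge weights of the spanning tree, or the rule for picking features from the tree, so the code below assumes the common textbook Fisher score (between-class scatter divided by within-class scatter, per feature) and takes the edge weights as given inputs. The function names fisher_scores and maximum_spanning_tree and the sample data are hypothetical, and the Markov clustering step is omitted.

```python
from collections import defaultdict


def fisher_scores(rows, labels):
    """Return one Fisher score per feature (assumed textbook definition):
    sum_k n_k * (mean_k - mean)^2  /  sum_k sum_{x in class k} (x - mean_k)^2
    """
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        overall_mean = sum(column) / len(column)
        between = 0.0  # scatter of class means around the overall mean
        within = 0.0   # scatter of samples around their own class mean
        for members in by_class.values():
            values = [row[j] for row in members]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class scatter: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def maximum_spanning_tree(n_nodes, weighted_edges):
    """Kruskal's algorithm visiting the heaviest edges first.

    weighted_edges is an iterable of (weight, u, v) tuples; the weights are
    an assumption here (for example, absolute feature-feature correlation).
    Returns the tree as a list of (u, v, weight) tuples.
    """
    parent = list(range(n_nodes))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for weight, u, v in sorted(weighted_edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.9], [3.2, 3.1]]
labels = [0, 0, 1, 1]
scores = fisher_scores(rows, labels)

# Hypothetical correlation-like weights between three features.
tree = maximum_spanning_tree(3, [(0.9, 0, 1), (0.2, 0, 2), (0.8, 1, 2)])
```

In this toy example the first feature receives a much larger score than the second, so a relevance threshold would keep only the first. The spanning tree keeps the two heaviest edges and drops the 0.2 edge, which would close a cycle.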