Journal: International Journal of Engineering Research and Applications
A Combined Approach for Feature Subset Selection and Size Reduction for High Dimensional Data


Abstract

Selecting relevant features from a given feature set is one of the important issues in data mining and classification. In general, a dataset may contain many features, but not all of them are necessarily important for a particular analysis or decision-making task: features may share common information, or may be completely irrelevant to the processing at hand. This generally happens because of improper feature selection during dataset formation or because insufficient information is available about the observed system. In both cases, the data will contain features that only increase the processing burden and may ultimately lead to poor results when used for analysis. Methods are therefore required to detect and remove such features. In this paper we present an efficient approach that not only removes unimportant features but also reduces the size of the complete dataset. The proposed algorithm uses information theory to measure the information gain of each feature and a minimum spanning tree to group similar features; in addition, fuzzy c-means clustering is used to remove similar entries from the dataset. Finally, the algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets, and the results show that the presented algorithm not only reduces the feature set and data length but also improves the performance of the classifier.
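
The feature-selection part of the abstract (information gain per feature, then a minimum spanning tree to group similar features) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the thresholds `min_gain` and `max_dist`, the distance `1 - normalised mutual information`, and the rule of keeping the highest-gain feature from each group are all assumptions made for illustration, and features are assumed to be discrete. The fuzzy c-means step for removing similar entries is not covered here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(x, y):
    """Information gain I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def select_features(columns, labels, min_gain=0.1, max_dist=0.5):
    """Hypothetical pipeline: relevance filter, MST grouping, one pick per group.

    columns: dict mapping feature name -> list of discrete values.
    labels:  list of class labels, same length as each column.
    """
    # Step 1: drop features that carry little information about the class.
    gain = {name: info_gain(col, labels) for name, col in columns.items()}
    kept = [name for name in columns if gain[name] > min_gain]

    # Assumed distance between features: 1 - normalised mutual information.
    def dist(a, b):
        denom = entropy(columns[a]) + entropy(columns[b])
        if denom == 0:
            return 0.0
        return 1.0 - 2.0 * info_gain(columns[a], columns[b]) / denom

    parent = {f: f for f in kept}

    def find(f):
        # Union-find root lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    # Step 2: minimum spanning tree over the kept features (Kruskal).
    edges = sorted(
        (dist(a, b), a, b) for i, a in enumerate(kept) for b in kept[i + 1:]
    )
    mst = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            mst.append((w, a, b))

    # Step 3: cut long MST edges; remaining components are feature groups.
    parent = {f: f for f in kept}
    for w, a, b in mst:
        if w <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for f in kept:
        groups.setdefault(find(f), []).append(f)

    # Step 4: keep the most informative feature of each group.
    return sorted(max(g, key=gain.get) for g in groups.values())
```

Called with a dictionary of discrete feature columns and a label list, `select_features` returns the names of the retained features. Continuous features would need to be discretised first, and both thresholds would have to be tuned per dataset.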
