selection of relevant feature from a given set of feature is one of the important issues in the field of data mining as well as classification. In general the dataset may contain a number of features however it is not necessary that the whole set features are important for particular analysis of decision making because the features may share the common information?s and can also be completely irrelevant to the undergoing processing. This generally happen because of improper selection of features during the dataset formation or because of improper information availability about the observed system. However in both cases the data will contain the features that will just increase the processing burden which may ultimately cause the improper outcome when used for analysis. Because of these reasons some kind of methods are required to detect and remove these features hence in this paper we are presenting an efficient approach for not just removing the unimportant features but also the size of complete dataset size. The proposed algorithm utilizes the information theory to detect the information gain from each feature and minimum span tree to group the similar features with that the fuzzy c-means clustering is used to remove the similar entries from the dataset. Finally the algorithm is tested with SVM classifier using 35 publicly available real-world high-dimensional dataset and the results shows that the presented algorithm not only reduces the feature set and data lengths but also improves the performances of the classifier.
展开▼