Journal: Annals of Data Science

A Feature Selection Method Based on Ranked Vector Scores of Features for Classification

Abstract

One of the major aspects of any classification process is selecting the relevant set of features to be used in a classification algorithm. This initial step in data analysis is called the feature selection process. Removing irrelevant features from the dataset reduces the complexity of the classification task and increases the robustness of the decision rules when they are applied to the test set. This paper proposes a new filtering method that combines and normalizes the scores of three major feature selection methods: information gain, the chi-squared statistic, and inter-correlation. Our method uses the strengths of each of these methods to maximum advantage while avoiding their drawbacks, especially the disparity among the results they produce. Our filtering method stabilizes each variable's score and gives it its true rank among the input data's available variables. Hence it maximizes the stability of the variables' scores without losing the overall accuracy of the predictive model. A number of experiments on different datasets from various domains have shown that features chosen by the proposed method are highly predictive when compared with features selected by other existing filtering methods. The filtering phase was evaluated through thorough experimentation using a number of predictive classification algorithms, in addition to statistical analysis of the filtering methods' scores.
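
The abstract does not give the exact combination formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes discrete features, uses the absolute Pearson correlation with the class label as a stand-in for the "inter-correlation" score, and averages the normalized ranks of the three scores. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """H(labels) minus the entropy of labels conditioned on the feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature/label contingency table."""
    n = len(labels)
    f_counts, y_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for v, fc in f_counts.items():
        for y, yc in y_counts.items():
            expected = fc * yc / n
            stat += (joint[(v, y)] - expected) ** 2 / expected
    return stat

def abs_correlation(feature, labels):
    """Absolute Pearson correlation (assumed stand-in for inter-correlation)."""
    n = len(labels)
    mx, my = sum(feature) / n, sum(labels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(feature, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in feature))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    return abs(cov / (sx * sy)) if sx and sy else 0.0

def normalized_ranks(scores):
    """Map scores to (0, 1]; higher score means higher rank, ties share the average rank."""
    n = len(scores)
    ranks = []
    for s in scores:
        below = sum(1 for t in scores if t < s)
        ties = sum(1 for t in scores if t == s)
        ranks.append((below + (ties + 1) / 2) / n)
    return ranks

def combined_rank_scores(columns, labels):
    """Average the normalized ranks from the three scoring methods (an assumption)."""
    per_method = [
        normalized_ranks([scorer(col, labels) for col in columns])
        for scorer in (information_gain, chi_squared, abs_correlation)
    ]
    return [sum(vals) / len(vals) for vals in zip(*per_method)]

# Toy data: one feature identical to the label, one partly related, one unrelated.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f_good = [0, 0, 0, 0, 1, 1, 1, 1]
f_weak = [0, 0, 0, 1, 0, 1, 1, 1]
f_noise = [0, 1, 0, 1, 0, 1, 0, 1]
scores = combined_rank_scores([f_good, f_weak, f_noise], labels)
print(scores)
```

Working with ranks rather than raw scores puts the three methods on a common scale, which is one simple way to reduce the disparity between their outputs that the abstract mentions.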
