Journal of Supercomputing

A new improved filter-based feature selection model for high-dimensional data


Abstract

Data preprocessing is ubiquitous, and selecting significant attributes is one of its most important steps. Feature selection creates a subset of relevant features for effective classification of data. In the classification of high-dimensional data, classifier performance usually depends on the feature subset used. The Relief algorithm is a popular heuristic approach for selecting significant feature subsets: it scores each feature individually and selects the top-scoring features to form the subset. Many extensions of the Relief algorithm have been developed. However, an important defect in Relief-based algorithms has been ignored for years. Because the instances used to measure feature scores in the Relief algorithm are uncertain and noisy, the resulting scores fluctuate with the instances, which leads to poor classification accuracy. To fix this problem, a novel feature selection algorithm based on a Chebyshev distance-outlier detection model is proposed, called noisy feature removal-Relief (NFR-ReliefF for short). To demonstrate the performance of the NFR-ReliefF algorithm, extensive experiments, including classification tests, were carried out on nine high-dimensional benchmark datasets by combining the proposed model with standard classifiers, including naive Bayes, C4.5 and KNN. The results show that NFR-ReliefF outperforms the other models on most of the tested datasets.
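
To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of (a) the original two-class Relief scoring rule and (b) a simple Chebyshev-distance outlier filter. This is not the NFR-ReliefF algorithm itself: the abstract does not say how the outlier model and the Relief scores are combined, so the function names (`chebyshev_outlier_mask`, `relief_scores`), the mean-plus-`k`-standard-deviations threshold, the median center, and the choice to filter instances before scoring are all assumptions made for illustration.

```python
import numpy as np


def chebyshev_outlier_mask(X, k=3.0):
    """Return a boolean mask that is True for suspected outlier rows.

    Assumption (not from the paper): a row is an outlier when its
    Chebyshev (L-infinity) distance to the feature-wise median exceeds
    mean + k * std of all such distances.
    """
    X = np.asarray(X, dtype=float)
    center = np.median(X, axis=0)
    dist = np.max(np.abs(X - center), axis=1)  # Chebyshev distance per row
    return dist > dist.mean() + k * dist.std()


def relief_scores(X, y, n_samples=None, seed=0):
    """Score features with the basic two-class Relief update.

    For each sampled instance, the nearest same-class neighbour (hit)
    and nearest other-class neighbour (miss) are found; a feature gains
    weight when it differs on the miss and loses weight when it differs
    on the hit. Assumes every class has at least two instances.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span

    n, p = Xs.shape
    m = n if n_samples is None else min(n_samples, n)
    rng = np.random.default_rng(seed)
    sampled = rng.choice(n, size=m, replace=False)

    w = np.zeros(p)
    for i in sampled:
        diffs = np.abs(Xs - Xs[i])      # per-feature differences to row i
        dist = diffs.sum(axis=1)        # Manhattan distance to row i
        dist[i] = np.inf                # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (diffs[miss] - diffs[hit]) / m
    return w


def filtered_relief_scores(X, y, k=3.0):
    """Hypothetical combination: drop flagged rows, then run Relief."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = ~chebyshev_outlier_mask(X, k=k)
    return relief_scores(X[keep], y[keep])
```

Features would then be ranked by descending score, for example with `np.argsort(-scores)`, and the top-ranked ones passed to a classifier such as naive Bayes or KNN. The threshold `k` is a tuning knob in this sketch, not a value taken from the paper.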
