Journal of Data Analysis and Information Processing

A Feature Subset Selection Technique for High Dimensional Data Using Symmetric Uncertainty

Abstract

With the abundance of exceptionally high-dimensional data, feature selection has become an essential element of the data mining process. In this paper, we investigate the problem of efficient feature selection for classification on high-dimensional datasets. We present a novel filter-based approach to feature selection that ranks the features by a score, and we then measure the performance of four different data mining classification algorithms on the resulting data. In the proposed approach, we partition the sorted feature list and search for important features in both the forward and the reverse direction, starting from the first and the last feature of the sorted list simultaneously. The proposed approach is highly scalable and effective because it parallelizes over both attributes and tuples simultaneously, allowing us to evaluate many potential features for high-dimensional datasets. Experiments on real and synthetic high-dimensional datasets show that the proposed framework improves the precision of the selected features. We have also compared its classification accuracy against that of various other feature selection methods.
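The abstract does not say how the ranking score is computed, but the title suggests symmetric uncertainty, which for discrete variables is SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The sketch below assumes that score and discrete (or already discretized) features. The function names `entropy`, `symmetric_uncertainty`, `rank_features`, and `both_ends` are hypothetical. `both_ends` only illustrates visiting a sorted list from the first and last positions at once; it is not the paper's partitioned search and does not reproduce its parallelization.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy   # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels):
    """Sort feature names by SU with the class label, highest first.

    Assumption: `columns` maps a feature name to its list of discrete values.
    """
    scores = {name: symmetric_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


def both_ends(ranked):
    """Yield items alternately from the front and the back of a ranked list."""
    lo, hi = 0, len(ranked) - 1
    while lo <= hi:
        yield ranked[lo]
        if lo != hi:
            yield ranked[hi]
        lo, hi = lo + 1, hi - 1
```

For example, calling `rank_features({"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}, [0, 0, 1, 1])` places `"a"` first, because it determines the label exactly while `"b"` is independent of it.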
