Knowledge-Based Systems

Multi-granularity feature selection on cost-sensitive data with measurement errors and variable costs



Abstract

In real-world applications of data mining, machine learning, and granular computing, measurement errors, test costs, and misclassification costs often arise. Furthermore, the test cost of a feature usually varies with the error range, and the misclassification cost varies with the object considered. Recently, several rough-set-based approaches have been introduced to study the error-based cost-sensitive feature selection problem. However, most of them consider only the single-granularity case and are therefore not applicable when the granularity diversity among different features must be taken into account. Motivated by this problem, we propose a multi-granularity feature selection approach that considers measurement errors and variable costs in terms of feature-value granularities. For a given feature, the feature-value granularity is evaluated by the error confidence level of the feature values. On this basis, we build a theoretical framework called the confidence-level-vector-based neighborhood rough set, and present a heuristic feature-granularity selection algorithm together with a competition strategy that selects both features and their respective feature-value granularities effectively and efficiently. Experimental results show that the proposed approach achieves a satisfactory trade-off among feature dimension reduction, feature-value granularity selection, and total cost minimization. This work provides new insight into the cost-sensitive feature selection problem from the multi-granularity perspective.
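To make the trade-off described in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a per-feature tolerance as a simple stand-in for a feature-value granularity (the paper derives granularities from error confidence levels, which this sketch does not model). The names `neighborhood`, `inconsistent_objects`, `total_cost`, the cost function `1 / tol`, and the flat misclassification penalty are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Sequence


def neighborhood(data: Sequence[Sequence[float]], i: int,
                 tolerances: Dict[int, float]) -> List[int]:
    """Indices of objects within the per-feature tolerance of object i.

    `tolerances` maps each selected feature index to its tolerance
    (an assumed stand-in for a feature-value granularity).
    """
    return [
        j for j, row in enumerate(data)
        if all(abs(row[f] - data[i][f]) <= tol for f, tol in tolerances.items())
    ]


def inconsistent_objects(data: Sequence[Sequence[float]], labels: Sequence[str],
                         tolerances: Dict[int, float]) -> List[int]:
    """Objects whose neighborhood contains more than one class label."""
    return [
        i for i in range(len(data))
        if len({labels[j] for j in neighborhood(data, i, tolerances)}) > 1
    ]


def total_cost(data: Sequence[Sequence[float]], labels: Sequence[str],
               tolerances: Dict[int, float],
               test_cost: Callable[[int, float], float],
               misclassification_cost: float) -> float:
    """Assumed cost model: summed test costs of the selected features plus
    a flat penalty weighted by the fraction of inconsistent objects."""
    tests = sum(test_cost(f, tol) for f, tol in tolerances.items())
    boundary = inconsistent_objects(data, labels, tolerances)
    return tests + misclassification_cost * len(boundary) / len(data)


# Toy data: two numeric features, two classes (values invented for illustration).
data = [[1.0, 10.0], [1.2, 10.5], [3.0, 10.2], [3.1, 20.0]]
labels = ["a", "a", "b", "b"]


def cheaper_when_coarser(feature: int, tol: float) -> float:
    # Hypothetical test cost: a finer (smaller) tolerance is more expensive.
    return 1.0 / tol


fine = {0: 0.5}    # feature 0 measured at a fine granularity
coarse = {0: 2.0}  # feature 0 measured at a coarse granularity

fine_cost = total_cost(data, labels, fine, cheaper_when_coarser, 10.0)
coarse_cost = total_cost(data, labels, coarse, cheaper_when_coarser, 10.0)
print(fine_cost, coarse_cost)
```

In this toy setting the coarse granularity is cheaper to test but merges objects of both classes into the same neighborhoods, so its total cost ends up higher than that of the fine granularity. A selection procedure of the kind the abstract describes would search over both the feature subset and the per-feature granularity to minimize such a total.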
