Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Optimizing Classifiers for Hypothetical Scenarios

Abstract

The deployment of classification models is an integral component of many modern data mining and machine learning applications. A typical classification model is built with the tacit assumption that the deployment scenario by which it is evaluated is fixed and fully characterized. Yet, in the practical deployment of classification methods, important aspects of the application environment, such as the misclassification costs, may be uncertain during model building. Moreover, a single classification model may be applied in several different deployment scenarios. In this work, we propose a method to optimize a model for uncertain deployment scenarios. We begin by deriving a relationship between two evaluation measures, the H measure and cost curves, that may be used to address uncertainty in classifier performance. We show that when uncertainty in classifier performance is modeled as a probabilistic belief that is a function of this underlying relationship, a natural definition of risk emerges for both classifiers and instances. We then leverage this notion of risk to develop a boosting-based algorithm, which we call RiskBoost, that directly mitigates classifier risk, and we demonstrate that it outperforms AdaBoost on a diverse selection of datasets.
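To make the idea of evaluating a classifier under uncertain misclassification costs more concrete, the following is a minimal sketch, not the paper's derivation and not RiskBoost. It assumes a normalized cost proportion `c` in (0, 1) for false positives (with `1 - c` for false negatives), computes the lowest achievable loss over all score thresholds for each `c` (the kind of lower envelope a cost curve plots), and then averages that loss under a Beta(2, 2) weighting over `c`, which is the default belief commonly associated with the H measure. The function names `min_loss_curve` and `expected_min_loss`, the loss convention, and the grid size are all assumptions made for illustration.

```python
import numpy as np


def min_loss_curve(scores, labels, costs):
    """For each cost proportion c, return the lowest achievable loss
    c * pi0 * FPR + (1 - c) * pi1 * FNR over all score thresholds.
    (Hypothetical helper; the loss convention is an assumption.)"""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    costs = np.asarray(costs, dtype=float)

    pi1 = labels.mean()          # empirical positive-class prior
    pi0 = 1.0 - pi1              # empirical negative-class prior
    pos = scores[labels == 1]
    neg = scores[labels == 0]

    # Predict positive when score >= threshold; +inf means "always negative".
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    fpr = np.array([(neg >= t).mean() for t in thresholds])
    fnr = np.array([(pos < t).mean() for t in thresholds])

    # loss[i, j] is the loss at cost proportion i and threshold j.
    loss = (costs[:, None] * pi0 * fpr[None, :]
            + (1.0 - costs[:, None]) * pi1 * fnr[None, :])
    return loss.min(axis=1)


def expected_min_loss(scores, labels, a=2.0, b=2.0, n_grid=999):
    """Average the minimum loss under a Beta(a, b) weighting over c,
    approximated on a uniform grid (assumed belief over costs)."""
    c = np.linspace(0.0, 1.0, n_grid + 2)[1:-1]   # interior grid points
    w = c ** (a - 1.0) * (1.0 - c) ** (b - 1.0)   # unnormalized Beta density
    w /= w.sum()
    return float(np.sum(w * min_loss_curve(scores, labels, c)))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    good = [0.1, 0.2, 0.8, 0.9]   # ranks every positive above every negative
    bad = [0.9, 0.8, 0.2, 0.1]    # ranks every positive below every negative
    print(expected_min_loss(good, labels), expected_min_loss(bad, labels))
```

A lower value means the classifier loses less on average across the assumed range of deployment costs; changing `a` and `b` shifts the belief toward scenarios where one kind of error is more expensive.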
