Journal: Data Mining and Knowledge Discovery

Setting decision thresholds when operating conditions are uncertain


Abstract

The quality of the decisions made by a machine learning model depends on the data and the operating conditions during deployment. Often, operating conditions such as class distribution and misclassification costs have changed since the model was trained and evaluated. When deploying a binary classifier that outputs scores, once we know the new class distribution and the new cost ratio between false positives and false negatives, there are several methods in the literature to help us choose an appropriate threshold for the classifier's scores. However, on many occasions, the information that we have about this operating condition is uncertain. Previous work has considered ranges or distributions of operating conditions during deployment, with expected costs being calculated for ranges or intervals, but the decision for each point is still made as if the operating condition were certain. The implications of this assumption have received limited attention: a threshold choice that is best suited without uncertainty may be suboptimal under uncertainty. In this paper we analyse the effect of operating condition uncertainty on the expected loss for different threshold choice methods, both theoretically and experimentally. We model uncertainty as a second conditional distribution over the actual operating condition and study it theoretically in such a way that minimum and maximum uncertainty are both seen as special cases of this general formulation. This is complemented by a thorough experimental analysis investigating how different learning algorithms behave for a range of datasets according to the threshold choice method and the uncertainty level.
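
The setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes scores are calibrated estimates of P(positive | x), uses the standard decision-theoretic threshold c_fp / (c_fp + c_fn), and models uncertainty by drawing the observed cost proportion from a Beta distribution centred on the true one. The function names, the Beta noise model, and the `concentration` parameter are all illustrative assumptions.

```python
import numpy as np


def expected_cost(scores, labels, threshold, c_fp, c_fn):
    """Average misclassification cost when predicting positive for score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred = scores >= threshold
    fp = np.sum(pred & (labels == 0))   # negatives predicted positive
    fn = np.sum(~pred & (labels == 1))  # positives predicted negative
    return float(c_fp * fp + c_fn * fn) / len(labels)


def bayes_threshold(c_fp, c_fn):
    """Cost-minimizing threshold, assuming scores are calibrated P(positive | x)."""
    return c_fp / (c_fp + c_fn)


def loss_with_noisy_condition(scores, labels, c_true, concentration,
                              n_draws=2000, seed=0):
    """Mean cost when the threshold is set from a noisy estimate of c_true.

    Assumed noise model (hypothetical, for illustration only): the observed
    cost proportion is Beta-distributed with mean c_true; a larger
    `concentration` means less uncertainty. Costs are normalized so that
    c_fp = c_true and c_fn = 1 - c_true.
    """
    rng = np.random.default_rng(seed)
    c_obs = rng.beta(c_true * concentration,
                     (1.0 - c_true) * concentration, size=n_draws)
    # The threshold follows the observed condition; the cost follows the true one.
    costs = [expected_cost(scores, labels, t, c_true, 1.0 - c_true) for t in c_obs]
    return float(np.mean(costs))


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.6, 0.9]
    labels = [0, 1, 0, 1]
    t = bayes_threshold(0.25, 0.75)
    print("certain:", expected_cost(scores, labels, t, 0.25, 0.75))
    print("uncertain:", loss_with_noisy_condition(scores, labels, 0.25, 5.0))
```

With a very large `concentration` the noisy loss approaches the loss under certainty; with a small one, thresholds chosen from the noisy estimate stray from the optimum and the mean cost can rise, which is the effect the abstract says the paper analyses.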
