Journal: Science Advances

Neyman-Pearson classification algorithms and NP receiver operating characteristics


Abstract

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies.
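The gap between limiting the empirical type I error and controlling the true type I error can be illustrated with a small threshold-selection sketch. It is not the nproc implementation and does not use that package's API. It assumes an order-statistic rule: the scores of n held-out class 0 observations are sorted, and the threshold is the smallest order statistic whose binomial tail bound on the probability of exceeding α is at most a tolerance δ. The tolerance δ, the function names, and the convention "predict class 1 when the score exceeds the threshold" are assumptions made for illustration; none of them appear in the abstract.

```python
from math import comb


def violation_bound(k, n, alpha):
    """Binomial tail sum_{j=k}^{n} C(n, j) (1 - alpha)^j alpha^(n - j).

    Assumed here to bound the probability that the true type I error
    exceeds alpha when the k-th smallest of n class 0 scores is the threshold.
    """
    return sum(
        comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Return (k, threshold) for the smallest k whose bound is <= delta.

    delta is a hypothetical tolerance parameter, not named in the abstract.
    """
    scores = sorted(class0_scores)
    n = len(scores)
    for k in range(1, n + 1):
        if violation_bound(k, n, alpha) <= delta:
            return k, scores[k - 1]
    raise ValueError("too few class 0 scores for this alpha and delta")


def naive_threshold(class0_scores, alpha=0.05):
    """Return (k, threshold) for the smallest k whose empirical type I
    error, (n - k) / n, is at most alpha. This is the common practice
    that the abstract argues against."""
    scores = sorted(class0_scores)
    n = len(scores)
    for k in range(1, n + 1):
        if n - k <= alpha * n:
            return k, scores[k - 1]


if __name__ == "__main__":
    # Hypothetical classifier scores for 100 held-out class 0 observations.
    held_out = [i / 100 for i in range(100)]
    print("naive rule:", naive_threshold(held_out))
    print("order-statistic rule:", np_threshold(held_out))
```

Under these assumptions the order-statistic rule selects a higher threshold than the naive rule, trading some type II error for a high-probability guarantee on the type I error. It also needs a minimum number of class 0 scores: with α = δ = 0.05 the bound cannot be met with fewer than 59, in which case the sketch raises an error.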
