Workshop on Evaluation and Comparison of NLP Systems

Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models



Abstract

In pursuit of the perfect supervised NLP classifier, razor-thin margins and low-resource test sets can make modeling decisions difficult. Popular metrics such as Accuracy, Precision, and Recall are often insufficient because they fail to give a complete picture of the model's behavior. We present a probabilistic extension of Precision, Recall, and F1 score, which we refer to as confidence-Precision (cPrecision), confidence-Recall (cRecall), and confidence-F1 (cF1), respectively. The proposed metrics address some of the challenges faced when evaluating large-scale NLP systems, specifically when the model's confidence score assignments have an impact on the system's behavior. We describe four key benefits of our proposed metrics compared to their threshold-based counterparts. Two of these benefits, which we refer to as robustness to missing values and sensitivity to model confidence score assignments, are self-evident from the metrics' definitions; the remaining benefits, generalization and functional consistency, are demonstrated empirically.
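The abstract's core idea can be sketched in code. The sketch below is one plausible reading of a confidence-weighted Precision/Recall/F1 (not the paper's exact formulation, which the abstract does not spell out): each counted example in the standard threshold-based metrics is replaced by the model's confidence in the label it predicted. The function name `confidence_prf1` and the binary-classification framing are illustrative assumptions.

```python
# Illustrative sketch of confidence-weighted metrics (cPrecision, cRecall, cF1).
# Assumption: each unit count (TP, FP, FN) is replaced by the model's
# confidence in the label it predicted for that example.

def confidence_prf1(probs, y_true, threshold=0.5):
    """probs: predicted P(positive) per example; y_true: gold binary labels."""
    ctp = cfp = cfn = 0.0
    for p, y in zip(probs, y_true):
        pred = 1 if p >= threshold else 0
        conf = p if pred == 1 else 1.0 - p  # confidence in the predicted label
        if pred == 1 and y == 1:
            ctp += conf   # a confident correct positive counts for more
        elif pred == 1 and y == 0:
            cfp += conf   # a confident false positive hurts more
        elif pred == 0 and y == 1:
            cfn += conf   # a confident miss hurts more
    c_precision = ctp / (ctp + cfp) if ctp + cfp else 0.0
    c_recall = ctp / (ctp + cfn) if ctp + cfn else 0.0
    c_f1 = (2 * c_precision * c_recall / (c_precision + c_recall)
            if c_precision + c_recall else 0.0)
    return c_precision, c_recall, c_f1
```

Under this reading, two models with identical thresholded predictions but different confidence assignments receive different scores, which is the "sensitivity to model confidence score assignments" the abstract names as a benefit over threshold-based F1.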
