Journal: Open Engineering

Assessing the quality of classification models: Performance measures and evaluation procedures


Abstract

This article systematically reviews techniques used for the evaluation of classification models and provides guidelines for their proper application. The techniques covered include performance measures, which assess a model's performance on a particular dataset, and evaluation procedures, which apply those measures to appropriately selected data subsets to estimate their expected values on new data. Their common purpose is to assess model generalization capabilities, which are crucial for judging the applicability and usefulness of both classification and any other data mining models. The review presented in this article is expected to be sufficiently in-depth and complete for most practical needs, while remaining clear and easy to follow with little prior knowledge. Issues that receive special attention include incorporating instance weights into performance measures, combining the same set of evaluation procedures with arbitrary performance measures, and avoiding pitfalls related to separating data subsets used for evaluation from those used for model creation. Classification is one of the central data mining tasks, and the number of data mining applications is growing rapidly, not only in business but also in engineering and research, so the review is expected to be interesting and useful for a wide audience. All presented techniques are accompanied by simple R language implementations and usage examples, which, although created mainly for illustration, can actually be used in practice.
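As a rough illustration of two ideas named in the abstract (instance-weighted performance measures, and an evaluation procedure that accepts an arbitrary performance measure), a minimal sketch might look like the following. It is written in Python rather than R and is not the article's code; the function names `weighted_error`, `majority_class_fit`, and `cross_validate` are hypothetical, and the majority-class "model" is only a stand-in for a real classifier.

```python
import random


def weighted_error(y_true, y_pred, weights=None):
    """Share of total instance weight that falls on misclassified instances.

    With weights=None every instance counts equally, which gives the
    ordinary misclassification error.
    """
    if weights is None:
        weights = [1.0] * len(y_true)
    total = sum(weights)
    wrong = sum(w for t, p, w in zip(y_true, y_pred, weights) if t != p)
    return wrong / total


def majority_class_fit(X, y):
    """Toy stand-in for model creation: always predict the most frequent label."""
    majority = max(sorted(set(y)), key=y.count)  # sorted() makes ties deterministic
    return lambda X_new: [majority] * len(X_new)


def cross_validate(X, y, fit, measure, k=5, seed=0):
    """k-fold cross-validation that works with any measure(y_true, y_pred).

    Each instance is predicted by a model that never saw it during fitting,
    so the data used for evaluation stays separate from the data used for
    model creation.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    y_pred = [None] * len(y)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(fold, model([X[i] for i in fold])):
            y_pred[i] = p
    # Pool the out-of-fold predictions and apply the chosen measure once.
    return measure(y, y_pred)
```

Because the measure is passed in as a function, the same procedure can be reused with a weighted variant, for example `cross_validate(X, y, majority_class_fit, lambda t, p: weighted_error(t, p, w))` for some assumed weight vector `w`.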
