Published in: International Conference on Discovery Science

The Highs and Lows of Performance Evaluation: Towards a Measurement Theory for Machine Learning

Abstract

Our understanding of performance evaluation measures for machine-learned classifiers has improved considerably over the last decades. However, there is a range of areas where this understanding is still lacking, leading to ill-advised practices in classifier evaluation. This is clearly problematic, since if machine learning researchers are unclear about what exactly their experiments are telling them about their machine learning algorithms, then how can end-users trust systems deploying those algorithms? I suggest that in order to make further progress we need to develop a proper measurement theory of machine learning. Measurement theory studies the concepts of measurement and scale. If you have a way to measure, say, the length of individual rods or planks, this should also allow you to then calculate the combined length of concatenated rods or planks. What relevant concatenation operations are there in data science and AI, and what does that mean for the underlying measurement scale? I discuss by example what such a measurement theory might look like and what kinds of new results it would entail. I furthermore argue that key properties such as classification ability and data set difficulty are unlikely to be directly observable, suggesting the need for latent-variable models. Ultimately, machine learning experiments need to go beyond simple correlations and aim to make causal inferences of the form 'Algorithm A outperformed algorithm B because two classes were highly imbalanced,' or counterfactually, 'if the classes were rebalanced, the observed performance difference between A and B would disappear.'
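To make the rod analogy concrete, here is a minimal sketch of one candidate concatenation operation: pooling two disjoint test sets. This is an illustration under stated assumptions, not necessarily the example the paper itself develops. The `EvalResult` type, the `concatenate` helper, and the counts are all hypothetical. Under pooling, the counts of correct predictions add up like rod lengths, whereas accuracy does not add; it combines as a size-weighted mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Outcome of one classifier on one test set (hypothetical type)."""
    correct: int  # number of correctly classified instances
    total: int    # number of instances in the test set

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def concatenate(a: EvalResult, b: EvalResult) -> EvalResult:
    """Pool two disjoint test sets: counts add, like lengths of joined rods."""
    return EvalResult(a.correct + b.correct, a.total + b.total)


# Made-up counts, chosen only for illustration.
small = EvalResult(correct=9, total=10)    # accuracy 0.9
large = EvalResult(correct=45, total=90)   # accuracy 0.5
pooled = concatenate(small, large)

# Accuracy of the pooled set equals the size-weighted mean of the parts,
# not their sum and not their unweighted mean.
weighted = (
    small.total * small.accuracy + large.total * large.accuracy
) / pooled.total
unweighted = (small.accuracy + large.accuracy) / 2

print(pooled.correct, pooled.total)  # 54 100
print(pooled.accuracy)               # 0.54
```

In this sketch the raw count behaves like a length, while accuracy behaves like an average, which is one way to read the abstract's question about what concatenation implies for the underlying measurement scale.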
