On the Upper Bounds of the Real-Valued Predictions

Abstract

Predictions are fundamental in science because they allow theories to be tested and falsified. They are ubiquitous in bioinformatics and are also useful when no first principles are available. Predictions fall into two kinds: classification (a label is associated with a given input) and regression (a real value is assigned). Different scores are used to assess the performance of regression predictors; the most widely adopted include the mean square error, the Pearson correlation (ρ), and the coefficient of determination (R²). A common assumption about the last two indices is that their theoretical upper bound is 1; in fact, their upper bounds depend on both the experimental uncertainty and the distribution of the target variable. A narrow distribution of the target variable may induce a low upper bound. Knowing the theoretical upper bounds has two practical consequences: (1) comparing predictors tested on different data sets may lead to a wrong ranking, and (2) performances higher than the theoretical upper bound indicate overtraining and improper use of the learning data sets. Here, we derive the upper bound for the coefficient of determination and show that it is lower than that of the square of the Pearson correlation. We provide analytical equations for both indices that can be used to evaluate the upper bound of the predictions when the experimental uncertainty and the target distribution are available. Our considerations are general and apply to all regression predictors.
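
The dependence of both indices on experimental uncertainty and target spread can be illustrated with a small simulation. This is a sketch under a toy model that is assumed here and is not taken from the abstract: the hidden true values are Gaussian with standard deviation `sigma_target`, each experiment adds independent Gaussian noise with standard deviation `sigma_noise`, and the predictor is taken to be as good as an independent replicate of the experiment. The function names and the closed forms in `expected_scores` belong to this toy model only and should not be read as the paper's analytical equations.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def simulate(sigma_target, sigma_noise, n=20000, seed=0):
    """Score a 'replicate-quality' predictor against noisy measurements.

    Assumed toy model: observed and predicted are two independent noisy
    readings of the same hidden true values.
    """
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, sigma_target) for _ in range(n)]
    observed = [t + rng.gauss(0.0, sigma_noise) for t in truth]
    predicted = [t + rng.gauss(0.0, sigma_noise) for t in truth]
    return pearson(observed, predicted), r_squared(observed, predicted)


def expected_scores(sigma_target, sigma_noise):
    """Large-sample values of (rho, R^2) under the toy model above."""
    total = sigma_target ** 2 + sigma_noise ** 2
    rho = sigma_target ** 2 / total
    r2 = 1.0 - 2.0 * sigma_noise ** 2 / total
    return rho, r2


if __name__ == "__main__":
    # Same noise level, wide versus narrow target distribution.
    for sigma_target in (2.0, 1.0):
        rho, r2 = simulate(sigma_target, sigma_noise=1.0)
        print(f"sigma_target={sigma_target}: rho={rho:.3f}, R2={r2:.3f}")
```

In this toy model neither score approaches 1 even though the predictor is limited only by measurement noise, and narrowing the target distribution at fixed noise lowers both. On any single sample, R² computed directly from the predictions cannot exceed ρ², because ρ² equals the R² of the best linear rescaling of those predictions.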