Source: Bioinformatics and Biology Insights (U.S. National Institutes of Health literature)

On the Upper Bounds of the Real-Valued Predictions


Abstract

Predictions are fundamental in science because they allow theories to be tested and falsified. They are ubiquitous in bioinformatics and are also useful when no first principles are available. Prediction tasks can be divided into classification (a label is associated with a given input) and regression (a real value is assigned). Different scores are used to assess the performance of regression predictors; the most widely adopted include the mean squared error, the Pearson correlation (ρ), and the coefficient of determination (R²). A common assumption about the last 2 indices is that their theoretical upper bound is 1; however, their upper bounds depend on both the experimental uncertainty and the distribution of the target variable. A narrow distribution of the target variable may induce a low upper bound. Knowledge of the theoretical upper bounds also has 2 practical implications: (1) comparing predictors tested on different data sets may lead to a wrong ranking, because the attainable maximum differs between data sets, and (2) performance higher than the theoretical upper bound indicates overtraining and improper use of the learning data sets. Here, we derive the upper bound for the coefficient of determination and show that it is lower than that of the square of the Pearson correlation. We provide analytical equations for both indices that can be used to evaluate the upper bound of the predictions when the experimental uncertainty and the target distribution are available. Our considerations are general and apply to all regression predictors.
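
As a rough illustration of the statement that experimental noise and a narrow target distribution cap both scores, the following Python sketch simulates the effect numerically. It does not reproduce the paper's analytical equations. It assumes, purely for illustration, that the best achievable predictor behaves like an independent repeat of the experiment with the same Gaussian noise; the function names (`pearson`, `r_squared`, `simulate`) and all parameter values are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def simulate(target_sd, noise_sd, n=20000, seed=0):
    """Score an idealized predictor against noisy measurements.

    Assumption (not taken from the paper): the "predictor" is a second,
    independent noisy measurement of the same true values.
    """
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, target_sd) for _ in range(n)]
    observed = [t + rng.gauss(0.0, noise_sd) for t in truth]
    predicted = [t + rng.gauss(0.0, noise_sd) for t in truth]
    return pearson(observed, predicted), r_squared(observed, predicted)


if __name__ == "__main__":
    # Same noise level, progressively narrower target distribution.
    for target_sd in (3.0, 1.0, 0.5):
        rho, r2 = simulate(target_sd, noise_sd=0.5)
        print(f"target_sd={target_sd}: rho={rho:.3f}, rho^2={rho**2:.3f}, R2={r2:.3f}")
```

Under these assumptions, narrowing the target distribution at a fixed noise level should lower both scores, and the sample R² cannot exceed the sample ρ², since ρ² equals the R² of the best linear rescaling of the predictions.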
