Journal: Bioinformatics

Mutual information is critically dependent on prior assumptions: would the correct estimate of mutual information please identify itself?

Abstract

Motivation: Mutual information (MI) is a quantity that measures the dependence between two arbitrary random variables and has been used repeatedly to solve a wide variety of bioinformatic problems. Recently, when attempting to quantify the effects of sampling variance on computed values of MI in proteins, we encountered striking differences among various novel estimates of MI. These differences revealed that estimating the 'true' value of MI is not a straightforward procedure, and that minor variations in assumptions yield remarkably different estimates.

Results: We describe four formally equivalent estimates of MI, three of which explicitly account for sampling variance, that yield unequal values of MI given exact frequencies. These MI estimates are essentially non-predictive of each other, converging only in the limit of implausibly large datasets. Lastly, we show that all four estimates are biologically reasonable estimates of MI, despite their disparity, since each is actually the Kullback-Leibler divergence between random variables conditioned on equally plausible hypotheses.

Conclusions: For sparse contingency tables of the type universally observed in protein coevolution studies, our results show that estimates of MI, and hence inferences about physical phenomena such as coevolution, are critically dependent on at least three prior assumptions. These assumptions are: (i) how observation counts relate to expected frequencies; (ii) the relationship between joint and marginal frequencies; and (iii) how non-observed categories are interpreted. In any biologically relevant data, these assumptions will affect the MI estimate as much as or more than the observed data, and they are independent of uncertainty in frequency parameters.
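The sensitivity described in assumptions (i) and (iii) can be illustrated with a minimal sketch. It does not reproduce the four estimates from the paper, which the abstract does not specify. It assumes instead a plain plug-in estimate and two common pseudocount choices (0.5 and 1.0) as stand-ins for different priors over non-observed categories. The function names `frequencies` and `mutual_information` and the 3x3 count table are hypothetical, chosen only for illustration.

```python
import math


def frequencies(counts, pseudocount=0.0):
    """Turn a table of counts into joint frequencies.

    The pseudocount is added to every cell, observed or not; it is an
    assumed prior, not a method taken from the paper.
    """
    n_cells = sum(len(row) for row in counts)
    total = sum(sum(row) for row in counts) + pseudocount * n_cells
    return [[(c + pseudocount) / total for c in row] for row in counts]


def mutual_information(joint):
    """Plug-in MI in bits, with marginals obtained by summing the joint table."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing (0 * log 0 := 0)
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi


# Hypothetical sparse table: six observations, six of nine cells unobserved.
counts = [[3, 0, 0],
          [0, 2, 0],
          [0, 0, 1]]

mi_ml = mutual_information(frequencies(counts))             # no pseudocount
mi_jeffreys = mutual_information(frequencies(counts, 0.5))  # pseudocount 0.5
mi_laplace = mutual_information(frequencies(counts, 1.0))   # pseudocount 1.0

print(mi_ml, mi_jeffreys, mi_laplace)
```

On this table the same six observations give the largest MI under the plug-in estimate and progressively smaller values as the pseudocount grows, so the choice of prior moves the estimate by more than any single observation would.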