MIST: Maximum Information Spanning Trees for dimension reduction of biological data sets

Abstract

Motivation: The study of complex biological relationships is aided by large, high-dimensional data sets whose analysis often involves dimension reduction to highlight representative or informative directions of variation. In principle, information theory provides a general framework for quantifying complex statistical relationships for dimension reduction. Unfortunately, direct estimation of high-dimensional information theoretic quantities, such as entropy and mutual information (MI), is often unreliable given the relatively small sample sizes available for biological problems. Here, we develop and evaluate a hierarchy of approximations for high-dimensional information theoretic statistics from associated low-order terms, which can be more reliably estimated from limited samples. Due to a relationship between this metric and the minimum spanning tree over a graph representation of the system, we refer to these approximations as MIST (Maximum Information Spanning Trees).

Results: The MIST approximations are examined in the context of synthetic networks with analytically computable entropies, and using experimental gene expression data as a basis for the classification of multiple cancer types. The approximations result in significantly more accurate estimates of entropy and MI, and also correlate better with biological classification error than direct estimation and another low-order approximation, minimum-redundancy–maximum-relevance (mRMR).

Availability: Software to compute the entropy approximations described here is available as .

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.
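The idea of building a high-dimensional entropy estimate from low-order terms can be illustrated with a small sketch. The abstract does not give the formula, so the code below assumes a second-order, Chow-Liu-style form: the sum of the single-variable entropies minus the total pairwise MI along a maximum-weight spanning tree of the MI graph. The function names (`entropy`, `mutual_information`, `tree_entropy_estimate`) and the plug-in frequency estimator are illustrative choices, not the authors' software.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(columns):
    """Plug-in joint entropy (bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """Plug-in MI (bits) between two discrete columns."""
    return entropy([x]) + entropy([y]) - entropy([x, y])


def tree_entropy_estimate(columns):
    """Assumed second-order estimate: marginal entropies minus the MI
    collected along a maximum-weight spanning tree (Kruskal's algorithm)."""
    d = len(columns)
    marginal = sum(entropy([c]) for c in columns)

    # All pairwise MI values, heaviest edge first.
    edges = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )

    parent = list(range(d))  # union-find forest for cycle detection

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree_mi = 0.0
    for mi, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so keep it
            parent[ri] = rj
            tree_mi += mi
    return marginal - tree_mi


# Toy data: b copies a, c is independent of both.
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]
c = [0, 1, 0, 1]
estimate = tree_entropy_estimate([a, b, c])
direct = entropy([a, b, c])
```

On this toy sample the tree-based estimate and the direct joint estimate agree, because every dependency is pairwise. The practical appeal of the low-order form is that it only requires one- and two-variable frequency tables, which need far fewer samples to fill than the full joint table.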
