MIST: Maximum Information Spanning Trees for dimension reduction of biological data sets

Abstract

Motivation: The study of complex biological relationships is aided by large and high-dimensional data sets whose analysis often involves dimension reduction to highlight representative or informative directions of variation. In principle, information theory provides a general framework for quantifying complex statistical relationships for dimension reduction. Unfortunately, direct estimation of high-dimensional information theoretic quantities, such as entropy and mutual information (MI), is often unreliable given the relatively small sample sizes available for biological problems. Here, we develop and evaluate a hierarchy of approximations for high-dimensional information theoretic statistics from associated low-order terms, which can be more reliably estimated from limited samples. Due to a relationship between this metric and the minimum spanning tree over a graph representation of the system, we refer to these approximations as MIST (Maximum Information Spanning Trees).

Results: The MIST approximations are examined in the context of synthetic networks with analytically computable entropies and using experimental gene expression data as a basis for the classification of multiple cancer types. The approximations result in significantly more accurate estimates of entropy and MI, and also correlate better with biological classification error than direct estimation and another low-order approximation, minimum-redundancy–maximum-relevance (mRMR).

Availability: Software to compute the entropy approximations described here is available as .

Supplementary information: are available at Bioinformatics online.
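The idea of building a high-dimensional entropy estimate from low-order terms can be illustrated with a minimal sketch. It assumes discrete data and plug-in (frequency-count) estimates, and it shows only one well-known second-order, tree-based form (the Chow-Liu tree): the sum of the marginal entropies minus the mutual information along a maximum spanning tree. The function names are hypothetical, and this is not the authors' software nor necessarily their exact estimator.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(samples):
    """Plug-in entropy (bits) of a sequence of hashable symbols."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Plug-in MI in bits: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def tree_entropy_approximation(columns):
    """Second-order estimate of the joint entropy (hypothetical helper).

    Returns the sum of marginal entropies minus the total pairwise MI on a
    maximum spanning tree over the variables, built with Prim's algorithm.
    """
    n_vars = len(columns)
    marginals = sum(entropy(col) for col in columns)
    if n_vars < 2:
        return marginals
    mi = {(i, j): mutual_information(columns[i], columns[j])
          for i, j in combinations(range(n_vars), 2)}
    in_tree = {0}
    tree_mi = 0.0
    while len(in_tree) < n_vars:
        # Pick the highest-MI edge joining the tree to a new variable.
        weight, new_node = max(
            (mi[(min(i, j), max(i, j))], j)
            for i in in_tree
            for j in range(n_vars)
            if j not in in_tree
        )
        in_tree.add(new_node)
        tree_mi += weight
    return marginals - tree_mi


# Toy data: y duplicates x, z is independent of both.
x = [0, 0, 1, 1]
y = [0, 0, 1, 1]
z = [0, 1, 0, 1]
estimate = tree_entropy_approximation([x, y, z])
```

Only one- and two-variable distributions are estimated here, which is the reason such low-order terms tend to be more reliable than a direct joint estimate when samples are scarce.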

Bibliographic details

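Journal: Bioinformatics
Authors: Bracken M. King; Bruce Tidor
Year (volume), issue: 2009 (25), 9
Pages: 1165–1172
Total pages: 8
Original format: PDF
Chinese Library Classification: Applied microbiology; Biochemical genetics; Biochemical pharmacology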

Similar literature

1. MIST: Maximum Information Spanning Trees for dimension reduction of biological data sets [J]. King BM, Tidor B. Bioinformatics. 2009, No. 9
2. MIST: Maximum Information Spanning Trees for dimension reduction of biological data sets [J]. Bracken M. King and Bruce Tidor. Bioinformatics. 2009, No. 9
3. Maximum Spanning Tree Based Redundancy Elimination for Feature Selection of High Dimensional Data [J]. Singh Bharat, Vyas Om Prakash. The International Arab Journal of Information Technology. 2018, No. 5
4. Feature selection using Markov clustering and maximum spanning tree in high dimensional data [C]. Neha Bisht, Annappa Basava. International Conference on Contemporary Computing. 2016
5. A minimum spanning tree based clustering algorithm for high throughput biological data [D]. Pirim, Harun. 2011
6. Visualization of very large high-dimensional data sets as minimum spanning trees [O]. Daniel Probst, Jean-Louis Reymond. 2020
7. MIST: Maximum Information Spanning Trees for dimension reduction of biological data sets [O]. King, Bracken M., Tidor, Bruce. 2009
8. Maximum-Path Leaves Relative to Vertices and the Vertex One Center of a Spanning Tree: An Enumeration and Analysis of Configurations [R]. Dowell, L. J. 1998
