Journal: BMC Bioinformatics

A framework for significance analysis of gene expression data using dimension reduction methods

Abstract

Background: The most popular methods for significance analysis of microarray data are well suited to finding genes that are differentially expressed across predefined categories. However, these methods make it more difficult to identify features that correlate with continuous dependent variables, and the long lists of significant genes they return are not easily probed for co-regulation and dependencies. Dimension reduction methods are widely used in the microarray literature for classification or for obtaining low-dimensional representations of data sets. These methods have an additional interpretative strength that is often not fully exploited when expression data are analysed. In addition, significance analysis may be performed directly on the model parameters to find genes that are important for any number of categorical or continuous responses. We introduce a general scheme for the analysis of expression data that combines significance testing with the interpretative advantages of dimension reduction methods. This approach is applicable both to exploratory analysis and to classification and regression problems.

Results: Three public data sets are analysed. One is used for classification, one contains spiked-in transcripts of known concentrations, and one represents a regression problem with several measured responses. Model-based significance analysis is performed using a modified version of Hotelling's T² test, and a false discovery rate (FDR) significance level is estimated by resampling. Our results show that underlying biological phenomena and unknown relationships in the data can be detected by a simple visual interpretation of the model parameters. We also find that using measured phenotypic responses as input may model the expression data more accurately than using the design parameters. For the classification data, our method finds largely the same genes as the standard methods, plus some additional genes that are shown to be biologically relevant. The list of spiked-in genes is also reproduced with high accuracy.

Conclusion: Dimension reduction methods are versatile tools that may also be used for significance testing. Visual inspection of model components is useful for interpretation, and the methodology is the same whether the goal is classification, prediction of responses, feature selection or exploration of a data set. The presented framework is conceptually and algorithmically simple, and a Matlab toolbox (Mathworks Inc, USA) is provided as supplementary material.
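The abstract names the ingredients (a dimension reduction model, a modified Hotelling's T² test on the model parameters, and an FDR level estimated by resampling) but not how they are combined. The following is a minimal sketch of one way they could fit together; it is not the authors' Matlab toolbox, and every specific choice is an assumption. It assumes PCA via SVD on column-centred data, leave-one-sample-out (jackknife) sub-models to estimate the covariance of each gene's loading vector, shrinkage of that covariance toward the pooled covariance as a stand-in for the unspecified "modification" (`shrink` is a hypothetical parameter), and a null formed by permuting each gene's values independently across samples. Sub-model components are only sign-aligned to the full model; swapped or rotated components are not corrected. All function names are illustrative.

```python
import numpy as np


def pca_loadings(X, n_comp):
    """Return the first n_comp PCA loadings (genes x n_comp) of column-centred X (samples x genes)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the right singular vectors, i.e. the gene loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_comp].T


def jackknife_t2(X, n_comp=2, shrink=0.5):
    """Hotelling-style T^2 per gene: the full-model loading vector p_j weighted by the
    inverse of its leave-one-sample-out (jackknife) covariance.
    The shrinkage toward the pooled covariance is an assumed 'modification'."""
    n = X.shape[0]
    P = pca_loadings(X, n_comp)  # genes x A
    subs = []
    for i in range(n):
        Pi = pca_loadings(np.delete(X, i, axis=0), n_comp)
        # PCA signs are arbitrary: flip each sub-model component to agree with the full model.
        signs = np.sign(np.sum(Pi * P, axis=0))
        signs[signs == 0] = 1.0
        subs.append(Pi * signs)
    Psub = np.stack(subs)  # n x genes x A
    D = Psub - Psub.mean(axis=0)
    # Jackknife covariance of each gene's A-dimensional loading vector.
    cov = (n - 1) / n * np.einsum("ipa,ipb->pab", D, D)  # genes x A x A
    cov = (1 - shrink) * cov + shrink * cov.mean(axis=0)  # shrink toward the pooled covariance
    sol = np.linalg.solve(cov, P[..., None])[..., 0]  # cov_j^{-1} p_j for every gene j
    return np.einsum("pa,pa->p", P, sol)


def permutation_fdr(X, n_comp=2, shrink=0.5, n_perm=20, seed=0):
    """Per-gene FDR estimate: compare observed T^2 with T^2 from data in which
    each gene is permuted independently across samples (an assumed null)."""
    rng = np.random.default_rng(seed)
    t2 = jackknife_t2(X, n_comp, shrink)
    null = np.concatenate(
        [jackknife_t2(rng.permuted(X, axis=0), n_comp, shrink) for _ in range(n_perm)]
    )
    # FDR(t) ~ (expected null exceedances per data set) / (observed exceedances).
    exp_false = np.array([(null >= t).sum() / n_perm for t in t2])
    n_called = np.array([(t2 >= t).sum() for t in t2])
    return t2, np.minimum(1.0, exp_false / n_called)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 20, 100
    X = rng.normal(size=(n, p))
    # Hypothetical toy data: genes 0-9 share one latent factor, genes 10-19 another.
    X[:, :10] += 3 * rng.normal(size=(n, 1))
    X[:, 10:20] += 2 * rng.normal(size=(n, 1))
    t2, fdr = permutation_fdr(X, n_comp=2)
    print("genes with estimated FDR <= 0.05:", np.flatnonzero(fdr <= 0.05))
```

The cost is roughly (n_perm + 1)(n + 1) SVDs, so `n_perm` trades runtime against a smoother FDR estimate. For a regression or classification problem, the same skeleton could in principle test loadings from a supervised method such as partial least squares instead of PCA; that swap is likewise an assumption and not taken from the source.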