Background: The most popular methods for significance analysis on microarray data are well suited to finding genes differentially expressed across predefined categories. However, identifying features that correlate with continuous dependent variables is more difficult with these methods, and the long lists of significant genes they return are not easily probed for co-regulation and dependencies. Dimension reduction methods are widely used in the microarray literature for classification or for obtaining low-dimensional representations of data sets. These methods have an additional interpretative strength that is often not fully exploited when expression data are analysed. In addition, significance analysis may be performed directly on the model parameters to find genes that are important for any number of categorical or continuous responses. We introduce a general scheme for analysis of expression data that combines significance testing with the interpretative advantages of the dimension reduction methods. This approach is applicable both to exploratory analysis and to classification and regression problems.

Results: Three public data sets are analysed. One is used for classification, one contains spiked-in transcripts of known concentrations, and one represents a regression problem with several measured responses. Model-based significance analysis is performed using a modified version of Hotelling's T² test, and a false discovery rate significance level is estimated by resampling. Our results show that underlying biological phenomena and unknown relationships in the data can be detected by simple visual interpretation of the model parameters. We also find that measured phenotypic responses may model the expression data more accurately than the design parameters do when used as input. For the classification data, our method finds largely the same genes as the standard methods, plus some additional genes that are shown to be biologically relevant. The list of spiked-in genes is also reproduced with high accuracy.

Conclusion: Dimension reduction methods are versatile tools that can also be used for significance testing. Visual inspection of model components is useful for interpretation, and the methodology is the same whether the goal is classification, prediction of responses, feature selection or exploration of a data set. The presented framework is conceptually and algorithmically simple, and a Matlab toolbox (Mathworks Inc, USA) is provided.
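The Results describe significance testing on model parameters with a modified Hotelling's T² test and a false discovery rate estimated by resampling. The abstract does not specify the modification, so the following is only a generic sketch under explicit assumptions, not the authors' method: PCA (via SVD) stands in for the dimension reduction model; leave-one-sample-out (jackknife) refits estimate the variability of each gene's loading vector; each gene's covariance is shrunk toward the pooled covariance through a hypothetical `alpha` parameter; and the permutation null shuffles each gene independently across samples. The function names `pca_loadings`, `jackknife_t2` and `permutation_fdr` are illustrative and are not part of the authors' Matlab toolbox.

```python
import numpy as np


def pca_loadings(X, k):
    """First k PCA loading vectors (genes x k) of X (samples x genes), after column centring."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the right singular vectors, i.e. the loading vectors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def jackknife_t2(X, k=2, alpha=0.5):
    """Hotelling-type T^2 per gene from leave-one-sample-out PCA loadings.

    alpha shrinks each gene's k x k covariance toward the pooled covariance;
    this shrinkage is an illustrative assumption, not the published modification.
    """
    n, p = X.shape
    P = pca_loadings(X, k)  # full-model loadings, p x k
    dev = np.empty((n, p, k))
    for i in range(n):
        Pi = pca_loadings(np.delete(X, i, axis=0), k)
        # SVD components have arbitrary sign: align each one with the full model.
        signs = np.sign(np.sum(Pi * P, axis=0))
        signs[signs == 0] = 1.0
        dev[i] = Pi * signs - P  # deviation from the full-model loadings
    # Jackknife covariance of each gene's k-dimensional loading vector (p x k x k).
    S = (n - 1) / n * np.einsum("igk,igl->gkl", dev, dev)
    S = (1 - alpha) * S + alpha * S.mean(axis=0)  # shrink toward pooled covariance
    # T^2_g = p_g^T S_g^{-1} p_g tests H0: gene g has zero loading on all k components.
    Sinv_P = np.linalg.solve(S, P[..., None])[..., 0]
    return np.einsum("gk,gk->g", P, Sinv_P)


def permutation_fdr(X, k=2, alpha=0.5, n_perm=20, seed=0):
    """Observed T^2 per gene and a permutation-based FDR estimate at each gene's T^2."""
    rng = np.random.default_rng(seed)
    t2 = jackknife_t2(X, k, alpha)
    null = []
    for _ in range(n_perm):
        # Shuffle each gene (column) independently across samples to break co-expression.
        Xp = rng.permuted(X, axis=0)
        null.append(jackknife_t2(Xp, k, alpha))
    null = np.concatenate(null)
    # FDR(t) ~ (null exceedances per permuted data set) / (observed exceedances).
    n_null = np.array([(null >= t).sum() for t in t2]) / n_perm
    n_obs = np.array([(t2 >= t).sum() for t in t2])
    fdr = np.minimum(1.0, n_null / n_obs)
    return t2, fdr
```

A typical call is `t2, fdr = permutation_fdr(X, k=2)`, after which genes with `fdr` below a chosen level are reported. Testing the whole loading vector jointly, rather than one component at a time, is what makes a T²-type statistic natural here: a gene is flagged if it matters on any of the k retained components.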