Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer

Raffaele Giancarlo; Davide Scaturro; Filippo Utro

Abstract

Background: Inferring cluster structure in microarray datasets is a fundamental task for the so-called -omic sciences. It is also a fundamental question in Statistics, Data Analysis and Classification, in particular with regard to the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have been recently proposed, some of them specifically for microarray data.

Results: We consider five such measures: Clest, Consensus (Consensus Clustering), FOM (Figure of Merit), Gap (Gap Statistics) and ME (Model Explorer), in addition to the classic WCSS (Within Cluster Sum-of-Squares) and KL (Krzanowski and Lai index). We perform extensive experiments on six benchmark microarray datasets, using both Hierarchical and K-means clustering algorithms, and we provide an analysis assessing both the intrinsic ability of a measure to predict the correct number of clusters in a dataset and its merit relative to the other measures. We pay particular attention to both precision and speed. Moreover, we provide various fast approximation algorithms for the computation of Gap, FOM and WCSS. The main result is a hierarchy of those measures in terms of precision and speed, highlighting some of their merits and limitations not reported before in the literature.

Conclusion: Based on our analysis, we draw several conclusions for the use of those internal measures on microarray data. We report the main ones. Consensus is by far the best performer in terms of predictive power and is remarkably algorithm-independent. Unfortunately, on large datasets, it may be of no use because of its non-trivial computer time demand (weeks on a state-of-the-art PC). FOM is the second best performer although, quite surprisingly, it may not be competitive in this scenario: it has essentially the same predictive power as WCSS but is from 6 to 100 times slower, depending on the dataset. The approximation algorithms for the computation of FOM, Gap and WCSS perform very well, i.e., they are faster while still granting a very close approximation of FOM and WCSS. The approximation algorithm for the computation of Gap deserves to be singled out: it has a predictive power far better than Gap and is competitive with the other measures, yet it is at least two orders of magnitude faster than Gap. Another important novel conclusion that can be drawn from our analysis is that all the measures we have considered show severe limitations on large datasets, either due to computational demand (Consensus, as already mentioned, Clest and Gap) or to lack of precision (all of the other measures, including their approximations). The software and datasets are available under the GNU GPL on the supplementary material web page.
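To make the idea of predicting the number of clusters with an internal measure concrete, the following is a minimal sketch of the two classic measures named above, WCSS and the KL index, computed with a small K-means routine. It is not the authors' software. The toy dataset, the function names (`kmeans_wcss`, `kl_choose_k`) and the deterministic farthest-point initialization are assumptions made for illustration only. The KL index is taken in its usual form, DIFF(k) = (k-1)^(2/p) W(k-1) - k^(2/p) W(k) and KL(k) = |DIFF(k) / DIFF(k+1)|, with p the number of dimensions and W(k) the WCSS for k clusters.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def maximin_init(points, k):
    """Deterministic farthest-point initialization (an assumption of this
    sketch, chosen so the example is reproducible)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's K-means and return the within-cluster sum of squares."""
    centers = maximin_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [centroid(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sum(sq_dist(p, centers[j])
               for j, c in enumerate(clusters) for p in c)


def kl_choose_k(wcss, dim):
    """Return (best k, scores) under the Krzanowski and Lai index.

    `wcss` maps k -> W(k) for k = 1..k_max; scores exist for k = 2..k_max-1.
    """
    def diff(k):
        return (k - 1) ** (2 / dim) * wcss[k - 1] - k ** (2 / dim) * wcss[k]

    scores = {}
    for k in range(2, max(wcss)):
        denom = diff(k + 1)
        scores[k] = abs(diff(k) / denom) if denom != 0 else float("inf")
    return max(scores, key=scores.get), scores


# Hypothetical toy data: three well-separated groups of four 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

wcss = {k: kmeans_wcss(points, k) for k in range(1, 6)}
best_k, scores = kl_choose_k(wcss, dim=2)
print("WCSS per k:", wcss)
print("k chosen by KL:", best_k)
```

On data this clean, WCSS drops sharply up to the true number of groups and flattens afterwards, which is the behaviour the KL index turns into a single score. Real microarray data is far noisier, which is why the paper compares these classic measures against Clest, Consensus, FOM, Gap and ME.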


Bibliographic details

Source: BMC Bioinformatics, 2008, Issue 1
Authors: Raffaele Giancarlo; Davide Scaturro; Filippo Utro
Original format: PDF
Classification (Chinese Library Classification): Biological sciences

