Comparisons and validation of statistical clustering techniques for microarray gene expression data

Susmita Datta; Somnath Datta

Abstract

Motivation: With the advent of microarray chip technology, large data sets are emerging that contain the simultaneous expression levels of thousands of genes at various time points during a biological process. Biologists are attempting to group genes based on the temporal pattern of their expression levels. While hierarchical clustering (UPGMA) with correlation 'distance' has been the most common choice in microarray studies, many more clustering algorithms are available in the pattern recognition and statistics literature. At the moment there do not seem to be any clear-cut guidelines regarding the choice of a clustering algorithm for grouping genes based on their expression profiles.

Results: In this paper, we consider six clustering algorithms (of various flavors!) and evaluate their performance on a well-known, publicly available microarray data set on sporulation of budding yeast and on two simulated data sets. Among other things, we formulate three reasonable validation strategies that can be used with any clustering algorithm when temporal observations or replications are present. We evaluate each of the six clustering methods with these validation measures. While the 'best' method depends on the exact validation strategy and the number of clusters to be used, overall Diana appears to be a solid performer. Interestingly, the performances of correlation-based hierarchical clustering and model-based clustering (another method that has been advocated by a number of researchers) appear to be at opposite extremes, depending on which validation measure one employs. Next, it is shown that the group means produced by Diana are the closest, and those produced by UPGMA the farthest, from a model profile based on a set of hand-picked genes.

Availability: S+ codes for the partial least squares based clustering are available from the authors upon request. All other clustering methods considered have S+ implementation in the library MASS. S+ codes for calculating the validation measures are available from the authors upon request. The sporulation data set is publicly available at http://cmgm.stanford.edu/pbrown/sporulation.
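For readers unfamiliar with the baseline method named in the abstract, the following is a minimal sketch of UPGMA (average-linkage hierarchical clustering) with correlation 'distance', taken here as one minus the Pearson correlation between two expression profiles. It is not the authors' S+ code. The function names, the toy profiles, and the choice of two clusters are all assumptions made for illustration, and the sketch assumes no profile is constant over time (which would make the correlation undefined).

```python
from itertools import combinations
from math import sqrt


def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither profile is constant, so the denominator is non-zero.
    return 1.0 - cov / sqrt(var_x * var_y)


def upgma(profiles, n_clusters):
    """Average-linkage agglomerative clustering.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    # Pairwise distances between individual profiles, keyed by (i, j), i < j.
    dist = {
        (i, j): correlation_distance(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
    }
    clusters = [[i] for i in range(len(profiles))]

    def average_linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical toy data: rows are genes, columns are time points.
toy_profiles = [
    [0.1, 0.5, 1.2, 2.0],  # rising over time
    [0.0, 0.4, 1.0, 2.1],  # rising over time
    [2.0, 1.1, 0.6, 0.1],  # falling over time
    [1.9, 1.3, 0.4, 0.0],  # falling over time
]
groups = upgma(toy_profiles, n_clusters=2)
print(groups)  # [[0, 1], [2, 3]]
```

Because the distance depends only on correlation, genes are grouped by the shape of their temporal pattern rather than by the magnitude of their expression levels, which is why the two rising profiles and the two falling profiles end up in separate groups.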

Bibliographic details

Source: Bioinformatics, 2003, Issue 4, 8 pages
Authors: Susmita Datta; Somnath Datta
Original format: PDF
Language: English
CLC classification: Biological sciences; Bioengineering (biotechnology)

