Journal: Bioinformatics

Analysis of a Gibbs sampler method for model-based clustering of gene expression data

Abstract

Motivation: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set.

Results: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.
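
The step of summarizing several equivalent clusterings as a weighted graph on the genes can be illustrated with a small sketch. The abstract does not specify the edge weights or the spectral procedure, so everything below is an assumption made for illustration: the weight of an edge is taken to be the fraction of clusterings in which two genes share a cluster, and fuzzy membership is read off the dominant eigenvector of that weight matrix. The function names `coclustering_graph` and `leading_fuzzy_cluster` and the toy label lists are hypothetical and not taken from the paper.

```python
import numpy as np


def coclustering_graph(clusterings):
    """Return a symmetric weight matrix over genes.

    Assumption: weight[i, j] is the fraction of clusterings in which
    genes i and j are assigned to the same cluster.
    """
    labels = np.asarray(clusterings)  # shape (n_clusterings, n_genes)
    same = labels[:, :, None] == labels[:, None, :]
    weights = same.mean(axis=0)
    np.fill_diagonal(weights, 0.0)  # no self-loops
    return weights


def leading_fuzzy_cluster(weights):
    """Return membership scores in (0, 1] from the dominant eigenvector.

    Assumption: a simple stand-in for a graph spectral method, not the
    procedure used by the authors.
    """
    _, eigvecs = np.linalg.eigh(weights)  # eigenvalues in ascending order
    dominant = np.abs(eigvecs[:, -1])
    return dominant / dominant.max()


# Toy example: three clusterings of six genes (labels are cluster ids).
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
W = coclustering_graph(clusterings)
membership = leading_fuzzy_cluster(W)
```

In this toy input, genes 0 and 1 are always clustered together and receive an edge weight of 1, while genes 0 and 2 agree in only two of three clusterings and receive 2/3. Genes with such intermediate weights are the ones that would fall in the overlaps between fuzzy clusters.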