Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics

Abstract

Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each Gaussian component. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on the cancer datasets show that, in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at
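To make the role of the normal-gamma conjugate prior concrete, the following is a minimal sketch, not the authors' GBHC implementation. It assumes one-dimensional data and uses the standard closed-form marginal likelihood of a Gaussian with unknown mean and precision. The function names (`log_marginal_likelihood`, `merge_score`) and the hyperparameter defaults (`mu0`, `kappa0`, `alpha0`, `beta0`) are illustrative assumptions. The merge score is a simplified stand-in for the Bayesian model comparison used in BHC, which in its full form also weights the hypotheses by a prior over tree partitions.

```python
import math


def log_marginal_likelihood(xs, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log evidence of 1-D data under a Gaussian likelihood with a
    normal-gamma prior on (mean, precision).

    Hyperparameter names and defaults are assumptions for illustration.
    beta0 is a rate parameter of the gamma prior on the precision.
    """
    n = len(xs)
    if n == 0:
        return 0.0
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # within-sample sum of squares

    # Conjugate posterior updates for the normal-gamma prior.
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)

    return (
        math.lgamma(alpha_n) - math.lgamma(alpha0)
        + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
        + 0.5 * (math.log(kappa0) - math.log(kappa_n))
        - 0.5 * n * math.log(2.0 * math.pi)
    )


def merge_score(a, b, **prior):
    """Log ratio of 'one shared Gaussian' versus 'two separate Gaussians'.

    Simplified: omits the tree-structure prior weights of full BHC.
    """
    merged = log_marginal_likelihood(list(a) + list(b), **prior)
    separate = log_marginal_likelihood(a, **prior) + log_marginal_likelihood(b, **prior)
    return merged - separate


if __name__ == "__main__":
    near_zero_1 = [0.0, 0.1, -0.1]
    near_zero_2 = [0.05, -0.05, 0.0]
    near_ten = [10.0, 10.1, 9.9]
    print("similar groups:  ", merge_score(near_zero_1, near_zero_2))
    print("separated groups:", merge_score(near_zero_1, near_ten))
```

Under these assumed defaults, the score is positive for the two groups centered near zero (merging is favored) and negative for the well-separated pair (keeping them apart is favored). An agglomerative procedure in this spirit would repeatedly merge the pair with the highest score and stop when no merge is favored, which is how the number of clusters can be inferred rather than fixed in advance.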
