IEEE International Conference on Data Science and Advanced Analytics

Generalized Bayesian Factor Analysis for Integrative Clustering with Applications to Multi-Omics Data

Abstract

Integrative clustering is a clustering approach for multiple datasets that provide different views of a common group of subjects. It enables multi-omics data to be analyzed jointly, for example to identify subtypes of diseases or cells, capturing the complex underlying biological processes more precisely. Meanwhile, over the past decade there has been considerable interest in incorporating prior structural knowledge about the features into statistical analyses; knowledge of gene regulatory networks (pathways), for instance, can potentially be incorporated into many genomic studies. In this paper, we propose a novel integrative clustering method that can incorporate prior graph knowledge. We first develop a generalized Bayesian factor analysis (GBFA) framework, a sparse Bayesian factor analysis that takes the graph information into account. The GBFA framework employs the spike-and-slab lasso (SSL) prior, which imposes sparsity on the factor loadings, and a Markov random field (MRF) prior, which encourages smoothing over adjacent factor loadings; together they establish a unified shrinkage that adapts to both the loading size and the graph structure. We then use the framework to extend iCluster+, a factor-analysis-based integrative clustering approach. A novel variational EM algorithm is proposed to efficiently compute the maximum a posteriori (MAP) estimate of the factor loadings. Extensive simulation studies and an application to the NCI60 cell line dataset demonstrate that the proposed method is superior and delivers more biologically meaningful results.
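As a rough illustration of the two priors described above, the sketch below computes the spike-and-slab lasso log-density of a single loading and its adaptive penalty (the Laplace rate weighted by the posterior probability that the loading comes from the slab, as in the standard SSL formulation), plus a simple graph-smoothing term over adjacent features. The abstract does not give the paper's exact MRF formulation, so the squared-difference edge penalty, the function names, and the default values of `theta`, `lam0`, `lam1`, and `eta` are hypothetical choices for illustration only, not the GBFA implementation.

```python
import math
from typing import Sequence, Tuple


def laplace_density(beta: float, lam: float) -> float:
    """Laplace (double-exponential) density with rate lam: (lam / 2) * exp(-lam * |beta|)."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def ssl_log_prior(beta: float, theta: float, lam0: float, lam1: float) -> float:
    """Log-density of the spike-and-slab lasso mixture.

    theta is the slab weight, lam0 the (large) spike rate, lam1 the (small) slab rate.
    """
    return math.log(theta * laplace_density(beta, lam1)
                    + (1.0 - theta) * laplace_density(beta, lam0))


def slab_probability(beta: float, theta: float, lam0: float, lam1: float) -> float:
    """Conditional probability that beta was drawn from the slab component."""
    slab = theta * laplace_density(beta, lam1)
    spike = (1.0 - theta) * laplace_density(beta, lam0)
    return slab / (slab + spike)


def adaptive_penalty(beta: float, theta: float, lam0: float, lam1: float) -> float:
    """Size-adaptive L1 rate: close to lam0 for small |beta|, close to lam1 for large |beta|."""
    p = slab_probability(beta, theta, lam0, lam1)
    return lam1 * p + lam0 * (1.0 - p)


def graph_smoothing_penalty(loadings: Sequence[float],
                            edges: Sequence[Tuple[int, int]],
                            eta: float) -> float:
    """Hypothetical MRF-style term: eta * sum over graph edges of (w_i - w_j)^2."""
    return eta * sum((loadings[i] - loadings[j]) ** 2 for i, j in edges)


def neg_log_prior(loadings: Sequence[float],
                  edges: Sequence[Tuple[int, int]],
                  theta: float = 0.5, lam0: float = 20.0,
                  lam1: float = 1.0, eta: float = 1.0) -> float:
    """Illustrative negative log-prior for one loading vector (up to a constant)."""
    ssl_term = -sum(ssl_log_prior(b, theta, lam0, lam1) for b in loadings)
    return ssl_term + graph_smoothing_penalty(loadings, edges, eta)


if __name__ == "__main__":
    # Small loadings are penalized near lam0, large ones near lam1.
    print(adaptive_penalty(0.0, 0.5, 20.0, 1.0))
    print(adaptive_penalty(3.0, 0.5, 20.0, 1.0))
    # Disagreeing loadings on an edge incur extra cost.
    print(neg_log_prior([1.0, 1.0], [(0, 1)]), neg_log_prior([1.0, -1.0], [(0, 1)]))
```

Small loadings receive a penalty close to the spike rate `lam0` and are shrunk hard toward zero, while large loadings receive a penalty close to the slab rate `lam1`; this is the size-adaptive shrinkage the abstract refers to. The edge term adds cost whenever loadings of features adjacent in the graph disagree, which is one simple way to encourage smoothing over the graph structure.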