Journal: Biostatistics

Mixture models with multiple levels, with application to the analysis of multifactor gene expression data

Abstract

Model-based clustering is a popular tool for summarizing high-dimensional data. With the number of high-throughput large-scale gene expression studies still on the rise, the need for effective data-summarizing tools has never been greater. By grouping genes according to a common experimental expression profile, we may gain new insight into the biological pathways that steer biological processes of interest. Clustering of gene profiles can also assist in assigning functions to genes that have not yet been functionally annotated. In this paper, we propose two model selection procedures for model-based clustering. Model selection in model-based clustering has to date focused on the identification of data dimensions that are relevant for clustering. However, in more complex data structures, with multiple experimental factors, such an approach does not provide easily interpreted clustering outcomes. We propose a mixture model with multiple levels that provides sparse representations both "within" and "between" cluster profiles. We explore various flexible "within-cluster" parameterizations and discuss how efficient parameterizations can greatly enhance the objective interpretability of the generated clusters. Moreover, we allow for a sparse "between-cluster" representation with a different number of clusters at different levels of an experimental factor of interest. This enhances interpretability of clusters generated in multiple-factor contexts. Interpretable cluster profiles can assist in detecting biologically relevant groups of genes that may be missed with less efficient parameterizations. We use our multilevel mixture model to mine a proliferating cell line expression data set for annotational context and regulatory motifs. We also investigate the performance of the multilevel clustering approach on several simulated data sets.
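
For readers unfamiliar with model-based clustering, the following is a minimal sketch of the generic idea only: fit Gaussian mixtures with different numbers of clusters by EM and compare them with a model selection criterion. It is not the multilevel model or either of the two selection procedures proposed in the paper. The abstract does not name a criterion, so the use of BIC here is an assumption, as are the one-dimensional toy data and the function names `fit_mixture` and `bic_by_k`.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(data, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihood).
    Hypothetical helper for illustration; not the paper's multilevel model.
    """
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k

    def mixture_density(x):
        return sum(w * normal_pdf(x, m, v)
                   for w, m, v in zip(weights, means, variances))

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)  # guard against collapse

    loglik = sum(math.log(mixture_density(x) or 1e-300) for x in xs)
    return weights, means, variances, loglik


def bic_by_k(data, candidates):
    """Return {k: BIC} for each candidate number of clusters (lower is better)."""
    n = len(data)
    scores = {}
    for k in candidates:
        loglik = fit_mixture(data, k)[3]
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        scores[k] = -2.0 * loglik + n_params * math.log(n)
    return scores


# Toy data: two well-separated groups of simulated "expression" values.
rng = random.Random(0)
toy = ([rng.gauss(-2.0, 0.5) for _ in range(40)]
       + [rng.gauss(3.0, 0.5) for _ in range(40)])
print(bic_by_k(toy, [1, 2, 3]))
```

The sketch selects only the number of clusters for a single-level mixture. The paper's setting is richer: it seeks sparse parameterizations within cluster profiles and allows a different number of clusters at different levels of an experimental factor.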
