
Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Abstract

Cluster analysis is one of the most widely used statistical tools in many emerging areas, such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask the underlying clustering structure, so removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing penalized likelihood approaches in model-based clustering all assume a common diagonal covariance matrix across clusters, an assumption that may not hold in practice. To analyze high-dimensional data, particularly data with relatively small sample sizes, this article introduces a novel approach that shrinks the variances together with the means, in the more general setting of cluster-specific (diagonal) covariance matrices. Furthermore, a specific form of penalty permits the selection of grouped variables, in which a group of variables is included or excluded as a whole. This makes it easier to incorporate subject-matter knowledge, such as gene functions when clustering microarray samples for disease subtype discovery. For implementation, EM algorithms are derived for parameter estimation, in which the M-steps clearly demonstrate the effects of shrinkage and thresholding. Numerical examples, including an application to acute leukemia subtype discovery with microarray gene expression data, are provided to demonstrate the utility and advantage of the proposed method.
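
The shrinkage and thresholding that the abstract attributes to the M-steps can be illustrated with a minimal sketch. It is not the paper's own update formulas, which this record does not reproduce. It assumes an L1 penalty on the cluster means, data standardized so that a zero mean marks a variable as uninformative, and fixed cluster-specific diagonal variances. The function names `soft_threshold`, `group_threshold`, and `penalized_mean_update` are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def group_threshold(v, t):
    """Shrink a whole vector toward zero: v * max(1 - t / ||v||, 0).

    The entire group is set to zero at once when its Euclidean norm
    does not exceed t, which mimics excluding a group of variables.
    """
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return v * (1.0 - t / norm)


def penalized_mean_update(X, tau, sigma2, lam):
    """Illustrative M-step update of cluster means under an L1 penalty.

    X      : (n, p) data matrix (assumed standardized)
    tau    : (n, K) posterior cluster probabilities from the E-step
    sigma2 : (K, p) cluster-specific diagonal variances (held fixed here)
    lam    : non-negative penalty parameter (assumed name)
    """
    weights = tau.sum(axis=0)                    # (K,) effective cluster sizes
    mu_tilde = (tau.T @ X) / weights[:, None]    # (K, p) unpenalized weighted means
    thresh = lam * sigma2 / weights[:, None]     # (K, p) per-entry thresholds
    return soft_threshold(mu_tilde, thresh)
```

In this sketch a mean whose unpenalized estimate falls below its threshold is set exactly to zero, and larger estimates are pulled toward zero by the threshold amount. The threshold grows with the variance and shrinks with the effective cluster size, so noisier variables and smaller clusters are penalized more heavily.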

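Bibliographic details

  • Journal: other
  • Author affiliation
  • Year (volume), issue: -1(2), -1
  • Year: -1
  • Pages: 168–212
  • Total pages: 49
  • Original format: PDF
  • Language of text
  • CLC classification
  • Keywords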
