Journal: Bioinformatics

Clustering threshold gradient descent regularization: with applications to microarray studies

Abstract

Motivation: An important goal of microarray studies is to discover genes that are associated with clinical outcomes, such as disease status and patient survival. Although a typical experiment surveys gene expression on a global scale, only a small number of genes may have a significant influence on a given clinical outcome. Moreover, expression data have cluster structure: genes within a cluster have correlated expression and coordinated functions, yet the effects of individual genes in the same cluster may differ. Accordingly, we seek to build statistical models with two properties. First, the model is sparse, in the sense that only a subset of the parameter vector is non-zero. Second, the cluster structure of gene expression is properly accounted for.

Results: For gene expression data without pathway information, we divide genes into clusters using commonly used methods, such as K-means or hierarchical approaches. The optimal number of clusters is determined using the Gap statistic. We propose a clustering threshold gradient descent regularization (CTGDR) method for simultaneous cluster selection and within-cluster gene selection. We apply this method to binary classification and censored survival analysis. Compared with the standard TGDR and other regularization methods, the CTGDR takes the cluster structure into account and carries out feature selection at both the cluster level and the within-cluster gene level. We demonstrate the CTGDR on two studies of cancer classification and two studies correlating the survival of lymphoma patients with microarray expression.

Availability: R code is available upon request.
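The two-level selection idea in the Results can be illustrated with a small sketch. This is not the authors' R code, and the abstract does not give the exact CTGDR update rule. The thresholds `tau_cluster` and `tau_gene`, the root-mean-square cluster score, the step size, and the function name `two_level_tgd` are all assumptions made for illustration. The sketch handles only the binary classification case with a logistic likelihood, and it assumes the gene-to-cluster assignment has already been computed (for example by K-means).

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def two_level_tgd(X, y, clusters, tau_cluster=0.9, tau_gene=0.9,
                  step=0.01, n_iter=200):
    """Hypothetical two-level threshold gradient descent for logistic regression.

    X        : list of samples, each a list of gene expression values
    y        : list of 0/1 class labels
    clusters : cluster label for each gene (assumed precomputed)
    """
    n_genes = len(X[0])
    beta = [0.0] * n_genes
    members = {}
    for j, c in enumerate(clusters):
        members.setdefault(c, []).append(j)

    for _ in range(n_iter):
        # Gradient of the logistic log-likelihood with respect to beta.
        resid = [yi - sigmoid(sum(b * x for b, x in zip(beta, row)))
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X))
                for j in range(n_genes)]

        # Cluster level (assumed score): root-mean-square gradient per cluster.
        rms = {c: math.sqrt(sum(grad[j] ** 2 for j in idx) / len(idx))
               for c, idx in members.items()}
        top = max(rms.values())
        if top == 0.0:
            break  # gradient vanished everywhere; nothing left to update

        for c, idx in members.items():
            if rms[c] < tau_cluster * top:
                continue  # cluster not selected in this iteration
            # Gene level: keep only genes whose gradient is large in-cluster.
            biggest = max(abs(grad[j]) for j in idx)
            for j in idx:
                if abs(grad[j]) >= tau_gene * biggest:
                    beta[j] += step * grad[j]
    return beta
```

In this sketch, thresholds close to 1 update only the strongest genes in the strongest clusters at each step, which keeps the coefficient vector sparse. Thresholds close to 0 update nearly every coefficient, which approaches ordinary gradient ascent on the likelihood. Genes that are never selected keep a coefficient of exactly zero.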
