
Methods for cluster analysis and validation in microarray gene expression data.



Abstract

Motivation. Unsupervised learning, or clustering, is frequently used to explore gene expression profiles for insight into both regulation and function. However, the quality of clustering results is often difficult to assess, and each algorithm has tunable parameters, often with no obvious way to choose appropriate values. Most algorithms also require the number of clusters to be set in advance, yet this value is rarely known and is therefore arrived at by subjective criteria. Here we present a method that addresses these challenges systematically using statistical evaluation.

Method. The method compares the quality of clustering results in order to choose, by objective criteria, the most appropriate algorithm, distance metric, and number of clusters for gene network discovery. In brief, two quality assessment metrics are used: the Consensus Share (CS) and the Feature Configuration Statistic (FCS). CS is the percentage of genes (not gene pairs) that are identically clustered in several clusterings. FCS is a measure of the randomness of the observed configuration of transcription factor binding sites among clustered genes.

Results. We evaluate this method using both artificial and yeast microarray data. By choosing parameter settings that minimize FCS values and maximize CS values, we show major advantages over other clustering methods, in particular for identifying combinatorially regulated groups of genes. The results show remarkable enrichment for cis-regulatory elements in clusters of genes known to be regulated by such elements, as well as evidence of extensive combinatorial regulation. Moreover, the method can be generalized to cases where prior information about cis-regulatory sites is absent, or where it is desirable to calculate FCS values based on functional categorization.
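The abstract defines CS only in words, so the following is a minimal Python sketch of one possible reading, not the thesis's actual procedure. It assumes that a gene counts as "identically clustered" when the set of genes sharing its cluster is exactly the same in every clustering, which makes the comparison independent of how each algorithm names its clusters. The function name `consensus_share` and the gene names are hypothetical.

```python
from collections import defaultdict


def consensus_share(clusterings):
    """Percentage of genes whose co-cluster members are identical in every clustering.

    clusterings: list of dicts mapping gene name -> cluster label.
    Labels do not need to match between clusterings.
    ASSUMPTION: "identically clustered" means "same set of co-members";
    the abstract does not spell out the exact definition.
    """
    genes = set(clusterings[0])
    if any(set(c) != genes for c in clusterings):
        raise ValueError("all clusterings must cover the same genes")
    if not genes:
        return 0.0

    # For each clustering, map every gene to the frozen set of genes in its cluster.
    member_sets = []
    for clustering in clusterings:
        groups = defaultdict(set)
        for gene, label in clustering.items():
            groups[label].add(gene)
        frozen = {label: frozenset(members) for label, members in groups.items()}
        member_sets.append({gene: frozen[label] for gene, label in clustering.items()})

    # A gene is stable if its co-member set is the same in all clusterings.
    reference = member_sets[0]
    stable = sum(
        1 for gene in genes
        if all(m[gene] == reference[gene] for m in member_sets)
    )
    return 100.0 * stable / len(genes)


# Two hypothetical clusterings of five genes with unrelated label schemes.
clustering_a = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 2}
clustering_b = {"g1": "a", "g2": "a", "g3": "b", "g4": "c", "g5": "c"}
share = consensus_share([clustering_a, clustering_b])
```

In this toy input only `g1` and `g2` keep the same co-members in both clusterings, so two of the five genes count toward the share. Counting genes rather than gene pairs, as the abstract stresses, makes the score read directly as the fraction of the data set on which the clusterings agree.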

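Bibliographic record

  • Author affiliation: University of Illinois at Urbana-Champaign
  • Degree-granting institution: University of Illinois at Urbana-Champaign
  • Discipline: Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pagination: 103 p.
  • Total pages: 103
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology
  • Keywords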
