A sequential clustering algorithm with applications to gene expression data.

Abstract

Microarrays are part of a new class of biotechnologies that allow the expression levels of thousands of genes to be monitored simultaneously. Gene profile data come from experiments that investigate the behavior of genes over several time points. Biologists are interested in these gene expression profiles because genes in the same functional pathway are believed to have similar expression profiles.

In the analysis of data from microarray experiments, most unsupervised learning procedures involve three steps: standardization, defining a dissimilarity measure, and applying a clustering algorithm. We will discuss the issues involved in these steps and propose new methods. We will discuss the problems of current clustering algorithms and propose a new algorithm, the sequential clustering algorithm. This algorithm finds clusters sequentially based on a Gaussian model. It does not require the number of clusters to be specified, and it allows for sporadic objects.

We will discuss a semiparametric mixture model that is motivated by the sequential clustering algorithm. Two estimators for the mixing proportion in the semiparametric mixture model are proposed, and their properties are investigated using simulations.

A new dissimilarity measure that takes into account the time order and the time distance between experiments will be introduced. We will discuss the performance of various distances in clustering using the Asymptotic Discriminating Measure (ADM) and show that the new dissimilarity measure always has a higher ADM than the Euclidean distance. The comparison of distances in small samples will also be discussed. We will introduce a sequential clustering algorithm with the new dissimilarity measure and investigate its performance.
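The three-step pipeline named in the abstract (standardization, a dissimilarity measure, a clustering algorithm) can be illustrated with a minimal sketch. This is not the thesis's Gaussian-model algorithm or its new dissimilarity measure, neither of which is specified in this record. It swaps in plain Euclidean distance and a simple greedy, neighborhood-based procedure that shares only two of the stated properties: clusters are found one at a time without fixing their number, and objects that fit no cluster are left as sporadic. The function names and the `radius` and `min_size` parameters are hypothetical.

```python
import math

def standardize(profile):
    """Step 1: center a profile to mean 0 and scale it to unit standard deviation."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0:
        return [0.0] * n  # flat profile carries no shape information
    return [(x - mean) / sd for x in profile]

def euclidean(a, b):
    """Step 2: dissimilarity between two standardized profiles (Euclidean stand-in)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sequential_clusters(profiles, radius, min_size):
    """Step 3: peel off clusters one at a time (illustrative stand-in, not the thesis method).

    Each round takes the largest neighborhood of width `radius` among the
    objects still unassigned. The loop stops when no neighborhood has at
    least `min_size` members; whatever remains is returned as sporadic.
    """
    z = [standardize(p) for p in profiles]
    unassigned = set(range(len(z)))
    clusters = []
    while unassigned:
        best = []
        for i in sorted(unassigned):
            members = [j for j in sorted(unassigned)
                       if euclidean(z[i], z[j]) <= radius]
            if len(members) > len(best):
                best = members
        if len(best) < min_size:
            break
        clusters.append(best)
        unassigned -= set(best)
    return clusters, sorted(unassigned)

# Toy time-course profiles (made-up numbers, four time points each).
profiles = [
    [1, 2, 3, 4],      # rising
    [10, 20, 30, 40],  # rising, on a different scale
    [2, 4, 6, 8.5],    # roughly rising
    [4, 3, 2, 1],      # falling
    [40, 30, 20, 10],  # falling, on a different scale
    [1, 5, 1, 5],      # oscillating, matches nothing else
]
clusters, sporadic = sequential_clusters(profiles, radius=0.5, min_size=2)
print(clusters, sporadic)  # [[0, 1, 2], [3, 4]] [5]
```

Standardizing first is what lets profiles with the same shape but different scales land in the same cluster; the oscillating profile has no close neighbor and is left unassigned rather than forced into a cluster.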

Bibliographic record

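  • Author

    Song, Jongwoo

  • Author affiliation

    The University of Chicago

  • Degree-granting institution: The University of Chicago
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2003
  • Pagination: 100 p.
  • Total pages: 100
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Statistics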