Microarray analysis: Choice of metric, new clustering algorithm and identification of transcription factors.


Abstract

Statistical algorithms exist that combine microarray expression data with genome sequence data to successfully identify transcription factor (TF) binding motifs in the genomes of lower eukaryotes. In higher eukaryotes, however, finding TF binding sites remains a challenge. Gene expression clusters found by classical methods often do not lead to successful identification of TF binding sites. We attribute the current lack of success to three difficulties.

The first difficulty is locating the relevant motifs in the promoter regions. In higher eukaryotes, TF binding sites, which often work in combination, can appear far upstream (e.g., 20,000 bases upstream of the transcription start site), in introns, and even in downstream regions. However, more advanced methods for cis-regulatory analysis, e.g. those using cross-species comparison, are being developed. The second difficulty is the low specificity of the co-expressed genes identified by microarray analysis. We observe that few of the identified co-expressed genes are, in fact, co-regulated. Part of the reason is the poor performance of the metrics used to compare expression profiles and of the clustering algorithms that find co-expressed genes in an automated way. The third difficulty is combining microarray analysis with cis-regulatory analysis. Regression approaches are unlikely to work because the number of regulated genes is small compared to the whole genome. Other approaches, which search directly for motifs in the promoter regions of co-expressed genes, are not promising because we observe that tightly co-expressed genes often share few known TF binding motifs in their promoter regions.

In chapter one, we propose a new metric between mRNA expression profiles that correlates better with regulatory distance than widely used metrics such as correlation or cosine correlation. In chapter two, we propose a clustering algorithm that uses repeated sub-sampling to distinguish candidate clusters from scattered genes and that also requires each cluster to maintain its quality in the original feature distances. The high specificity of the clusters is validated through simulation studies. In chapter three, we apply the new metric and clustering algorithm to microarray data and propose a new approach for combining the result with cis-regulatory analysis to identify relevant transcription factors.
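The abstract does not describe the new metric itself, so it is not reproduced here. As a point of reference only, the two baseline metrics it names can be sketched as follows. This is a minimal illustration that assumes "correlation" means the Pearson correlation and "cosine correlation" means the uncentered (cosine) similarity, each turned into a distance by subtracting from 1. The function names and the example profiles are hypothetical.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation between two expression profiles (assumed baseline)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("a constant profile has no defined correlation")
    return 1.0 - cov / (sd_x * sd_y)

def cosine_distance(x, y):
    """1 - cosine similarity, i.e. correlation without mean-centering (assumed baseline)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("an all-zero profile has no defined cosine similarity")
    return 1.0 - dot / (norm_x * norm_y)

# Hypothetical profiles: gene_b has the same shape as gene_a, shifted by a constant.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [11.0, 12.0, 13.0, 14.0]

d_pearson = pearson_distance(gene_a, gene_b)
d_cosine = cosine_distance(gene_a, gene_b)
```

For these two profiles the Pearson distance is essentially zero, because centering removes the constant shift, while the cosine distance is small but positive. The example shows only that the two baselines can disagree on the same pair of genes; it says nothing about how either relates to regulatory distance.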
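The idea of using repeated sub-sampling to separate candidate clusters from scattered genes can be illustrated with a toy sketch. This is not the algorithm of chapter two, whose details the abstract does not give. The sketch makes several assumptions: array conditions (columns) are the units being sub-sampled; the base clustering is a simple distance-threshold grouping; a gene counts as "scattered" when no other gene co-clusters with it in at least a chosen fraction of rounds. All names, parameters, and data are hypothetical, and the additional requirement that clusters maintain quality in the original feature distances is omitted.

```python
import random

def threshold_clusters(profiles, columns, cutoff):
    """Connected components: link two genes when their Euclidean distance
    over the selected columns is below cutoff (assumed base clustering)."""
    n = len(profiles)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            dist = sum((profiles[i][c] - profiles[j][c]) ** 2 for c in columns) ** 0.5
            if dist < cutoff:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

def co_clustering_frequency(profiles, n_rounds=50, fraction=0.7, cutoff=1.0, seed=0):
    """Fraction of sub-sampling rounds in which each pair of genes shares a cluster."""
    rng = random.Random(seed)
    n = len(profiles)
    n_cols = len(profiles[0])
    k = max(1, int(fraction * n_cols))
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_rounds):
        cols = rng.sample(range(n_cols), k)  # sub-sample conditions without replacement
        labels = threshold_clusters(profiles, cols, cutoff)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_rounds for c in row] for row in counts]

def scattered_genes(freq, min_freq=0.9):
    """Genes with no partner that co-clusters in at least min_freq of the rounds."""
    n = len(freq)
    return [i for i in range(n)
            if all(freq[i][j] < min_freq for j in range(n) if j != i)]

# Hypothetical data: two tight pairs (genes 0-1 and 2-3) and one outlying gene 4.
profiles = [
    [0.0, 0.1, 0.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.0, 0.1, 0.0],
    [5.0, 5.1, 5.0, 5.1, 5.0, 5.1],
    [5.1, 5.0, 5.1, 5.0, 5.1, 5.0],
    [0.0, 5.0, -5.0, 10.0, 2.5, -2.5],
]
freq = co_clustering_frequency(profiles)
scattered = scattered_genes(freq)
```

On this toy input the two pairs co-cluster in every round and gene 4 never joins a cluster, so it is the only gene reported as scattered. With real microarray data the sub-sampling unit, base clustering method, and thresholds would all have to be chosen with care.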

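Bibliographic record

  • Author

    Kim, Ryung Suk.

  • Author affiliation

    Harvard University.

  • Degree-granting institution: Harvard University.
  • Subject: Biology, Biostatistics; Biology, Molecular.
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 62 p.
  • Total pages: 62
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Biomathematical methods; Molecular genetics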
