Statistical methods and software for ChIP-Seq data analysis.

Abstract

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) has been successfully used for genome-wide profiling of transcription factor binding sites, histone modifications, and nucleosome occupancy in many model organisms and in humans. This thesis focuses on developing statistical methodologies and software to analyze ChIP-Seq data in an unbiased way.

This thesis is composed of three major parts. In the first part, we discuss statistical challenges in the identification of binding events in repetitive regions. State-of-the-art ChIP-Seq data analysis relies only on reads that map uniquely to a relevant reference genome (uni-reads). We developed CSEM, a general statistical approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-Seq experiments.

In the second part, we investigate statistical challenges in the identification of closely spaced binding events. Because compact prokaryotic genomes harbor binding sites, some of which are separated by only a few base pairs, applications of ChIP-Seq in this domain have not reached their full potential. Although the paired-end tag (PET) assay enables higher-resolution identification of binding events than the single-end tag (SET) assay, standard ChIP-Seq analysis methods are not equipped to utilize PET-specific features of the data. To address this problem, we developed dPeak, a high-resolution binding site identification algorithm that is applicable to both PET and SET data. Our computational and experimental results show that, when coupled with PET data, dPeak can identify closely spaced binding sites with high accuracy.

In the third part, we describe our three novel ChIP-Seq data analysis software packages: csem, mosaics, and dpeak. These packages address three important problems in ChIP-Seq data analysis, respectively: identification of binding events in repetitive regions, consideration of important sequence biases in peak calling, and identification of closely spaced binding events. Through applications to real ChIP-Seq data, we illustrate how these packages can reveal novel biological insights that standard ChIP-Seq data analysis currently overlooks.

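Bibliographic record

  • Author: Chung, Dongjun
  • Author affiliation: The University of Wisconsin - Madison
  • Degree-granting institution: The University of Wisconsin - Madison
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 124 p.
  • Total pages: 124
  • Original format: PDF
  • Language of text: English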