Source: Journal of Computational Biology (U.S. National Institutes of Health literature)

AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

Abstract

High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here, we introduce a probabilistic approach for ChIP-Seq data analysis that utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at .
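The core idea of updating each read's alignment probabilities by expectation maximization can be illustrated with a deliberately simplified sketch. This is not the AREM implementation: the abstract describes a mixture of K enriched regions plus a null genomic background, whereas the toy below treats every genomic bin as its own mixture component. The function name `em_multiread`, the bin representation, and the `pseudocount` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def em_multiread(reads, n_iter=50, pseudocount=0.1):
    """Toy EM for multiply-mapped reads (hypothetical helper, not AREM itself).

    reads: list of lists; each inner list holds the candidate bin ids
           where that read aligns.
    Returns one dict per read mapping bin id -> alignment probability.
    """
    # Initialise: every candidate location of a read is equally likely.
    probs = []
    for cands in reads:
        uniq = sorted(set(cands))
        probs.append({b: 1.0 / len(uniq) for b in uniq})

    for _ in range(n_iter):
        # M-step: expected read mass per bin, smoothed by a pseudocount
        # (assumption: stands in very loosely for a background level).
        weight = defaultdict(lambda: pseudocount)
        for p in probs:
            for b, q in p.items():
                weight[b] += q

        # E-step: reassign each read to its candidates in proportion
        # to the current mass of each candidate bin.
        updated = []
        for p in probs:
            total = sum(weight[b] for b in p)
            updated.append({b: weight[b] / total for b in p})
        probs = updated

    return probs


if __name__ == "__main__":
    # Three unique reads in bin 0, one unique read in bin 5,
    # and one read that aligns equally well to both bins.
    example = [[0], [0], [0], [0, 5], [5]]
    for read, p in zip(example, em_multiread(example)):
        print(read, p)
```

In this example the ambiguous read starts at 0.5 for each bin and drifts toward bin 0 (roughly 0.74 after convergence), because bin 0 already has more support from uniquely mapped reads. Uniquely mapped reads keep probability 1.0 throughout.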
