Journal of Bioinformatics and Computational Biology

From binding motifs in chip-seq data to improved models of transcription factor binding sites


Abstract

Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) has become a method of choice for locating DNA segments bound by different regulatory proteins. ChIP-Seq produces extremely valuable information for studying transcriptional regulation. The wet-lab workflow is often supported by downstream computational analysis, including the construction of models of the nucleotide sequences of transcription factor binding sites (TFBSs) in DNA, which can be used to detect binding sites in ChIP-Seq data at single-base-pair resolution. The most popular TFBS model is the positional weight matrix (PWM), in which the positional weights of nucleotides in different columns are statistically independent; such PWMs are constructed from a gapless multiple local alignment of sequences containing experimentally identified TFBSs. Modern high-throughput techniques, including ChIP-Seq, provide enough data for careful training of advanced models containing more parameters than a PWM. Yet many of the multiparametric models suggested so far provide only an incremental improvement in TFBS recognition quality compared to traditional PWMs trained on ChIP-Seq data. We present a novel computational tool, diChIPMunk, that constructs TFBS models as optimal dinucleotide PWMs (diPWMs), thus accounting for correlations between neighboring nucleotides in the input sequences. diChIPMunk retains many advantages of ChIPMunk, its ancestor algorithm: it accounts for ChIP-Seq base coverage profiles ("peak shape") and uses an effective subsampling-based core procedure that allows processing of large datasets. We demonstrate that diPWMs constructed by diChIPMunk outperform traditional PWMs constructed by ChIPMunk from the same ChIP-Seq data. Software website:
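
The difference between a traditional PWM and a dinucleotide PWM described in the abstract can be illustrated with a small Python sketch. This is not the diChIPMunk algorithm: it does no motif discovery, uses no peak-shape information, and does no subsampling. It only assumes a set of already aligned sites and shows how a model over adjacent nucleotide pairs can capture a correlation that independent columns cannot. The function names (`mono_pwm`, `di_pwm`, `score_mono`, `score_di`), the uniform background, the pseudocount, and the four toy sites are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def mono_pwm(sites, pseudocount=1.0):
    """Log-odds PWM against a uniform background; each column is independent."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: math.log((counts[b] + pseudocount) / total / 0.25)
                    for b in BASES})
    return pwm


def di_pwm(sites, pseudocount=1.0):
    """Log-odds matrix over the 16 dinucleotides at each pair of adjacent positions."""
    width = len(sites[0])
    total = len(sites) + 16 * pseudocount
    dipwm = []
    for i in range(width - 1):
        counts = Counter(site[i:i + 2] for site in sites)
        dipwm.append({a + b: math.log((counts[a + b] + pseudocount) / total * 16)
                      for a in BASES for b in BASES})
    return dipwm


def score_mono(pwm, word):
    """Sum of per-position weights."""
    return sum(col[ch] for col, ch in zip(pwm, word))


def score_di(dipwm, word):
    """Sum of weights of overlapping adjacent pairs."""
    return sum(col[word[i:i + 2]] for i, col in enumerate(dipwm))


# Toy aligned sites (hypothetical): the two middle positions are
# correlated, always "CG" or "GC", never "CC" or "GG".
sites = ["ACGT", "ACGT", "AGCT", "AGCT"]
pwm = mono_pwm(sites)
dipwm = di_pwm(sites)

for word in ("ACGT", "ACCT"):
    print(word, round(score_mono(pwm, word), 3), round(score_di(dipwm, word), 3))
```

In this toy set, C and G are equally frequent at each of the two middle positions, so the mononucleotide PWM gives "ACGT" (observed) and "ACCT" (never observed) the same score, whereas the dinucleotide matrix scores "ACCT" lower because the pair "CC" never occurs in the training sites.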
