PLoS Computational Biology

Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors

Abstract

Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes that each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting and require regularization schemes that are difficult for non-experts to use in practice. Here we present a new regulatory motif model, called the dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, is derived rigorously from first principles, and is free of tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq datasets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites. We make a suite of DWT tools available at , that allow users to automatically perform ‘motif finding’, i.e., the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT ‘dilogo’ motifs.
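
The independence assumption behind the PSWM can be made concrete with a small sketch. The Python below is an illustration only and is not the DWT method or any part of the DWT tool suite: the site list, the pseudocount of 0.5, the uniform background, and the function names `pswm_from_sites`, `pswm_score`, and `mutual_information` are all assumptions made for this example. It builds a log-odds PSWM from four toy sites in which the first and last positions always co-vary, then shows that the PSWM scores an unobserved base combination exactly as highly as an observed one, while the mutual information between the two positions reveals the dependency.

```python
import math
from collections import Counter

BASES = "ACGT"

def pswm_from_sites(sites, pseudocount=0.5):
    """Build per-position log2-odds weights against a uniform background."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        matrix.append({
            base: math.log2(((counts[base] + pseudocount) / total) / 0.25)
            for base in BASES
        })
    return matrix

def pswm_score(matrix, seq):
    """Sum independent per-position weights (the PSWM assumption)."""
    return sum(column[base] for column, base in zip(matrix, seq))

def mutual_information(sites, i, j):
    """Mutual information (in bits) between positions i and j of the sites."""
    n = len(sites)
    pair_counts = Counter((s[i], s[j]) for s in sites)
    counts_i = Counter(s[i] for s in sites)
    counts_j = Counter(s[j] for s in sites)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi

# Hypothetical toy sites: position 0 and position 3 always co-vary (A..T or T..A).
sites = ["ACGT", "ACGT", "TCGA", "TCGA"]
matrix = pswm_from_sites(sites)

observed = pswm_score(matrix, "ACGT")    # combination present in the sites
unobserved = pswm_score(matrix, "ACGA")  # combination never seen in the sites
distal_mi = mutual_information(sites, 0, 3)
neighbor_mi = mutual_information(sites, 1, 2)

print(observed, unobserved, distal_mi, neighbor_mi)
```

In this toy set the dependency sits between non-adjacent positions, so a model restricted to nearest-neighbor pairs would miss it as well. A model with pairwise terms for every pair of positions could penalize the unobserved combination.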