Journal: Bioinformatics

A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length

Abstract

Motivation: Transcription regulatory protein factors often bind DNA as homodimers or heterodimers. They therefore recognize structured DNA motifs: inverted repeats, direct repeats, or spaced motif pairs. However, these motifs are often difficult to identify owing to their high divergence. Incorporating the motif structure explicitly into the motif recognition algorithm improves recognition efficiency for highly divergent motifs, as well as the estimation of motif geometric parameters.

Results: We present a modification of the Gibbs sampling motif extraction algorithm, SeSiMCMC (Sequence Similarities by Markov Chain Monte Carlo), which finds structured motifs of these types, as well as non-structured motifs, in a set of unaligned DNA sequences. It employs improved estimators of motif and spacer lengths. The probability that a sequence does not contain any motif is accounted for in a rigorous Bayesian manner. We applied the algorithm to a set of upstream regions of genes from two Escherichia coli regulons involved in respiration and demonstrated that accounting for a symmetric motif structure allows the algorithm to identify weak motifs more accurately. In the examples studied, ArcA binding sites had the structure of a direct spaced repeat, whereas NarP binding sites exhibited a palindromic structure.
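
To make the idea of "accounting for a symmetric motif structure" concrete, here is a minimal Python sketch of one common way to impose such structure on a motif count matrix: pooling the counts of positions that the symmetry says should be equivalent. It is an illustration only and is not taken from SeSiMCMC; the function names (count_matrix, tie_palindrome, tie_direct_repeat) are hypothetical, and how the tied counts would enter the sampler's scoring is left as an assumption.

```python
from typing import List

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

Counts = List[List[int]]  # counts[position][index of base in BASES]


def count_matrix(sites: List[str]) -> Counts:
    """Per-position base counts for aligned sites of equal length."""
    length = len(sites[0])
    counts = [[0] * 4 for _ in range(length)]
    for site in sites:
        for pos, base in enumerate(site):
            counts[pos][BASES.index(base)] += 1
    return counts


def tie_palindrome(counts: Counts) -> Counts:
    """Pool each column with the complemented column at the mirrored position.

    Hypothetical helper: after tying, the matrix reads the same on both strands.
    """
    length = len(counts)
    tied = [[0] * 4 for _ in range(length)]
    for pos in range(length):
        mirror = length - 1 - pos
        for b, base in enumerate(BASES):
            comp = BASES.index(COMPLEMENT[base])
            tied[pos][b] = counts[pos][b] + counts[mirror][comp]
    return tied


def tie_direct_repeat(counts: Counts, half: int, spacer: int) -> Counts:
    """Pool the two half-sites of a direct repeat; spacer columns stay as they are.

    Hypothetical helper: assumes the layout half-site, spacer, half-site.
    """
    if 2 * half + spacer != len(counts):
        raise ValueError("half-site and spacer lengths must add up to the motif length")
    tied = [row[:] for row in counts]
    offset = half + spacer
    for pos in range(half):
        pooled = [counts[pos][b] + counts[pos + offset][b] for b in range(4)]
        tied[pos] = pooled
        tied[pos + offset] = pooled[:]
    return tied
```

For example, tie_palindrome(count_matrix(["ACGT", "AAGT"])) pools column 1 with the complement of column 2, so a weak signal in one half-site is reinforced by the other. This pooling is one plausible reason a structured model can detect divergent motifs that an unstructured one misses.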
