Modeling dependencies in protein-DNA binding sites

Abstract

The availability of whole genome sequences and high-throughput genomic assays opens the door for in silico analysis of transcription regulation. This includes methods for discovering and characterizing the binding sites of DNA-binding proteins, such as transcription factors. A common representation of transcription factor binding sites is a position-specific score matrix (PSSM). This representation makes the strong assumption that binding site positions are independent of each other. In this work, we explore Bayesian network representations of binding sites that provide different tradeoffs between complexity (number of parameters) and the richness of dependencies between positions. We develop the formal machinery for learning such models from data and for estimating the statistical significance of putative binding sites. We then evaluate the ramifications of these richer representations in characterizing binding site motifs and predicting their genomic locations. We show that these richer representations improve over the PSSM model in both tasks.
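To make the independence assumption concrete, the following is a minimal sketch, not the authors' implementation. It compares a PSSM with a chain-structured model in which each position depends on the one before it. The chain is only one simple instance of a Bayesian network and is chosen here as an assumption; the abstract does not say which structures the paper uses. The function names, the pseudocount of 1.0, and the toy sites are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def learn_pssm(sites, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities, treating positions as independent."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    model = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        model.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return model


def pssm_log_likelihood(model, seq):
    """Sum of independent per-position log-probabilities."""
    return sum(math.log(model[i][b]) for i, b in enumerate(seq))


def learn_chain(sites, pseudocount=1.0):
    """Estimate a chain-structured model: position i is conditioned on position i-1."""
    first = learn_pssm(sites, pseudocount)[0]
    transitions = []
    for i in range(1, len(sites[0])):
        pair_counts = Counter((s[i - 1], s[i]) for s in sites)
        table = {}
        for prev in ALPHABET:
            total = sum(pair_counts[(prev, b)] for b in ALPHABET)
            total += pseudocount * len(ALPHABET)
            table[prev] = {
                b: (pair_counts[(prev, b)] + pseudocount) / total for b in ALPHABET
            }
        transitions.append(table)
    return first, transitions


def chain_log_likelihood(model, seq):
    """Log-probability of seq under the chain-structured model."""
    first, transitions = model
    ll = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        ll += math.log(transitions[i - 1][seq[i - 1]][seq[i]])
    return ll


# Hypothetical toy sites: positions 0 and 1 co-vary (AC or TG, never AG or TC).
sites = ["ACGT", "ACGT", "TGGT", "TGGT"]
pssm = learn_pssm(sites)
chain = learn_chain(sites)

for candidate in ("ACGT", "AGGT"):
    print(
        candidate,
        round(pssm_log_likelihood(pssm, candidate), 3),
        round(chain_log_likelihood(chain, candidate), 3),
    )
```

On this toy data the PSSM cannot distinguish ACGT from AGGT, because each position is scored on its own, while the chain model penalizes the unseen A-then-G pair. The cost is a larger parameter count: a 4x4 conditional table per position in place of a single column of 4 probabilities, which is the complexity tradeoff the abstract describes.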