Annual International Conference on Research in Computational Molecular Biology

Motif Discovery Through Predictive Modeling of Gene Regulation



Abstract

We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature.
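The central idea, a boosting round that picks a rule of the form "motif present in the gene's promoter AND regulator in a given state in the experiment" to predict up- or down-regulation, can be illustrated with a small Python sketch. This is not MEDUSA's implementation. It assumes exact k-mer features only (no dimers or PSSMs), regulator activity discretized to +1 (up) or -1 (down), and labels of +1/-1 for up- or down-regulated (gene, experiment) pairs. It uses the standard confidence-rated boosting update for rules that abstain when they do not fire. The names `boost_motif_regulator` and `predict` are hypothetical.

```python
import math
from itertools import product


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def boost_motif_regulator(promoters, reg_states, labels, k=3, rounds=10, eps=1e-6):
    """Toy confidence-rated boosting over (k-mer, regulator, state) rules.

    Hypothetical inputs (assumptions, not MEDUSA's data format):
      promoters:  dict gene -> promoter sequence
      reg_states: dict experiment -> dict regulator -> +1 (up) / -1 (down)
      labels:     dict (gene, experiment) -> +1 (up) / -1 (down)
    Returns a list of (kmer, regulator, state, alpha) rules.
    """
    examples = list(labels)
    gene_kmers = {g: kmers(s, k) for g, s in promoters.items()}
    all_kmers = set().union(*gene_kmers.values())
    regulators = {r for states in reg_states.values() for r in states}
    candidates = list(product(sorted(all_kmers), sorted(regulators), (1, -1)))

    def fires(rule, g, e):
        # A rule fires when the motif is in the promoter AND the regulator is in the given state.
        m, r, s = rule
        return m in gene_kmers[g] and reg_states[e].get(r) == s

    weights = {x: 1.0 / len(examples) for x in examples}
    rules = []
    for _ in range(rounds):
        best = None
        for rule in candidates:
            w_pos = w_neg = w_off = 0.0
            for (g, e) in examples:
                w = weights[(g, e)]
                if not fires(rule, g, e):
                    w_off += w  # rule abstains on this example
                elif labels[(g, e)] > 0:
                    w_pos += w
                else:
                    w_neg += w
            # Loss bound for an abstaining rule; smaller is better.
            z = w_off + 2.0 * math.sqrt(w_pos * w_neg)
            if best is None or z < best[0]:
                best = (z, rule, w_pos, w_neg)
        _, rule, w_pos, w_neg = best
        alpha = 0.5 * math.log((w_pos + eps) / (w_neg + eps))  # smoothed rule weight
        rules.append((*rule, alpha))
        # Down-weight examples the new rule gets right, up-weight the ones it gets wrong.
        for (g, e) in examples:
            if fires(rule, g, e):
                weights[(g, e)] *= math.exp(-labels[(g, e)] * alpha)
        total = sum(weights.values())
        for x in weights:
            weights[x] /= total
    return rules


def predict(rules, promoter, states, k=3):
    """Sum the weights of all rules that fire; return +1, -1, or 0 (no evidence)."""
    present = kmers(promoter, k)
    score = sum(a for m, r, s, a in rules if m in present and states.get(r) == s)
    return 1 if score > 0 else -1 if score < 0 else 0
```

On a toy dataset where genes whose promoters contain `ACG` are up-regulated only in the experiment where a hypothetical regulator `TF1` is up, and every other (gene, experiment) pair is down-regulated, two rounds select `("ACG", "TF1", 1)` with a positive weight and then `("ACG", "TF1", -1)` with a negative weight. The motif is chosen because, paired with regulator state, it explains the expression labels, not because it is frequent in the promoters.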
