Motif Discovery Through Predictive Modeling of Gene Regulation

Abstract

We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature.
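The core idea in the abstract, a weak rule that fires when a motif is present in a gene's promoter and a regulator is in a given state, combined by boosting, can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain k-mer motifs only (no dimers or PSSMs), a generic confidence-rated boosting update with abstaining rules, and made-up toy data. The names `fires`, `train`, and `score`, and the regulator `TF1`, are hypothetical.

```python
import math

def fires(rule, seq, regs):
    """A rule fires when its motif occurs in the promoter and the
    named regulator is in the required state (+1 up, -1 down)."""
    motif, regulator, state = rule
    return motif in seq and regs.get(regulator) == state

def train(examples, k=3, rounds=5):
    """examples: list of (promoter_seq, {regulator: state}, label),
    with label +1 (up-regulated) or -1 (down-regulated).
    Assumption: generic boosting with abstaining weak rules, used here
    only to illustrate the motif-AND-regulator feature."""
    n = len(examples)
    weights = [1.0 / n] * n
    eps = 0.5 / n  # smoothing so the log ratio stays finite
    regulators = sorted({r for _, regs, _ in examples for r in regs})
    motifs = sorted({seq[i:i + k] for seq, _, _ in examples
                     for i in range(len(seq) - k + 1)})
    candidates = [(m, r, s) for m in motifs for r in regulators for s in (-1, 1)]
    model = []
    for _ in range(rounds):
        best = None
        for rule in candidates:
            w_pos = sum(w for w, (seq, regs, y) in zip(weights, examples)
                        if y == 1 and fires(rule, seq, regs))
            w_neg = sum(w for w, (seq, regs, y) in zip(weights, examples)
                        if y == -1 and fires(rule, seq, regs))
            # Boosting loss of a rule that abstains where it does not fire.
            z = (1.0 - w_pos - w_neg) + 2.0 * math.sqrt(w_pos * w_neg)
            if best is None or z < best[0]:
                best = (z, rule, w_pos, w_neg)
        _, rule, w_pos, w_neg = best
        alpha = 0.5 * math.log((w_pos + eps) / (w_neg + eps))
        model.append((rule, alpha))
        # Down-weight examples the new rule gets right, up-weight the rest.
        weights = [w * math.exp(-alpha * y) if fires(rule, seq, regs) else w
                   for w, (seq, regs, y) in zip(weights, examples)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def score(model, seq, regs):
    """Sum of the votes of all rules that fire; the sign is the prediction."""
    return sum(alpha for rule, alpha in model if fires(rule, seq, regs))

# Toy data (invented): the motif ACG matters only together with TF1's state.
examples = [
    ("TTACGTT", {"TF1": 1}, 1),
    ("GGACGAA", {"TF1": 1}, 1),
    ("TTACGTT", {"TF1": -1}, -1),
    ("GGACGAA", {"TF1": -1}, -1),
    ("TTTTTTT", {"TF1": 1}, -1),
    ("CCCCCCC", {"TF1": 1}, -1),
]
model = train(examples, k=3, rounds=5)
for rule, alpha in model:
    print(rule, round(alpha, 3))
```

In this sketch the selected rules double as the discovered motifs: a k-mer enters the model only if its presence, paired with a regulator state, helps predict the expression label. That is the distinction the abstract draws against motifs that are merely overrepresented.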

Bibliographic details

Source
Annual International Conference on Research in Computational Molecular Biology | 2005 | 15 pages
Authors
Manuel Middendorf; Anshul Kundaje; Mihir Shah; Yoav Freund; Chris H. Wiggins; Christina Leslie
Original format: PDF
Chinese Library Classification: Q811.4-532

