DISCOVER: a feature-based discriminative method for motif search in complex genomes

Abstract

Motivation: Identifying transcription factor binding sites (TFBSs) encoding complex regulatory signals in metazoan genomes remains a challenging problem in computational genomics. Due to the degeneracy of nucleotide content among binding site instances or motifs, and the intricate 'grammatical organization' of motifs within cis-regulatory modules (CRMs), extant pattern matching-based in silico motif search methods often suffer from impractically high false positive rates, especially in the context of analyzing large genomic datasets and noisy position weight matrices which characterize binding sites. Here, we try to address this problem by using a framework to maximally utilize the information content of the genomic DNA in the region of query, taking cues from values of various biologically meaningful genetic and epigenetic factors in the query region such as clade-specific evolutionary parameters, presence/absence of nearby coding regions, etc. We present a new method for TFBS prediction in metazoan genomes that utilizes both the CRM architecture of sequences and a variety of features of individual motifs. Our proposed approach is based on a discriminative probabilistic model known as conditional random fields that explicitly optimizes the predictive probability of motif presence in large sequences, based on the joint effect of all such features.

Results: This model overcomes weaknesses in earlier methods based on less effective statistical formalisms that are sensitive to spurious signals in the data. We evaluate our method on both simulated CRMs and real Drosophila sequences in comparison with a wide spectrum of existing models, and outperform the state of the art by 22% in F1 score.

Bibliographic record

Source
Intelligent Systems for Molecular Biology | 2009 | 13 pages
Authors
Wenjie Fu; Pradipta Ray; Eric P Xing
Original format: PDF
Chinese Library Classification: Q7-532
Keywords
position; matrices; epigenetic

Similar literature

1. DISCOVER: a feature-based discriminative method for motif search in complex genomes [J]. Fu Wenjie, Ray Pradipta, Xing Eric P. Bioinformatics. 2009, No. 12
2. DISCOVER: a feature-based discriminative method for motif search in complex genomes [J]. Wenjie Fu, Pradipta Ray and Eric P. Xing. Bioinformatics. 2009, No. 12
3. COMPASSS (COMplex PAttern of Sequence Search Software), a simple and effective tool for mining complex motifs in whole genomes [J]. Maccari Giuseppe, Gemignani Federica, Landi Stefano. Bioinformatics. 2010, No. 14
4. DISCOVER: a feature-based discriminative method for motif search in complex genomes [C]. Wenjie Fu, Pradipta Ray, Eric P Xing. Intelligent Systems for Molecular Biology. 2009
5. Computational identification of discriminative sequence motifs with dynamic search spaces [D]. Karnik, Rahul. 2012
6. DISCOVER: a feature-based discriminative method for motif search in complex genomes [O]. Wenjie Fu, Pradipta Ray, Eric P. Xing
7. DISCOVER: A feature-based discriminative method for motif search in complex genomes [O]. Wenjie Fu, Pradipta Ray, Eric P. Xing. 2013
