GADEM: A Genetic Algorithm Guided Formation of Spaced Dyads Coupled with an EM Algorithm for Motif Discovery

Leping Li

Abstract

Genome-wide analyses of protein binding sites generate large amounts of data; a ChIP dataset might contain 10,000 sites. Unbiased motif discovery in such datasets is not generally feasible using current methods that employ probabilistic models. We propose an efficient method, GADEM, which combines spaced dyads and an expectation-maximization (EM) algorithm. Candidate words (four to six nucleotides) for constructing spaced dyads are prioritized by their degree of overrepresentation in the input sequence data. Spaced dyads are converted into starting position weight matrices (PWMs). GADEM then employs a genetic algorithm (GA), with an embedded EM algorithm to improve starting PWMs, to guide the evolution of a population of spaced dyads toward one whose entropy scores are more statistically significant. Spaced dyads whose entropy scores reach a pre-specified significance threshold are declared motifs. GADEM performed comparably with MEME on 500 sets of simulated “ChIP” sequences with embedded known P53 binding sites. The major advantage of GADEM is its computational efficiency on large ChIP datasets compared with competing methods. We applied GADEM to six genome-wide ChIP datasets. Approximately 15 to 30 motifs of various lengths were identified in each dataset. Remarkably, without any prior motif information, the expected known motif (e.g., P53 in P53 data) was identified every time. GADEM discovered motifs of various lengths (6–40 bp) and characteristics in these datasets, which contained from 0.5 to >13 million nucleotides, with run times of 5 to 96 h. GADEM can be viewed as an extension of the well-known MEME algorithm and is an efficient tool for de novo motif discovery in large-scale genome-wide data. The GADEM software is available at www.niehs.nih.gov/research/resources/software/GADEM/.
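The first steps described in the abstract (ranking candidate words by overrepresentation, converting a spaced dyad into a starting PWM, and scoring a PWM by entropy) can be illustrated with a minimal Python sketch. This is not GADEM's implementation. The observed/expected ratio under a zero-order background, the hypothetical `match_prob` weight for dyad positions, the relative entropy in bits used as the score, and all function names are assumptions made for illustration. The GA and the embedded EM refinement are omitted.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"  # sequences are assumed to be uppercase DNA


def base_frequencies(seqs):
    """Zero-order background model: frequency of each nucleotide in the input."""
    counts = Counter(b for s in seqs for b in s if b in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}


def rank_words(seqs, k):
    """Rank all k-mers by observed/expected count, highest first.

    Expected counts assume independent nucleotides drawn from the background
    frequencies. This ratio is an illustrative overrepresentation measure,
    not necessarily the statistic GADEM uses.
    """
    bg = base_frequencies(seqs)
    observed = Counter(
        s[i:i + k]
        for s in seqs
        for i in range(len(s) - k + 1)
        if set(s[i:i + k]) <= set(BASES)  # skip windows containing N, etc.
    )
    n_windows = sum(observed.values())  # number of valid k-mer windows
    ratios = {}
    for word in ("".join(p) for p in product(BASES, repeat=k)):
        expected = n_windows * math.prod(bg[b] for b in word)
        if expected > 0:
            ratios[word] = observed[word] / expected
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


def dyad_to_pwm(left, gap, right, match_prob=0.85):
    """Convert a spaced dyad (left word, `gap` unspecified bases, right word) to a PWM.

    Each PWM column is a dict of base -> probability. Word positions put
    `match_prob` (a hypothetical setting) on the dyad's base and split the
    remainder evenly; gap positions are uniform.
    """
    other = (1.0 - match_prob) / 3
    pwm = []
    for base in left + "N" * gap + right:
        if base == "N":
            pwm.append({b: 0.25 for b in BASES})
        else:
            pwm.append({b: (match_prob if b == base else other) for b in BASES})
    return pwm


def relative_entropy(pwm, bg):
    """Information content in bits: sum over columns of KL(column || background)."""
    return sum(
        p * math.log2(p / bg[b])
        for column in pwm
        for b, p in column.items()
        if p > 0
    )


if __name__ == "__main__":
    seqs = ["ACGTTGCAACGT", "TTACGTAAACGT", "GGACGTCCACGT"]
    print(rank_words(seqs, 4)[:3])  # most overrepresented 4-mers
    pwm = dyad_to_pwm("ACG", 3, "CGT")  # the dyad ACG-NNN-CGT
    print(len(pwm), relative_entropy(pwm, base_frequencies(seqs)))
```

Swapping in a higher-order Markov background would change only `base_frequencies` and the expected-count line. According to the abstract, GADEM does not stop at scoring starting PWMs directly: it improves them with EM inside the GA and keeps only dyads whose entropy scores reach a pre-specified significance threshold.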

Bibliographic details

Source: Journal of Computational Biology, 2009, No. 2, pp. 317–329 (13 pages)
Author: Leping Li
Affiliation: Biostatistics Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina
Original format: PDF
Language: English

Similar documents

1. GADEM: A Genetic Algorithm Guided Formation of Spaced Dyads Coupled with an EM Algorithm for Motif Discovery [J]. Li LP. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 2009, No. 2.
2. Genetic Algorithm-Guided Discovery of Additive Combinations That Direct Quantum Dot Assembly [J]. Lukmaan A. Bawazer, Johannes Ihli, Timothy P. Comyn. Advanced Materials, 2015, No. 2.
3. Chemical Process Model Parameter Estimation Using an Information Guided Genetic Algorithm [J]. Chen-Wei YEH, Shi-Shang JANG. Journal of Chemical Engineering of Japan, 2006, No. 2.
4. Genetic algorithm for dimer-led and error-restricted spaced motif discovery [C]. Chant Tak-Ming, Lo Leung-Yau, Wong Man-Leung. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2013.
5. Motif Discovery Algorithms Incorporating Nucleosome Positioning Information [D]. Sayad-Rahim, Azin. 2009.
6. gadem: A Genetic Algorithm Guided Formation of Spaced Dyads Coupled with an EM Algorithm for Motif Discovery [O]. Leping Li.
7. gadem: A Genetic Algorithm Guided Formation of Spaced Dyads Coupled with an EM Algorithm for Motif Discovery [O]. Li, Leping. 2009.
