Refining motifs by improving information content scores using neighborhood profile search

Chandan K Reddy; Yao-Chung Weng; Hsiao-Dong Chiang


Abstract

The main goal of the motif finding problem is to detect novel, over-represented, unknown signals in a set of sequences (e.g., transcription factor binding sites in a genome). The most widely used motif finding algorithms build a generative probabilistic representation of these over-represented signals and try to discover profiles that maximize the information content score. Although these profiles are a very powerful representation of the signals, the major difficulty is that the best motif corresponds to the global maximum of a non-convex continuous function. Popular algorithms such as Expectation Maximization (EM) and Gibbs sampling tend to be very sensitive to the initial guesses and are known to converge very quickly to the nearest local maximum. To improve the quality of the results, EM is run with multiple random starts or combined with other powerful stochastic global methods that might yield promising initial guesses (such as projection algorithms). Global methods do not necessarily give initial guesses in the convergence region of the best local maximum; rather, they suggest that a promising solution lies in the neighborhood region. In this paper, we introduce a novel optimization framework that systematically searches the neighborhood regions of the initial alignment to explore multiple local optimal solutions. This search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. Our results show that the widely used EM algorithm often converges to sub-optimal solutions, which can be significantly improved by the proposed neighborhood profile search. In experiments on both synthetic and real datasets, our method demonstrates significant improvements in the information content scores of the probabilistic models. The proposed method also offers flexibility in the choice of local solvers and global methods, depending on their suitability for specific datasets.
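
The abstract does not state the formula for the information content score. As a minimal sketch, assuming the common relative-entropy definition (the sum over profile columns of p·log2(p/b) against a background distribution b), the score of a profile built from aligned sites could be computed as follows. The function names, the uniform background, and the pseudocount are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_profile(sites, pseudocount=0.5):
    """Column-wise nucleotide frequencies for equal-length aligned sites.

    The pseudocount is an assumption made here to avoid zero frequencies.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    profile = []
    for k in range(width):
        counts = Counter(site[k] for site in sites)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile


def information_content(profile, background=None):
    """Sum over columns of relative entropy (in bits) versus the background."""
    if background is None:
        # Assumed uniform background; real genomes are usually not uniform.
        background = {b: 0.25 for b in ALPHABET}
    score = 0.0
    for column in profile:
        for b in ALPHABET:
            p = column[b]
            if p > 0:  # treat 0 * log(0) as 0
                score += p * math.log2(p / background[b])
    return score


if __name__ == "__main__":
    conserved = ["ACGT", "ACGT", "ACGT", "ACGA"]
    mixed = ["ACGT", "CGTA", "GTAC", "TACG"]
    print(information_content(build_profile(conserved)))
    print(information_content(build_profile(mixed)))
```

Under this definition a well-conserved alignment scores higher than a mixed one, which is the quantity a local solver such as EM would try to increase from a given starting profile.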


Bibliographic details

Source: Algorithms for Molecular Biology, 2006, Issue 1
Original format: PDF
Classification (CLC): Molecular biology

