Journal: Algorithms for Molecular Biology

Refining motifs by improving information content scores using neighborhood profile search



Abstract

The main goal of the motif finding problem is to detect novel, over-represented unknown signals in a set of sequences (e.g., transcription factor binding sites in a genome). The most widely used motif finding algorithms obtain a generative probabilistic representation of these over-represented signals and try to discover profiles that maximize the information content score. Although these profiles form a very powerful representation of the signals, the major difficulty is that the best motif corresponds to the global maximum of a non-convex continuous function. Popular algorithms such as Expectation Maximization (EM) and Gibbs sampling tend to be very sensitive to the initial guesses and are known to converge very quickly to the nearest local maximum. To improve the quality of the results, EM is used with multiple random starts or with other powerful stochastic global methods that might yield promising initial guesses (such as projection algorithms). Global methods do not necessarily give initial guesses in the convergence region of the best local maximum; rather, they suggest that a promising solution lies in the neighborhood region. In this paper, we introduce a novel optimization framework that systematically searches the neighborhood regions of the initial alignment to explore multiple locally optimal solutions. This search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. Our results show that the widely used EM algorithm often converges to sub-optimal solutions, which can be significantly improved by the proposed neighborhood profile search. In experiments on both synthetic and real datasets, our method demonstrates significant improvements in the information content scores of the probabilistic models. The proposed method also offers flexibility in the choice of local solvers and global methods, depending on their suitability for specific datasets.
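
The abstract does not spell out the scoring function, so the following is only a minimal sketch of one common definition of the information content of a motif profile: the relative entropy of each column's nucleotide frequencies against a background distribution, summed over columns. The function names (`build_profile`, `information_content`), the DNA alphabet, the uniform background, and the use of base-2 logarithms are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA sequences


def build_profile(sites, pseudocount=0.0):
    """Return per-column base frequencies for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    profile = []
    for j in range(length):
        # Count how often each base occurs in column j of the alignment.
        counts = Counter(s[j] for s in sites)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile


def information_content(profile, background=None):
    """Sum over columns of sum_b p_b * log2(p_b / q_b), in bits."""
    if background is None:
        # Assumption: uniform background frequencies.
        background = {b: 1.0 / len(ALPHABET) for b in ALPHABET}
    score = 0.0
    for column in profile:
        for base, p in column.items():
            if p > 0:  # treat 0 * log(0) as 0
                score += p * math.log2(p / background[base])
    return score


if __name__ == "__main__":
    conserved = ["ACGT", "ACGT", "ACGT"]
    mixed = ["ACGT", "ACGA", "TCGT"]
    print(information_content(build_profile(conserved)))
    print(information_content(build_profile(mixed)))
```

Under these assumptions a perfectly conserved column contributes 2 bits and a uniformly mixed column contributes 0, so a local solver such as EM can be viewed as climbing this score from a given initial alignment.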
