
Generalized Planted (l,d)-Motif Problem with Negative Set


Abstract

Finding similar patterns (motifs) in a set of sequences is an important problem in computational molecular biology. Pevzner and Sze defined the planted (l,d)-motif problem as finding a length-l pattern that occurs in each input sequence with at most d substitutions. When d is large, the problem is difficult to solve because the input sequences do not contain enough information about the motif. In this paper, we propose a generalized planted (l,d)-motif problem that takes as extra input an additional set of sequences (the negative set), none of which contains a substring similar to the motif. We analyze the effect of this negative set on motif finding, and define a set of unsolvable problems and another set of most difficult problems, known as "challenging generalized problems". We develop an algorithm called VANS, based on voting and other novel techniques, which can solve the (9,3), (11,4), (15,6) and (20,8)-motif problems, which were previously unsolvable, as well as challenging instances of the planted (l,d)-motif problem such as the (9,2), (11,3), (15,5) and (20,7)-motif problems.
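To make the problem statement concrete, the following is a minimal brute-force sketch of the generalized problem as described above. It is not the VANS algorithm; it simply enumerates every length-l string and checks the definition directly, which is only feasible for very small l. The function names (`hamming`, `occurs`, `generalized_motifs`), the DNA alphabet, and the reading of "similar" as "within d substitutions" for the negative set are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some length-l substring of seq is within d substitutions of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def generalized_motifs(positive, negative, l, d, alphabet="ACGT"):
    """Enumerate all length-l candidates (exponential in l) and keep those that
    occur, with at most d substitutions, in every positive sequence and in no
    negative sequence. Assumption: "similar" means within d substitutions."""
    found = []
    for cand in product(alphabet, repeat=l):
        motif = "".join(cand)
        in_all_positive = all(occurs(motif, s, d) for s in positive)
        in_any_negative = any(occurs(motif, s, d) for s in negative)
        if in_all_positive and not in_any_negative:
            found.append(motif)
    return found
```

In this sketch the negative set acts purely as a filter: any candidate that also appears (approximately) in a negative sequence is discarded, which is how the extra input can narrow down candidates when the positive sequences alone leave several possibilities.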
