Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

An Efficient Exact Algorithm for the Motif Stem Search Problem over Large Alphabets

Abstract

In recent years, there has been increasing interest in planted (l, d) motif search (PMS), with applications to discovering significant segments in biological sequences. However, there has been little discussion of PMS over large alphabets. This paper focuses on motif stem search (MSS), which was recently introduced to search for motifs in large-alphabet inputs. A motif stem is an l-length string with some wildcards. The goal of the MSS problem is to find a set of stems that represents a superset of all (l, d) motifs present in the input sequences, and this superset is expected to be as small as possible. The three main contributions of this paper are as follows: (1) We build the motif stem representation more precisely by using regular expressions. (2) We give a method for generating all possible motif stems without redundant wildcards. (3) We propose an efficient exact algorithm, called StemFinder, for solving the MSS problem. Compared with previous MSS algorithms, StemFinder runs much faster and reports fewer stems, which represent a smaller superset of all (l, d) motifs. StemFinder is freely available at http://sites.google.com/site/feqond/stemfinder.
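To make the terms in the abstract concrete, the following is a minimal sketch of the standard (l, d) motif test from the PMS literature (an l-length string is an (l, d) motif if every input sequence contains an l-mer within Hamming distance d of it), together with a simple check of whether a string is covered by a stem. It is not StemFinder and does not reproduce the paper's regular-expression representation. The function names, the `*` wildcard symbol, and the rule that each wildcard stands for exactly one arbitrary symbol are assumptions made for illustration.

```python
import re
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_ld_motif(candidate: str, sequences: Iterable[str], d: int) -> bool:
    """True if every sequence contains an l-mer within Hamming distance d
    of candidate, where l = len(candidate)."""
    l = len(candidate)
    return all(
        any(hamming(candidate, s[i:i + l]) <= d for i in range(len(s) - l + 1))
        for s in sequences
    )


def stem_matches(stem: str, candidate: str, wildcard: str = "*") -> bool:
    """True if candidate is one of the strings the stem stands for.

    Assumption (hypothetical notation): each wildcard matches exactly one
    arbitrary symbol; every other stem symbol must match literally.
    """
    pattern = "".join("." if c == wildcard else re.escape(c) for c in stem)
    return re.fullmatch(pattern, candidate, flags=re.DOTALL) is not None


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTCGTT", "GACCTA"]
    print(is_ld_motif("CGT", seqs, d=1))
    print(stem_matches("C*T", "CCT"))
```

Under these assumptions, a stem with w wildcards stands for |Σ|^w strings over an alphabet Σ, which is why reporting fewer stems with fewer redundant wildcards yields a smaller superset of candidate motifs.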
