首页> 外文期刊>Journal of combinatorial optimization >An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence
【24h】

An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence

机译:一种有效的多项式空间和多项式延迟算法,用于枚举序列中的最大图案

获取原文
获取原文并翻译 | 示例
           

摘要

In this paper, we consider the problem of enumerating all maximal motifs in an input string for the class of repeated motifs with wild cards. A maximal motif is such a representative motif that is not properly contained in any larger motifs with the same location lists. Although the enumeration problem for maximal motifs with wild cards has been studied in Parida et al. (2001), Pisanti et al. (2003) and Pelfrene et al. (2003), its output-polynomial time computability has been still open. The main result of this paper is a polynomial space polynomial delay algorithm for the maximal motif enumeration problem for the repeated motifs with wild cards. This algorithm enumerates all maximal motifs in an input string of length n in O(n(3)) time per motif with O(n) space, in particular O(n(3)) delay. The key of the algorithm is depth-first search on a tree-shaped search route over all maximal motifs based on a technique called prefix-preserving closure extension. We also show an exponential lower bound and a succinctness result on the number of maximal motifs, which indicate the limit of a straightforward approach. The results of the computational experiments show that our algorithm can be applicable to huge string data such as genome data in practice, and does not take large additional computational cost compared to usual frequent motif mining algorithms.
机译:在本文中,我们考虑了使用通配符对重复主题类别枚举输入字符串中所有最大主题的问题。最大主题是这样的代表性主题,它没有正确包含在具有相同位置列表的任何较大主题中。尽管在Parida等人中已经研究了带有通配符的最大图案的枚举问题。 (2001),Pisanti等。 (2003)和Pelfrene等。 (2003年),其输出多项式时间可计算性仍然是开放的。本文的主要结果是针对具有通配符的重复图案的最大图案枚举问题的多项式空间多项式延迟算法。此算法枚举每个主题的O(n(3))时间(特别是O(n(3))延迟)在长度为n的输入字符串中的所有最大主题,每个主题的O(n(3))时间。该算法的关键是基于一种称为前缀保留闭包扩展的技术,在所有最大主题的树形搜索路径上进行深度优先搜索。我们还显示了最大图案数的指数下界和简洁结果,这表明了简单方法的局限性。计算实验的结果表明,我们的算法在实践中可适用于巨大的字符串数据,如基因组数据,与常用的频繁基序挖掘算法相比,不需要花费额外的计算成本。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号