Fast Exact Algorithms for the Closest String and Substring Problems with Application to the Planted (L,d)-Motif Model

Chen Zhi-Zhong; Wang Lusheng

Abstract

We present two parameterized algorithms for the closest string problem. The first runs in $O(nL + nd \cdot 17.97^d)$ time for DNA strings and in $O(nL + nd \cdot 61.86^d)$ time for protein strings, where n is the number of input strings, L is the length of each input string, and d is the given upper bound on the number of mismatches between the center string and each input string. The second runs in $O(nL + nd \cdot 13.92^d)$ time for DNA strings and in $O(nL + nd \cdot 47.21^d)$ time for protein strings. We then extend the first algorithm to a new parameterized algorithm for the closest substring problem that runs in $O((n-1)m^2(L + d \cdot 17.97^d \cdot m^{\lfloor \log_2(d+1) \rfloor}))$ time for DNA strings and in $O((n-1)m^2(L + d \cdot 61.86^d \cdot m^{\lfloor \log_2(d+1) \rfloor}))$ time for protein strings, where n is the number of input strings, L is the length of the center substring, $L - 1 + m$ is the maximum length of a single input string, and d is the given upper bound on the number of mismatches between the center substring and at least one substring of each input string. All the algorithms significantly improve the previous bests. To verify experimentally the theoretical improvements in the time complexity, we implement our algorithm in C and apply the resulting program to the planted (L, d)-motif problem proposed by Pevzner and Sze in 2000. We compare our program with the previously best exact program for the problem, namely PMSPrune (designed by Davila et al. in 2007). Our experimental data show that our program runs faster for practical cases and also for several challenging cases. Our algorithm uses less memory too.
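
To make the problem statement concrete, the following is a minimal Python sketch of the closest string problem: given n strings of equal length L and a bound d, find a center string within Hamming distance d of every input. It is not the algorithm of this paper. It uses the classical bounded search tree idea (usually attributed to Gramm, Niedermeier, and Rossmanith), which branches on at most d + 1 mismatching positions and recurses to depth at most d. The function names `hamming` and `closest_string` are chosen here for illustration only.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings: List[str], d: int) -> Optional[str]:
    """Return a string within Hamming distance d of every input, or None.

    Illustrative bounded search tree, NOT the paper's algorithm.
    Assumes all input strings have the same length.
    """
    if not strings:
        return None

    def search(candidate: str, budget: int) -> Optional[str]:
        # Look for an input string that the candidate is still too far from.
        far = next((s for s in strings if hamming(candidate, s) > d), None)
        if far is None:
            return candidate  # candidate is a valid center
        if budget == 0:
            return None  # no modifications left on this branch
        if hamming(candidate, far) > d + budget:
            return None  # too far to repair with the remaining budget
        # Any valid center differs from `far` in at most d positions, so it
        # must agree with `far` on at least one of any d + 1 mismatches.
        mismatches = [i for i, (x, y) in enumerate(zip(candidate, far)) if x != y]
        for i in mismatches[: d + 1]:
            moved = candidate[:i] + far[i] + candidate[i + 1:]
            result = search(moved, budget - 1)
            if result is not None:
                return result
        return None

    # A valid center is within distance d of the first input string, so at
    # most d positions of that string ever need to change.
    return search(strings[0], d)


if __name__ == "__main__":
    center = closest_string(["AAAA", "TTTT"], 2)
    print(center, [hamming(center, s) for s in ["AAAA", "TTTT"]])
```

The search tree has branching factor at most d + 1 and depth at most d, which is where a running time of the form (polynomial) times (d + 1)^d comes from. The bounds quoted in the abstract replace that exponential factor with a constant raised to the power d.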

Bibliographic details

Source: Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 2011, issue 5, pp. 1400-1410 (11 pages)
Authors: Chen Zhi-Zhong; Wang Lusheng
Affiliation: Tokyo Denki University, Hatoyama, Saitama
Original format: PDF
Language: English
Keywords: DNA motif discovery; parameterized algorithm; closest string; closest substring
