A Fast Heuristic Search Algorithm for Finding the Longest Common Subsequence of Multiple Strings



Abstract

Finding the longest common subsequence (LCS) of multiple strings is an NP-hard problem with many applications in bioinformatics and computational genomics. Although significant efforts have been made to address the problem and its special cases, the increasing complexity and size of biological data require more efficient methods applicable to an arbitrary number of strings. In this paper, a novel search algorithm, MLCS-A*, is presented for the general case of multiple LCS (or MLCS) problems. MLCS-A* is a variant of the A* algorithm. It maximizes a new heuristic estimate of the LCS in each search step so that the longest common subsequence can be found. As a natural extension of MLCS-A*, a fast algorithm, MLCS-APP, is also proposed to deal with large volumes of biological data for which finding an LCS within a reasonable time is impossible. The benchmark test shows that MLCS-APP is able to extract common subsequences close to the optimal ones and that MLCS-APP significantly outperforms existing heuristic approaches. When applied to 8 protein domain families, MLCS-APP produced more accurate results than existing multiple sequence alignment methods.
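The abstract does not spell out the heuristic that MLCS-A* maximizes, so the following is only a minimal Python sketch of the general idea: a best-first search over match positions, guided by an upper bound on how much the subsequence can still grow. The bound used here (for each character, the minimum remaining count across all strings) is an assumption chosen for simplicity and is not the paper's estimate. The names `upper_bound` and `mlcs_best_first` are hypothetical.

```python
import heapq
from collections import Counter


def upper_bound(strings, pos):
    """Upper bound on the LCS of the suffixes strings[i][pos[i]:].

    Assumption (not the paper's heuristic): a character can be matched
    at most as often as it occurs in the suffix where it is rarest.
    """
    counts = [Counter(s[p:]) for s, p in zip(strings, pos)]
    common = set(counts[0]).intersection(*map(set, counts[1:]))
    return sum(min(c[ch] for c in counts) for ch in common)


def mlcs_best_first(strings):
    """Return one longest common subsequence of all input strings."""
    if not strings or any(not s for s in strings):
        return ""
    alphabet = sorted(set(strings[0]).intersection(*map(set, strings[1:])))
    start = tuple(0 for _ in strings)
    best = ""                      # longest common subsequence found so far
    best_g = {start: 0}            # best known length for each position tuple
    # Heap entries: (-f, -g, positions, subsequence), with f = g + bound.
    heap = [(-upper_bound(strings, start), 0, start, "")]
    while heap:
        neg_f, neg_g, pos, seq = heapq.heappop(heap)
        if -neg_f <= len(best):
            break                  # no remaining state can beat the incumbent
        g = -neg_g
        if g < best_g[pos]:
            continue               # stale entry: state was reached with a longer prefix
        for ch in alphabet:
            nxt = []
            for s, p in zip(strings, pos):
                i = s.find(ch, p)  # next occurrence of ch at or after p
                if i < 0:
                    break
                nxt.append(i + 1)
            else:
                nxt = tuple(nxt)
                if g + 1 > best_g.get(nxt, -1):
                    best_g[nxt] = g + 1
                    new_seq = seq + ch
                    if len(new_seq) > len(best):
                        best = new_seq
                    f = g + 1 + upper_bound(strings, nxt)
                    heapq.heappush(heap, (-f, -(g + 1), nxt, new_seq))
    return best


def is_subsequence(sub, s):
    """True if sub can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in sub)


result = mlcs_best_first(["ACGT", "AGCT", "GACT"])
print(result)
```

Because the bound never underestimates the remaining length, the search can stop as soon as the most promising open state cannot beat the best subsequence already found. An approximate variant in the spirit of MLCS-APP would trade this guarantee for speed, for example by limiting how many states are kept open; the abstract gives no details on how the paper does this.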


Bibliographic record

Source: Innovative applications of artificial intelligence conference; AAAI conference on artificial intelligence; IAAI-10; Symposium on educational advances in artificial intelligence; AAAI-10; EAAI-10 | 2011 | pp. 1287-1292 | 6 pages
Authors: Qingguo Wang; Mian Pan; Yi Shang; Dmitry Korkin
Original format: PDF
Chinese Library Classification: Artificial intelligence theory

Similar literature

1. A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings [J]. Hsing-Yen Ann, Chang-Biau Yang, Chiou-Ting Tseng. Information Processing Letters. 2008, No. 6
2. An Efficient Fast Pruned Parallel Algorithm for finding Longest Common Subsequences in BioSequences [J]. Sumathy Eswaran, S. P. RajaGopalan. Annals. Computer Science Series. 2010, No. 1
3. Finding a longest common subsequence between a run-length-encoded string and an uncompressed string [J]. J.J. Liu, Y.L. Wang, R.C.T. Lee. Journal of Complexity. 2008, No. 2
4. A Fast Heuristic Search Algorithm for Finding the Longest Common Subsequence of Multiple Strings [C]. Qingguo Wang, Mian Pan, Yi Shang. AAAI Conference on Artificial Intelligence. 2010
5. Exact and heuristic algorithms for the job shop scheduling problem with earliness and tardiness over a common due date [D]. Bedoya-Valencia, Leonardo. 2007
6. A Space-Bounded Anytime Algorithm for the Multiple Longest Common Subsequence Problem [O]. Jiaoyun Yang, Yun Xu, Yi Shang
7. Heuristic Algorithms for the Longest Filled Common Subsequence Problem [O]. Radu Stefan Mincu, Alexandru Popa. 2018
