首页> 外文会议>Bioinformatics Research and Applications; Lecture Notes in Bioinformatics; 4463 >A Novel Greedy Algorithm for the Minimum Common String Partition Problem
【24h】

A Novel Greedy Algorithm for the Minimum Common String Partition Problem

机译:最小公共字符串分配问题的新型贪婪算法

获取原文
获取原文并翻译 | 示例

摘要

The Minimum Common String Partition problem (MCSP) is to partition two given input strings into the same collection of substrings, where the number of substrings in the partition is minimized. This problem is a key problem in genome rearrangement, and is closely related to the problem of sorting by reversals with duplicates. MCSP is NP-hard, even for the most trivial case, 2-MCSP, where each letter occurs at most twice in each input string. There are various approximation algorithms which can achieve very good approximation ratios but with complicated implementations, for example, 1.5-approximation algorithm for 2-MCSP, 1.1037-approximation algorithm for 2-MCSP and a 4-approximation algorithm for 3-MCSP. There is also a simple greedy algorithm for MCSP which extracts the longest common substring from the given strings at each step. In this paper, we propose a novel greedy algorithm for MCSP, where we extract the longest common substring containing a symbol occurring only once at each step whenever there is a such symbol. We show our algorithm is more "worst case" greedy at each step than the greedy algorithm and the expected performance of our algorithm is better than that of the greedy algorithm. Our experiments show that our method achieves a better partition on average than the greedy algorithm does. Another advantage of our algorithm is that it is much faster than the greedy algorithm.
机译:最小公用字符串分区问题(MCSP)是将两个给定的输入字符串划分为相同的子字符串集合,其中将分区中的子字符串数量最小化。这个问题是基因组重排的关键问题,并且与通过重复重复进行分选的问题密切相关。 MCSP是NP硬式的,即使在最普通的情况下,也是如此2-MCSP,其中每个字母在每个输入字符串中最多出现两次。有多种近似算法可以实现非常好的近似比率,但是实现复杂,例如,针对2-MCSP的1.5近似算法,针对2-MCSP的1.1037近似算法和针对3-MCSP的4-近似算法。对于MCSP,还有一个简单的贪婪算法,该算法在每个步骤中从给定的字符串中提取最长的公共子字符串。在本文中,我们提出了一种新颖的针对MCSP的贪婪算法,该算法提取了最长的公共子字符串,该子字符串包含一个只要有一个符号就在每个步骤仅出现一次的符号。我们显示,与贪婪算法相比,我们的算法在每个步骤上的贪婪程度都更高,并且该算法的预期性能要优于贪婪算法。我们的实验表明,与贪婪算法相比,我们的方法平均可以获得更好的分区。我们算法的另一个优点是它比贪婪算法要快得多。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号