On the Shortest Common Superstring of NGS Reads

Abstract

The Shortest Superstring Problem (SSP) asks, for a set of strings S = {s_1, ..., s_n} in which no s_i is a substring of another s_j, for a minimum-length string that contains every s_i, 1 ≤ i ≤ n, as a substring. The problem is known to be NP-complete and APX-hard. Approximation algorithms with guaranteed ratios have been proposed; the current best ratio, 2 11/30, was reached through a long and difficult series of improvements. SSP is widely used in practice on Next Generation Sequencing (NGS) data, which plays an increasingly important role in modern biological and medical research. In this note, we show that on NGS data the approximation ratio reached by the classical algorithm of Blum et al. [2] is usually below 2 11/30, under specific assumptions about the data that are verified experimentally on a large sample set. Moreover, we present an efficient linear-time test, applicable to any input of strings of equal length, that computes the approximation ratio reachable by the classical algorithm in [2].
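To make the problem statement concrete, the following minimal Python sketch merges reads with a common greedy overlap heuristic. It is only an illustration of what a superstring of reads looks like: the abstract does not say which procedure from [2] is analyzed, so this is not presented as that algorithm, and it is not the linear-time test mentioned above. The function names `overlap` and `greedy_superstring` are hypothetical, and the input is assumed to be substring-free, as in the problem definition.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: list[str]) -> str:
    """Repeatedly merge the ordered pair with the largest overlap.

    Illustrative heuristic only: the result contains every input string
    but is not guaranteed to be a shortest superstring. Assumes no input
    string is a substring of another.
    """
    pool = list(dict.fromkeys(strings))  # drop exact duplicates, keep order
    while len(pool) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(pool)), 2):
            k = overlap(pool[i], pool[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        # Glue the two strings together, writing the shared part once.
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)
    return pool[0] if pool else ""


reads = ["ACGT", "CGTA", "GTAC"]  # toy equal-length "reads", made up for the example
superstring = greedy_superstring(reads)
print(superstring, len(superstring))
```

Each round of this sketch compares all ordered pairs, so it is far from linear time; it is meant for toy inputs only. The approximation ratio discussed in the abstract compares the length of a string produced by an approximation algorithm with the length of an optimal superstring.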
