Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Approaching the One-Sided Exemplar Adjacency Number Problem

Abstract

The one-sided Exemplar Adjacency Number (EAN) problem is a known problem for computing the exemplar similarity between a generic linear genome G with gene duplications and an exemplar genome H (over the same set of n gene families). In this problem, we need to compute an exemplar genome G*, which is a permutation obtained from G, such that the number of common adjacencies between G* and H is maximized. Unfortunately, the problem is not only NP-hard but also NP-hard to approximate. In this paper, we approach the problem by relaxing the constraint, so that a sub-permutation G+ obtained from G does not have to include all the gene families but must still have length at least k. Such a G+ is called a pseudo-exemplar genome. A slightly more general problem (One-sided EAN+) is then defined: compute a pseudo-exemplar genome G+ from G such that the number of common adjacencies between H and G+ is maximized. One-sided EAN+ contains One-sided EAN as a special case; moreover, it offers some flexibility in designing algorithms. First, we relax and formulate the One-sided EAN+ problem as the maximum independent set (MIS) problem on a colored interval graph, and hence reduce the number of appearances of each gene to at most two. We show that this new relaxation is still NP-complete, though a simple factor-2 approximation algorithm can be designed; moreover, we prove that the problem cannot be approximated within a factor of 2 - epsilon by a local search technique. We then show that this relaxed version is fixed-parameter tractable (FPT). Second, to ensure that each gene appears in G+ at most once, we use integer linear programming (ILP) to solve the problem. Finally, we implement our algorithm and compare it with the up-to-date software GREDU on simulated signed and unsigned genomes. It turns out that our algorithm is more stable and can process signed genomes of length up to 12,000, whereas GREDU can falter on such a large signed genome and cannot handle unsigned genomes at all.
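To make the objective concrete, the following is a minimal sketch of the original (unrelaxed) one-sided EAN problem on a toy instance. It is not the algorithm from the paper: it simply enumerates every exemplar genome obtainable from G and keeps the one sharing the most adjacencies with H, which takes exponential time and is only usable on very small inputs. It assumes unsigned genomes encoded as lists of integer gene-family labels, and it assumes the usual unsigned definition of an adjacency (two neighbouring genes, in either order). All function names are hypothetical.

```python
from itertools import product


def adjacencies(genome):
    """Unordered pairs of neighbouring genes in an unsigned linear genome."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}


def common_adjacencies(a, b):
    """Number of adjacencies shared by two exemplar genomes."""
    return len(adjacencies(a) & adjacencies(b))


def exemplars(genome):
    """Yield every exemplar genome: keep exactly one copy per gene family,
    preserving the order in which the kept copies occur in the input."""
    positions = {}
    for index, gene in enumerate(genome):
        positions.setdefault(gene, []).append(index)
    for choice in product(*positions.values()):
        yield [genome[i] for i in sorted(choice)]


def one_sided_ean_bruteforce(g, h):
    """Return an exemplar of g with the most adjacencies in common with h.
    Exhaustive search: illustrative only, exponential in the duplications."""
    return max(exemplars(g), key=lambda e: common_adjacencies(e, h))


if __name__ == "__main__":
    G = [2, 1, 2, 3, 1]   # toy genome with duplicated families 1 and 2
    H = [1, 2, 3]         # exemplar genome over the same families
    best = one_sided_ean_bruteforce(G, H)
    print(best, common_adjacencies(best, H))
```

On this toy input, keeping the second copy of gene 1 and the second copy of gene 2 yields the exemplar 1 2 3, which shares both adjacencies of H. The exponential growth of the search space as duplications increase is one way to see why the abstract turns to relaxations, approximation, and ILP instead.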