【24h】

Comparing Sequences with Segment Rearrangements

机译:将序列与段重排进行比较

获取原文
获取原文并翻译 | 示例

摘要

Computational genomics involves comparing sequences based on "similarity" for detecting evolutionary and functional relationships. Until very recently, available portions of the human genome sequence (and that of other species) were fairly short and sparse. Most sequencing effort was focused on genes and other short units; similarity between such sequences was measured based on character level differences. However with the advent of whole genome sequencing technology there is emerging consensus that the measure of similarity between long genome sequences must capture the rearrangements of large segments found in abundance in the human genome. In this paper, we abstract the general problem of computing sequence similarity in the presence of segment rearrangements. This problem is closely related to computing the smallest grammar for a string or the block edit distance between two strings. Our problem, like these other problems, is NP hard. Our main result here is a simple O(1) factor approximation algorithm for this problem. In contrast, best known approximations for the related problems are factor Ω(log n) off from the optimal. Our algorithm works in linear time, and in one pass. In proving our result, we relate sequence similarity measures based on different segment rearrangements, to each other, tight up to constant factors.
机译:计算基因组学涉及比较基于“相似性”的序列,以检测进化和功能关系。直到最近,人类基因组序列(以及其他物种)的可用部分还相当短且稀疏。大多数测序工作集中在基因和其他短单元上。这些序列之间的相似性是基于字符水平差异来测量的。然而,随着全基因组测序技术的出现,人们逐渐达成共识,即长基因组序列之间的相似性度量必须捕获人类基因组中大量存在的大片段的重排。在本文中,我们抽象了在片段重排的情况下计算序列相似性的一般问题。此问题与计算字符串的最小语法或两个字符串之间的块编辑距离密切相关。像其他问题一样,我们的问题也是NP难题。我们的主要结果是针对此问题的简单O(1)因子近似算法。相反,有关问题的最著名的近似值是最优值的Ω(log n)。我们的算法在线性时间中有效,并且一次通过。为了证明我们的结果,我们将基于不同片段重排的序列相似性度量彼此联系起来,并严格遵守恒定因子。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号