Faster Computation of Genome Mappability with one Mismatch

Abstract

Summary form only given. The genome mappability problem refers to cataloging the repetitive occurrences of every substring of length m in a genome, and its k-mappability variant extends this to approximate repeats by allowing up to k mismatches. The problem is formulated as follows: given a sequence S[1, n] of length n over the constant-size DNA alphabet Σ = {A, C, G, T}, and two integers k and m ≤ n, output an integer array F_k such that F_k[i] = |{ j ≠ i : d_H(S[i, i + m - 1], S[j, j + m - 1]) ≤ k }|, where d_H(·,·) denotes the Hamming distance. Derrien et al. [PLoS ONE 2012] framed this problem in the context of genome analysis. In this work we present a provably efficient algorithm for 1-mappability with O(n log n) worst-case running time and O(n) space. The fundamental technique is heavy path decomposition on the suffix tree (ST) of S, and the entire work is based on the framework of Thankachan et al. [RECOMB 2018]. The previously best known running time is O(n log n log log n) [Alzamel et al., COCOA 2017].
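
To make the definition of F_k concrete, the following is a minimal brute-force sketch in Python. It is not the O(n log n) suffix-tree algorithm described in the abstract; it simply evaluates the definition directly in O(n² m) time. The function names `hamming` and `k_mappability_naive` are hypothetical, and positions are 0-indexed here, whereas the abstract uses 1-indexed positions.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def k_mappability_naive(s: str, m: int, k: int) -> list[int]:
    """Brute-force evaluation of the k-mappability array (illustrative only).

    For each start position i (0-indexed), counts the positions j != i whose
    length-m substring is within Hamming distance k of s[i:i+m].
    Only the n - m + 1 positions with a full length-m window are reported.
    """
    starts = range(len(s) - m + 1)
    return [
        sum(
            1
            for j in starts
            if j != i and hamming(s[i:i + m], s[j:j + m]) <= k
        )
        for i in starts
    ]


if __name__ == "__main__":
    # Windows of "ACAG" with m = 2 are "AC", "CA", "AG".
    print(k_mappability_naive("ACAG", m=2, k=1))
```

In the example, "AC" and "AG" differ in one position, so each counts the other when k = 1, while "CA" is at distance 2 from both. A quadratic scan like this is only practical for short sequences, which is the motivation for the faster algorithm the abstract describes.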
