International Conference on Genome Informatics

LOCALIZED SUFFIX ARRAY AND ITS APPLICATION TO GENOME MAPPING PROBLEMS FOR PAIRED-END SHORT READS

Abstract

We introduce a new data structure, a localized suffix array, based on which occurrence information is dynamically represented as the combination of global positional information and local lexicographic order information in text search applications. For the search of a pair of words within a given distance, many candidate positions that share a coarse-grained global position can be compactly represented in terms of local lexicographic orders as in the conventional suffix array, and they can be simultaneously examined for violation of the distance constraint at the coarse-grained resolution. The trade-off between the positional and lexicographical information is progressively shifted towards finer positional resolution, and the distance constraint is reexamined accordingly. Thus the paired search can be efficiently performed even if there are a large number of occurrences for each word. The localized suffix array itself is in fact a reordering of bits inside the conventional suffix array, and their memory requirements are essentially the same. We demonstrate an application to genome mapping problems for paired-end short reads generated by new-generation DNA sequencers. When paired reads are highly repetitive, it is time-consuming to naively calculate, sort, and compare all of the coordinates. For human genome re-sequencing data of 36 base pairs, more than 10 times speedups over the naive method were observed in almost half of the cases where the sums of redundancies (number of individual occurrences) of paired reads were greater than 2,000.
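
For orientation, the naive baseline mentioned in the abstract (calculate, sort, and compare all coordinates) might look roughly like the following sketch on a conventional suffix array. This is not the localized suffix array itself, whose construction the abstract does not spell out. The function names, the toy text, and the convention that distance is the difference between the start coordinates of the two words on the same strand are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    # Toy construction: sort suffix start positions by the suffix text.
    # Quadratic-ish cost, acceptable only for small illustrative inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text, sa, word):
    # Binary search for the suffix-array interval whose suffixes start
    # with `word`, then return the matching coordinates in sorted order.
    m = len(word)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= word
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < word:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > word
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= word:
            lo = mid + 1
        else:
            hi = mid
    # The interval is in lexicographic order, so coordinates must be sorted.
    return sorted(sa[start:lo])


def naive_paired_search(text, sa, left, right, max_dist):
    # Assumed constraint: 0 <= (start of right) - (start of left) <= max_dist.
    left_pos = occurrences(text, sa, left)
    right_pos = occurrences(text, sa, right)
    pairs = []
    for p in left_pos:
        i = bisect_left(right_pos, p)
        j = bisect_right(right_pos, p + max_dist)
        pairs.extend((p, q) for q in right_pos[i:j])
    return pairs


text = "ACGTAAACGTCCCCACGT"
sa = build_suffix_array(text)
print(naive_paired_search(text, sa, "ACG", "CCC", 5))  # [(6, 10), (6, 11)]
```

In this sketch every occurrence of both words is materialized and sorted before any distance check takes place, which is the cost that grows when the paired reads are highly repetitive.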