Fast computation of supermaximal repeats in DNA sequences

Abstract

Searching for repetitive structures in DNA sequences is a major problem in bioinformatics research. We propose a novel index structure, called the Parent-of-Leaves (POL) index, and an algorithm, SMR, that uses the index to find supermaximal repeats. The index is derived from, and designed to replace, the more versatile but considerably larger suffix tree index STTD64. The results of our experiments on 24 Homo sapiens chromosomes indicate that SMR significantly outperforms the Vmatch tool, the best known software package. With a constructed POL index, SMR is 2 times faster than Vmatch when searching for supermaximal repeats of at least 10 bases, 7 times faster for repeats of at least 25 bases, and about an order of magnitude faster for repeats of at least 200 bases. We also studied the cost of constructing the POL index and the number of SMR runs needed for that cost to pay off. The results indicate that, on a particular sequence, our technique outperforms Vmatch after 2 runs with the POL25 index, which has a minimum index length (MIL) of 25 nucleotides, 3 runs with POL10, 5 runs with POL100, and 10 runs with POL200. The POL indexes require much less storage than the suffix tree index used: about 200 times less for POL200 and POL100, and 25 times less for POL25. POL10 requires the most storage, one quarter the size of the STTD64 index.
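To make the term concrete: a supermaximal repeat is a substring that occurs at least twice and is not a proper substring of any other repeated substring. The sketch below is a brute-force illustration of that definition only. It is not the SMR algorithm or the POL index from the abstract, whose construction is not described here, and the function names and the `min_len` parameter are hypothetical. In a suffix tree, such repeats correspond to internal nodes whose children are all leaves with pairwise distinct preceding characters, which the name "Parent-of-Leaves" seems to echo, though that reading is an assumption.

```python
from collections import defaultdict


def repeated_substrings(seq, min_len=1):
    """Return every substring of length >= min_len that occurs at least twice.

    Occurrences may overlap. Brute force: only suitable for short strings.
    """
    counts = defaultdict(int)
    n = len(seq)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            counts[seq[start:end]] += 1
    return {sub for sub, count in counts.items() if count >= 2}


def supermaximal_repeats(seq, min_len=1):
    """Return the repeats that are not contained in any longer repeat."""
    repeats = repeated_substrings(seq, min_len)
    return sorted(
        r for r in repeats
        if not any(r != other and r in other for other in repeats)
    )


if __name__ == "__main__":
    # "GATTA" occurs twice; every shorter repeat lies inside it.
    print(supermaximal_repeats("GATTACAGATTA"))
    # A minimum length above 5 filters that repeat out.
    print(supermaximal_repeats("GATTACAGATTA", min_len=6))
```

The `min_len` argument plays a role loosely analogous to the minimum repeat length quoted in the abstract (10, 25, 100, or 200 bases). The quadratic number of substrings makes this approach impractical for chromosome-scale input, which is the motivation for index-based methods.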

Bibliographic details

  • Author

    Lian Chen Na;

  • Author affiliation
  • Year 2009
  • Total pages
  • Original format PDF
  • Text language en
  • Chinese Library Classification
