Conference: International Symposium on Mathematical Foundations of Computer Science

Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis


Abstract

We study two fundamental problems concerning the search for interesting regions in sequences: (i) given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum and (ii) given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average. We present an O(n)-time algorithm for the first problem and an O(n log L)-time algorithm for the second. The algorithms have potential applications in several areas of biomolecular sequence analysis including locating GC-rich regions in a genomic DNA sequence, post-processing sequence alignments, annotating multiple sequence alignments, and computing length-constrained ungapped local alignment. Our preliminary tests on both simulated and real data demonstrate that the algorithms are very efficient and able to locate useful (such as GC-rich) regions.
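The two problems stated in the abstract can be made concrete with a small sketch. It is not the authors' method: the abstract gives no algorithmic details, so the first function uses a standard substitute (a sliding-window minimum over prefix sums, which also runs in O(n) time), and the second is a plain O(nL) baseline rather than the O(n log L) algorithm. The baseline relies on the observation that some optimal segment has length below 2L, because a segment of length 2L or more can be split into two pieces of length at least L, one of which has an average no smaller than the whole. The function names and the half-open `(start, end)` return convention are assumptions made for illustration.

```python
from collections import deque
from typing import List, Tuple


def _prefix_sums(a: List[float]) -> List[float]:
    """prefix[i] is the sum of a[0:i], so sum(a[s:e]) = prefix[e] - prefix[s]."""
    prefix = [0.0]
    for x in a:
        prefix.append(prefix[-1] + x)
    return prefix


def max_sum_at_most(a: List[float], U: int) -> Tuple[float, int, int]:
    """Problem (i): return (sum, start, end) of a slice a[start:end] with
    1 <= end - start <= U and maximum sum. Hypothetical helper, O(n) time."""
    if not a or U < 1:
        raise ValueError("need a non-empty sequence and U >= 1")
    prefix = _prefix_sums(a)
    best = (float("-inf"), 0, 0)
    window = deque()  # candidate start indices; their prefix values increase
    for end in range(1, len(a) + 1):
        start = end - 1
        # Drop starts that can never beat the new one (larger or equal prefix).
        while window and prefix[window[-1]] >= prefix[start]:
            window.pop()
        window.append(start)
        # Drop starts that would make the segment longer than U.
        while window[0] < end - U:
            window.popleft()
        total = prefix[end] - prefix[window[0]]
        if total > best[0]:
            best = (total, window[0], end)
    return best


def max_average_at_least(a: List[float], L: int) -> Tuple[float, int, int]:
    """Problem (ii): return (average, start, end) of a slice a[start:end] with
    end - start >= L and maximum average. Brute-force baseline, O(n * L) time."""
    n = len(a)
    if L < 1 or L > n:
        raise ValueError("need 1 <= L <= len(a)")
    prefix = _prefix_sums(a)
    best = (float("-inf"), 0, 0)
    for start in range(n - L + 1):
        # Only lengths L .. 2L-1 need checking (see the lead-in paragraph).
        for end in range(start + L, min(n, start + 2 * L - 1) + 1):
            avg = (prefix[end] - prefix[start]) / (end - start)
            if avg > best[0]:
                best = (avg, start, end)
    return best


if __name__ == "__main__":
    scores = [2, -5, 3, 4, -1, 6]
    print(max_sum_at_most(scores, 2))
    print(max_average_at_least(scores, 2))
```

For GC-content style applications, one plausible encoding (again an assumption, not taken from the abstract) is to map each G or C to 1 and each A or T to 0, so that the average of a segment is its GC fraction.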
