Journal of Computer and System Sciences

Efficient algorithms for locating the length-constrained heaviest segments with applications to biomolecular sequence analysis


Abstract

We study two fundamental problems concerning the search for interesting regions in sequences: (i) given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum; and (ii) given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average. We present an O(n)-time algorithm for the first problem and an O(n log L)-time algorithm for the second. The algorithms have potential applications in several areas of biomolecular sequence analysis, including locating GC-rich regions in a genomic DNA sequence, post-processing sequence alignments, annotating multiple sequence alignments, and computing length-constrained ungapped local alignment. Our preliminary tests on both simulated and real data demonstrate that the algorithms are very efficient and able to locate useful (such as GC-rich) regions.
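To make the two problem statements concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not describe the algorithms themselves, so problem (i) is solved with a standard prefix-sum plus monotonic-deque technique (which also runs in O(n) time), and problem (ii) is solved with a quadratic brute-force baseline rather than the O(n log L) algorithm. The function names, the half-open `(start, end)` return convention, and the sample data are all assumptions made for illustration.

```python
from collections import deque
from itertools import accumulate


def max_sum_segment(a, U):
    """Problem (i): return (best_sum, start, end) for the non-empty slice
    a[start:end] with end - start <= U that has the maximum sum.

    Standard monotonic-deque technique, O(n) time; an illustrative
    substitute, not the algorithm from the paper."""
    if not a or U < 1:
        raise ValueError("need a non-empty sequence and U >= 1")
    prefix = [0, *accumulate(a)]      # prefix[j] == sum(a[:j])
    best = None
    window = deque()                  # candidate starts; prefix values strictly increasing
    for end in range(1, len(a) + 1):
        start = end - 1               # newest admissible start for this end
        # Drop starts that can never beat the new one (larger or equal prefix).
        while window and prefix[window[-1]] >= prefix[start]:
            window.pop()
        window.append(start)
        # Drop starts that would make the segment longer than U.
        while window[0] < end - U:
            window.popleft()
        total = prefix[end] - prefix[window[0]]
        if best is None or total > best[0]:
            best = (total, window[0], end)
    return best


def max_average_segment_bruteforce(a, L):
    """Problem (ii): return (best_avg, start, end) for the slice a[start:end]
    with end - start >= L that has the maximum average.

    Quadratic reference baseline only; the paper's algorithm is O(n log L)."""
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")
    prefix = [0, *accumulate(a)]
    best = None
    for start in range(n - L + 1):
        for end in range(start + L, n + 1):
            avg = (prefix[end] - prefix[start]) / (end - start)
            if best is None or avg > best[0]:
                best = (avg, start, end)
    return best


if __name__ == "__main__":
    scores = [1, -2, 3, 4, -1, 2]     # hypothetical per-position scores
    print(max_sum_segment(scores, 2))
    print(max_average_segment_bruteforce(scores, 2))
```

For GC-content analysis, one plausible encoding (again an assumption, not taken from the abstract) is to score each G or C as 1 and each A or T as 0, so that the maximum-average segment of length at least L corresponds to the most GC-rich window of that minimum length.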
