...
首页> 外文期刊>SIAM Journal on Computing >Locating regions in a sequence under density constraints
【24h】

Locating regions in a sequence under density constraints

机译:在密度约束下按顺序定位区域

获取原文
获取原文并翻译 | 示例
           

摘要

Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range: examples including the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0's and 1's for a substring whose relative proportion of 1's lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result.
机译:几个生物学问题需要识别序列中某些特征在目标密度范围内发生的区域:示例包括富含GC的区域的位置,CpG岛的识别和序列匹配。从数学上讲,这对应于搜索0和1的字符串以寻找子字符串,该子字符串的1的相对比例位于给定的上下限之间。我们考虑定位最长的此类子字符串的算法问题,以及其他相关问题(例如,找到最短的子字符串或最大的不相交子字符串集)。为了找到最长的此类子字符串,我们开发了一种在O(n)时间内运行的算法,对先前最著名的O(n log n)结果进行了改进。针对相关问题,我们开发了O(n log log n)算法,再次改进了最著名的O(n log log n)结果。实际测试证明,我们的新算法所占用的时间和内存明显更少,并且可以处理的序列长了几个数量级。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号