首页> 外文OA文献 >Locating regions in a sequence under density constraints
【2h】

Locating regions in a sequence under density constraints

机译:在密度约束下按顺序定位区域

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range: examples including the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0u27s and 1u27s for a substring whose relative proportion of 1u27s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result.
机译:几个生物学问题需要识别序列中某些特征在目标密度范围内发生的区域:示例包括富含GC的区域的位置,CpG岛的识别和序列匹配。从数学上讲,这相当于在字符串0 u27s和1 u27s中搜索子字符串,该子字符串的1 u27s的相对比例在给定的上下限之间。我们考虑定位最长的此类子字符串的算法问题,以及其他相关问题(例如,找到最短的子字符串或最大的不相交子字符串集)。为了找到最长的此类子字符串,我们开发了一种在O(n)时间内运行的算法,对先前最著名的O(n log n)结果进行了改进。针对相关问题,我们开发了O(n log log n)算法,再次改进了最著名的O(n log log n)结果。实际测试证明,我们的新算法占用的时间和内存明显更少,并且可以处理的序列长了几个数量级。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
代理获取

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号