Algorithms for Molecular Biology: AMB

Linear time minimum segmentation enables scalable founder reconstruction

Abstract

Background: We study a preprocessing routine relevant in pan-genomic analyses: consider a set of aligned haplotype sequences of complete human chromosomes. Due to the enormous size of such data, one would like to represent this input set with a few founder sequences that retain the contiguities of the original sequences as well as possible. Such a smaller set gives a scalable way to exploit pan-genomic information in further analyses (e.g., read alignment and variant calling). Optimizing the founder set is an NP-hard problem, but there is a segmentation formulation that can be solved in polynomial time, defined as follows. Given a threshold $L$ and a set $R = \{R_1, \ldots, R_m\}$ of $m$ strings (haplotype sequences), each of length $n$, the minimum segmentation problem for founder reconstruction is to partition $[1, n]$ into a set $P$ of disjoint segments such that each segment $[a, b] \in P$ has length at least $L$ and the number $d(a, b) = |\{R_i[a, b] : 1 \le i \le m\}|$ of distinct substrings at segment $[a, b]$, maximized over $[a, b] \in P$, is minimized. The distinct substrings in the segments represent founder blocks that can be concatenated to form $\max\{d(a, b) : [a, b] \in P\}$ founder sequences representing the original $R$ such that crossovers happen only at segment boundaries.
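To make the segmentation formulation concrete, here is a small illustrative sketch. It is a straightforward dynamic program over segment endpoints, using the recurrence $\mathrm{best}(b) = \min_{a \le b - L + 1} \max\{\mathrm{best}(a-1), d(a, b)\}$ with $\mathrm{best}(0) = 0$, and it recomputes $d(a, b)$ from scratch for every candidate segment. It is *not* the linear-time algorithm this paper is about, and the function names (`distinct_count`, `min_segmentation`, `founders`) are hypothetical. It assumes the haplotypes are equal-length Python strings and uses 1-based inclusive segments $[a, b]$ to match the notation above. The founder assembly is deliberately naive: founder $k$ takes the $k$-th distinct block of each segment, wrapping around when a segment has fewer blocks. This respects the "crossovers only at segment boundaries" property but makes no attempt to reduce the number of crossovers.

```python
import math
from typing import List, Sequence, Tuple


def distinct_count(R: Sequence[str], a: int, b: int) -> int:
    """d(a, b): number of distinct substrings R_i[a..b] (1-based, inclusive)."""
    return len({r[a - 1:b] for r in R})


def min_segmentation(R: Sequence[str], L: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Naive DP (illustration only): partition [1, n] into segments of
    length >= L so that the maximum d(a, b) over the segments is minimal."""
    n = len(R[0])
    assert all(len(r) == n for r in R), "sequences must be aligned (equal length)"
    if n < L:
        raise ValueError("no valid segmentation exists when n < L")
    # best[j]: optimal score for the prefix [1, j]; back[j]: start of its last segment
    best = [math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0
    for b in range(L, n + 1):
        # Candidate last segment [a, b] has length b - a + 1 >= L
        for a in range(1, b - L + 2):
            if best[a - 1] == math.inf:
                continue  # prefix [1, a-1] cannot be segmented
            score = max(best[a - 1], distinct_count(R, a, b))
            if score < best[b]:
                best[b], back[b] = score, a
    # Follow back pointers from n to recover the segments left to right
    segments: List[Tuple[int, int]] = []
    b = n
    while b > 0:
        a = back[b]
        segments.append((a, b))
        b = a - 1
    segments.reverse()
    return int(best[n]), segments


def founders(R: Sequence[str], segments: List[Tuple[int, int]]) -> List[str]:
    """Naive assembly: founder k concatenates the k-th distinct block of each
    segment (wrapping around), giving max d(a, b) founders in total."""
    blocks = [sorted({r[a - 1:b] for r in R}) for a, b in segments]
    k = max(len(bl) for bl in blocks)
    return ["".join(bl[i % len(bl)] for bl in blocks) for i in range(k)]
```

For example, with `R = ["ACGT", "ACGA", "TCGT"]` and `L = 2`, the single segment $[1, 4]$ has three distinct substrings. The split $[1, 2], [3, 4]$ has only two per segment, so `min_segmentation` returns `(2, [(1, 2), (3, 4)])` and `founders` yields the two founders `"ACGA"` and `"TCGT"`.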