BMC Bioinformatics

Evaluating genome architecture of a complex region via generalized bipartite matching

Abstract

With the remarkable development of inexpensive sequencing technologies and supporting computational tools, we have the promise of medicine being personalized by knowledge of the individual genome. Current technologies provide high throughput but short reads. Reconstruction of the donor genome is based either on de novo assembly of the (short) reads or on mapping donor reads to a standard reference. While such techniques demonstrate high success rates for inferring 'simple' genomic segments, they are confounded by segments with complex duplication patterns, including regions of direct medical relevance such as the HLA and KIR regions.

In this work, we address this problem with a method for assessing the quality of a predicted genome sequence for complex regions of the genome. The method combines two natural types of evidence: the sequence similarity of the mapped reads to the predicted donor genome, and the distribution of reads across the predicted genome. We define a new scoring function for read-to-genome matchings, which penalizes sequence dissimilarities and deviations from the expected read location distribution, and present an efficient algorithm for finding matchings that minimize the penalty. The algorithm is based on a formal problem, first defined in this paper, called Coverage Sensitive many-to-many min-cost bipartite Matching (CSM). This new problem variant generalizes the standard (one-to-one) weighted bipartite matching problem and can be solved using network flows.

The resulting Java-based tool, called SAGE (Scoring function for Assembled GEnomes), is freely available upon request. We demonstrate on simulated data that SAGE can be used to infer correct haplotypes of the highly repetitive KIR region on human chromosome 19.
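
The abstract does not spell out the scoring function or the flow construction, so the following is only a minimal sketch of how a coverage-sensitive matching could be reduced to min-cost flow, not the SAGE implementation. It rests on several assumptions that are not in the source: each read is assigned to exactly one genome window (a simplification of the many-to-many setting), the sequence penalty is a mismatch count, and the coverage penalty is the squared deviation from an expected per-window coverage. The names `MinCostFlow`, `coverage_sensitive_assignment`, `mismatch`, and `expected` are hypothetical. Because the assumed coverage penalty is convex, it can be modelled by parallel unit-capacity arcs whose costs are the successive marginal penalties.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths with a queue-based Bellman-Ford (handles negative arc costs)."""

    def __init__(self, n):
        self.n = n
        # Each arc is [target, residual capacity, cost, index of reverse arc].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def flow(self, s, t, want):
        sent, total_cost = 0, 0
        while sent < want:
            dist = [float("inf")] * self.n
            dist[s] = 0
            prev = [None] * self.n
            in_queue = [False] * self.n
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        if not in_queue[v]:
                            queue.append(v)
                            in_queue[v] = True
            if prev[t] is None:
                break  # no augmenting path left
            # Push one unit of flow along the shortest path found.
            v = t
            while v != s:
                u, i = prev[v]
                self.graph[u][i][1] -= 1
                self.graph[v][self.graph[u][i][3]][1] += 1
                v = u
            sent += 1
            total_cost += dist[t]
        return sent, total_cost


def coverage_sensitive_assignment(mismatch, expected):
    """mismatch[r] maps candidate window -> mismatch count for read r;
    expected[w] is the assumed expected number of reads in window w."""
    n_reads, n_windows = len(mismatch), len(expected)
    s, t = 0, 1 + n_reads + n_windows
    net = MinCostFlow(t + 1)
    for r, candidates in enumerate(mismatch):
        net.add_edge(s, 1 + r, 1, 0)  # each read is placed exactly once (assumption)
        for w, cost in candidates.items():
            net.add_edge(1 + r, 1 + n_reads + w, 1, cost)

    def penalty(w, d):
        # Assumed convex coverage penalty: squared deviation from expectation.
        return (d - expected[w]) ** 2

    base = 0
    for w in range(n_windows):
        base += penalty(w, 0)
        for k in range(1, n_reads + 1):
            # The k-th read in window w pays the marginal penalty.
            net.add_edge(1 + n_reads + w, t, 1, penalty(w, k) - penalty(w, k - 1))

    _, cost = net.flow(s, t, n_reads)
    assignment = {}
    for r in range(n_reads):
        for v, cap, _, _ in net.graph[1 + r]:
            if n_reads < v < t and cap == 0:  # saturated read-to-window arc
                assignment[r] = v - 1 - n_reads
    return assignment, base + cost
```

As a toy usage, take four reads that each match window 0 perfectly and window 1 with one mismatch, with an expected coverage of two reads per window: `coverage_sensitive_assignment([{0: 0, 1: 1}] * 4, [2, 2])`. Similarity alone would stack all four reads on window 0, whereas the combined score spreads them evenly across the two windows, which is the kind of trade-off between sequence similarity and read distribution the abstract describes.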