Journal of Computational Biology

HapCompass: A Fast Cycle Basis Algorithm for Accurate Haplotype Assembly of Sequence Data



Abstract

Genome assembly methods produce assemblies whose haplotype phase is ambiguous, owing to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinformatics workflows, such as genetic association studies and genomic imputation. Current computational methods of determining haplotype phase from sequence data (known as haplotype assembly) either have difficulty producing accurate results for large (1000 Genomes-type) data or operate on restricted optimizations that are unrealistic given modern high-throughput sequencing technologies. We present a novel algorithm, HapCompass, for haplotype assembly of densely sequenced human genome data. The HapCompass algorithm operates on a graph in which single nucleotide polymorphisms (SNPs) are nodes and edges are defined by sequence reads; each edge is viewed as supporting evidence of co-occurring SNP alleles in a haplotype. In our graph model, haplotype phasings correspond to spanning trees. We define the minimum weighted edge removal optimization on this graph and develop an algorithm based on cycle basis local optimizations for resolving conflicting evidence. We then estimate the amount of sequencing required to produce a complete haplotype assembly of a chromosome. Using these estimates together with metrics borrowed from genome assembly and haplotype phasing, we compare the accuracy of HapCompass, the Genome Analysis ToolKit, and HapCut on 1000 Genomes Project and simulated data. We show that HapCompass performs significantly better for a variety of data and metrics. HapCompass is freely available for download.
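The graph model in the abstract can be illustrated with a small sketch. This is not the HapCompass implementation: the read representation (a dict mapping SNP index to allele 0 or 1), the +1/-1 edge weighting, the breadth-first spanning tree, and all function names are assumptions made for illustration. It shows only how a spanning tree of such a graph induces a phasing, and how edges that disagree with that phasing expose conflicting evidence of the kind the cycle basis optimization is meant to resolve.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(reads):
    """Sum pairwise read evidence per SNP pair (hypothetical weighting):
    +1 if a read carries equal alleles at both SNPs, -1 if they differ."""
    weights = defaultdict(int)
    for read in reads:
        for u, v in combinations(sorted(read), 2):
            weights[(u, v)] += 1 if read[u] == read[v] else -1
    return dict(weights)


def phase_by_spanning_tree(weights):
    """Assign each SNP a phase bit (0/1) by walking a BFS spanning tree.
    A positive edge keeps the phase, a negative edge flips it."""
    adj = defaultdict(list)
    for (u, v), w in weights.items():
        if w != 0:  # zero-weight edges carry no net evidence
            adj[u].append((v, w))
            adj[v].append((u, w))
    phase = {}
    for root in sorted(adj):
        if root in phase:
            continue
        phase[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, w in adj[u]:
                if v not in phase:
                    phase[v] = phase[u] if w > 0 else 1 - phase[u]
                    queue.append(v)
    return phase


def conflicting_edges(weights, phase):
    """Edges whose sign disagrees with the tree-induced phasing.
    Each one closes a cycle that contains conflicting evidence."""
    return [
        (u, v)
        for (u, v), w in weights.items()
        if w != 0 and (phase[u] == phase[v]) != (w > 0)
    ]


# Toy data: three reads over SNPs 1..3 whose evidence cannot all hold.
reads = [{1: 0, 2: 0}, {2: 0, 3: 0}, {1: 0, 3: 1}]
weights = build_graph(reads)
phase = phase_by_spanning_tree(weights)
conflicts = conflicting_edges(weights, phase)
```

In this toy input the three edges form a single cycle with an odd number of negative weights, so no phasing satisfies all of them and one edge must be reported as conflicting. Which edge is reported depends on the tree chosen; a minimum weighted edge removal, as described in the abstract, would instead choose the edges to discard by weight.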
