Source: U.S. National Institutes of Health literature, PLoS Computational Biology
ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data

Abstract

As a result of improvements in genome assembly algorithms and the ever-decreasing costs of high-throughput sequencing technologies, new high-quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, including polyploid plant genomes. Given the similarity between multiple copies of a basic genome in polyploid individuals, assembly of such data usually results in collapsed contigs that represent a variable number of homoeologous genomic regions. Unfortunately, such collapse is often not ideal, because keeping contigs separate can lead both to improved assembly and to insights into how haplotypes influence phenotype. Here, we describe a first step in avoiding inappropriate collapse during assembly. In particular, we describe ConPADE (Contig Ploidy and Allele Dosage Estimation), a probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. In the process, we report findings regarding errors in sequencing. The method can be used for whole genome shotgun (WGS) sequencing data. We also show applicability of the method for variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. We show that ConPADE may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed.
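The idea of inferring ploidy from allele proportions can be illustrated with a toy calculation. The sketch below is not the ConPADE model, whose details are not given in this abstract; it is a simplified stand-in that assumes every input site is heterozygous, that reads follow a binomial distribution, that all allele dosages are equally likely a priori, and that sequencing errors occur at one fixed symmetric rate. The function names, the candidate ploidy range, and the default error rate are all hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log probability of k successes in n trials under Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def site_log_likelihood(alt, depth, ploidy, error_rate):
    """Log likelihood of one site, averaged over dosages 1..ploidy-1.

    Assumption: a uniform prior over heterozygous dosages.
    """
    terms = []
    for dosage in range(1, ploidy):
        frac = dosage / ploidy
        # Expected fraction of alternate reads after symmetric errors.
        p = frac * (1.0 - error_rate) + (1.0 - frac) * error_rate
        terms.append(log_binom_pmf(alt, depth, p))
    peak = max(terms)
    # Log-sum-exp keeps the sum stable, then divide by the number of dosages.
    total = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total - math.log(len(terms))


def estimate_ploidy(sites, candidates=(2, 3, 4, 5, 6), error_rate=0.01):
    """Return (best ploidy, per-ploidy log likelihoods) for one contig.

    sites: list of (alternate_read_count, total_depth) pairs.
    """
    scores = {
        m: sum(site_log_likelihood(a, n, m, error_rate) for a, n in sites)
        for m in candidates
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up read counts with allele fractions near 1/4, 1/2 and 3/4.
    contig_sites = [(50, 200), (150, 200), (100, 200), (52, 200), (148, 200)]
    best, scores = estimate_ploidy(contig_sites)
    print("most likely ploidy:", best)
```

Averaging over dosages means a higher candidate ploidy is penalized when the data are explained equally well by a lower one, which is why allele fractions near 1/2 alone favor a ploidy of 2 over 4 or 6 in this sketch. The toy also shows why coverage matters: at low depth the binomial distributions for neighboring dosages overlap and the candidate ploidies become hard to tell apart.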