BMC Bioinformatics

Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies

Abstract

Recent developments in third-gen long read sequencing and diploid-aware assemblers have resulted in the rapid release of numerous reference-quality assemblies for diploid genomes. However, assembly of highly heterozygous genomes is still problematic when regional heterogeneity is so high that haplotype homology is not recognised during assembly. This results in regional duplication rather than consolidation into allelic variants and can cause issues with downstream analysis, for example variant discovery, or haplotype reconstruction using the diploid assembly with unpaired allelic contigs. A new pipeline—Purge Haplotigs—was developed specifically for third-gen sequencing-based assemblies to automate the reassignment of allelic contigs, and to assist in the manual curation of genome assemblies. The pipeline uses a draft haplotype-fused assembly or a diploid assembly, read alignments, and repeat annotations to identify allelic variants in the primary assembly. The pipeline was tested on a simulated dataset and on four recent diploid (phased) de novo assemblies from third-generation long-read sequencing, and compared with a similar tool. After processing with Purge Haplotigs, haploid assemblies were less duplicated with minimal impact on genome completeness, and diploid assemblies had more pairings of allelic contigs. Purge Haplotigs improves the haploid and diploid representations of third-gen sequencing based genome assemblies by identifying and reassigning allelic contigs. The implementation is fast and scales well with large genomes, and it is less likely to over-purge repetitive or paralogous elements compared to alignment-only based methods. The software is available at https://bitbucket.org/mroachawri/purge_haplotigs under a permissive MIT licence.
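The abstract names the pipeline's inputs (an assembly, read alignments and repeat annotations) but does not spell out the decision rule. As a rough illustration only, the Python sketch below assumes a simplified two-signal rule. A contig is flagged as a suspect haplotig when most of its non-repeat bases sit at roughly half the expected diploid read depth, because reads from two separately assembled alleles are split between them. It is then reassigned only if it also aligns over a large fraction of its length to another contig. The `Contig` type, the `haploid_fraction` and `classify` helpers, the depth band, and the 0.8 and 0.7 thresholds are all hypothetical. They are not Purge Haplotigs' actual interface, code or defaults.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical per-contig summary; not a Purge Haplotigs data structure."""
    name: str
    depths: list[int]          # per-base read depth derived from read alignments
    repeat_mask: list[bool]    # True where a repeat annotation covers the base
    best_hit_cov: float = 0.0  # fraction of this contig aligned to another contig


def haploid_fraction(c: Contig, low: int, mid: int) -> float:
    """Fraction of non-repeat bases whose depth lies in the assumed haploid band [low, mid).

    Repeat-annotated bases are skipped (an assumption of this sketch), since
    collapsed repeats can inflate depth and blur the haploid/diploid signal.
    """
    kept = [d for d, is_rep in zip(c.depths, c.repeat_mask) if not is_rep]
    if not kept:
        return 0.0
    return sum(low <= d < mid for d in kept) / len(kept)


def classify(contigs: list[Contig], low: int, mid: int,
             suspect_frac: float = 0.8, align_frac: float = 0.7):
    """Split contig names into (primary, haplotigs) with a hypothetical two-signal rule."""
    primary, haplotigs = [], []
    for c in contigs:
        # Signal 1: coverage suggests only one allele's reads landed here.
        suspect = haploid_fraction(c, low, mid) >= suspect_frac
        # Signal 2: the contig is largely redundant with another contig.
        if suspect and c.best_hit_cov >= align_frac:
            haplotigs.append(c.name)
        else:
            primary.append(c.name)
    return primary, haplotigs


if __name__ == "__main__":
    demo = [
        Contig("ctg1", [60] * 10, [False] * 10),                    # diploid depth: keep
        Contig("ctg2", [30] * 10, [False] * 10, best_hit_cov=0.9),  # haploid depth + aligned
        Contig("ctg3", [30] * 10, [False] * 10, best_hit_cov=0.2),  # haploid depth, unaligned
    ]
    print(classify(demo, low=15, mid=45))
```

With these made-up inputs the script prints `(['ctg1', 'ctg3'], ['ctg2'])`. Requiring both signals means a contig at full diploid depth, such as a paralog or repeat-rich region, is kept even if it aligns well elsewhere. A filter that looks only at alignments has no way to make that distinction, which is consistent with the abstract's point about over-purging.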