Journal: BMC Bioinformatics

A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes

Abstract

With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved genome assemblies have become available, creating great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information as intuitively as a linear reference genome does. Thus, efficiently and accurately analyzing the spatial structure of a genome graph and coordinating its information remains a substantial challenge. We developed a new method, the colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (<50 bp) in highly similar samples, as validated on simulated datasets. Moreover, we demonstrated that cSupB can adapt to complex cycle structures. Although the solution is made suitable for increasingly complex genome graphs by relaxing the directed acyclic graph constraint, the cSupB motif and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program implementing our method, available at https://github.com/eggleader/cSupB.
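To make the terminology concrete, the following is a minimal Python sketch of a colored de Bruijn graph and of a coordinate record with three components. It is not the cSupB implementation (which is written in C++), and the abstract does not define the tri-tuple precisely: the names `build_colored_dbg` and `TriTuple`, the field names `offset`, `structure_id` and `sample`, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


def build_colored_dbg(samples: dict[str, str], k: int) -> dict[tuple[str, str], set[str]]:
    """Map each edge between (k-1)-mers to the set of samples (colors) that contain it."""
    edges: dict[tuple[str, str], set[str]] = defaultdict(set)
    for name, seq in samples.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
            edges[(kmer[:-1], kmer[1:])].add(name)
    return dict(edges)


class TriTuple(NamedTuple):
    """Hypothetical coordinate record; field names are illustrative assumptions."""
    offset: int        # offset value inside the structure
    structure_id: int  # identifier of the topological structure (e.g. a superbubble)
    sample: str        # sample (color) the position belongs to


# Toy haplotypes differing by a single base (assumed data, not from the paper).
samples = {"hapA": "ACGTACC", "hapB": "ACGAACC"}
graph = build_colored_dbg(samples, k=3)

# Edges carried by both haplotypes flank the region where the paths diverge.
shared = {edge for edge, colors in graph.items() if len(colors) == len(samples)}
coord = TriTuple(offset=1, structure_id=0, sample="hapA")
```

In this toy graph the two haplotypes share the entry and exit edges and diverge in between, which is the kind of bubble-like region that a coordinate combining position, structure and sample would address.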
