Genome Biology

Separating homeologs by phasing in the tetraploid wheat transcriptome.



Abstract

Background: The high level of identity among the duplicated homoeologous genomes of tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and the separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs.

Results: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble into 140,118 contigs, which include 96% of the benchmark cDNAs. We use a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of the SNPs analyzed are correctly separated by phasing.

Conclusions: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits more from multiple k-mer assembly strategies than assembly of diploid wheat does. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the closely related homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.
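The post-assembly separation steps named in the abstract (polymorphism identification, SNP phasing, and read sorting) can be illustrated with a toy Python sketch. It is not the authors' pipeline: it assumes a hypothetical, simplified read model (a dict from contig position to base), ignores base qualities and indels, uses an arbitrary `min_count` threshold, and swaps in a simple greedy pairwise-linkage phaser chosen for clarity. All function names are hypothetical. Re-assembly of each sorted bin would be handed to an assembler and is not shown.

```python
from collections import Counter

# Hypothetical, simplified read model: contig position -> observed base.
# Real pipelines work from aligned reads (e.g. BAM files) with base qualities.
Read = dict[int, str]


def identify_snps(reads: list[Read], min_count: int = 2) -> dict[int, tuple[str, str]]:
    """Return bi-allelic sites: positions where exactly two bases each occur >= min_count times."""
    columns: dict[int, Counter] = {}
    for read in reads:
        for pos, base in read.items():
            columns.setdefault(pos, Counter())[base] += 1
    snps = {}
    for pos, counts in columns.items():
        alleles = sorted(base for base, n in counts.items() if n >= min_count)
        if len(alleles) == 2:
            snps[pos] = (alleles[0], alleles[1])
    return snps


def phase_snps(reads: list[Read], snps: dict[int, tuple[str, str]]) -> tuple[dict[int, str], dict[int, str]]:
    """Greedy pairwise phasing: orient each SNP relative to the previous one by majority vote of linking reads."""
    positions = sorted(snps)
    hap_a: dict[int, str] = {}
    hap_b: dict[int, str] = {}
    if not positions:
        return hap_a, hap_b
    # The first SNP's orientation is arbitrary; it fixes which haplotype is called "A".
    hap_a[positions[0]], hap_b[positions[0]] = snps[positions[0]]
    for prev, cur in zip(positions, positions[1:]):
        x, y = snps[cur]
        cis = trans = 0
        for read in reads:
            if prev not in read or cur not in read:
                continue  # only reads spanning both SNPs carry linkage information
            pair = (read[prev], read[cur])
            if pair in {(hap_a[prev], x), (hap_b[prev], y)}:
                cis += 1
            elif pair in {(hap_a[prev], y), (hap_b[prev], x)}:
                trans += 1
        # With no linking reads (cis == trans == 0) this choice is arbitrary,
        # which corresponds to a break between phase blocks.
        if cis >= trans:
            hap_a[cur], hap_b[cur] = x, y
        else:
            hap_a[cur], hap_b[cur] = y, x
    return hap_a, hap_b


def sort_reads(reads: list[Read], hap_a: dict[int, str], hap_b: dict[int, str]) -> dict[str, list[Read]]:
    """Assign each read to the haplotype whose alleles it matches at more SNP positions."""
    bins: dict[str, list[Read]] = {"A": [], "B": [], "unassigned": []}
    for read in reads:
        score_a = sum(hap_a.get(pos) == base for pos, base in read.items())
        score_b = sum(hap_b.get(pos) == base for pos, base in read.items())
        if score_a > score_b:
            bins["A"].append(read)
        elif score_b > score_a:
            bins["B"].append(read)
        else:
            bins["unassigned"].append(read)  # no informative SNPs, or conflicting evidence
    return bins


# Toy example: two homoeologs differing at positions 10, 20 and 30; position 15 is shared.
reads: list[Read] = [
    {10: "A", 15: "T", 20: "C"}, {10: "A", 15: "T", 20: "C"},
    {10: "G", 15: "T", 20: "T"}, {10: "G", 15: "T", 20: "T"},
    {20: "C", 30: "G"}, {20: "C", 30: "G"},
    {20: "T", 30: "A"}, {20: "T", 30: "A"},
    {15: "T"},
]
snps = identify_snps(reads)
hap_a, hap_b = phase_snps(reads, snps)
bins = sort_reads(reads, hap_a, hap_b)
print(hap_a, hap_b, {k: len(v) for k, v in bins.items()})
```

In this toy run the two homoeologs are recovered (one haplotype carries A/C/G and the other G/T/A at positions 10/20/30), and the read covering only the shared position 15 stays unassigned. Because each SNP is oriented only against its neighbour, a single wrong vote would flip every downstream SNP; dedicated phasing tools typically score all reads jointly to avoid this.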
