Plant Biotechnology Journal

Orthology Guided Assembly in highly heterozygous crops: creating a reference transcriptome to uncover genetic diversity in Lolium perenne

Abstract

Despite current advances in next-generation sequencing data analysis procedures, de novo assembly of a reference sequence required for SNP discovery and expression analysis is still a major challenge in genetically uncharacterized, highly heterozygous species. High levels of polymorphism inherent to outbreeding crop species hamper De Bruijn Graph-based de novo assembly algorithms, causing transcript fragmentation and the redundant assembly of allelic contigs. If multiple genotypes are sequenced to study genetic diversity, primary de novo assembly is best performed per genotype to limit the level of polymorphism and avoid transcript fragmentation. Here, we propose an Orthology Guided Assembly procedure that first uses sequence similarity (tBLASTn) to proteins of a model species to select allelic and fragmented contigs from all genotypes and then performs CAP3 clustering on a gene-by-gene basis. Thus, we simultaneously annotate putative orthologues for each protein of the model species, resolve allelic redundancy and fragmentation and create a de novo transcript sequence representing the consensus of all alleles present in the sequenced genotypes. We demonstrate the procedure using RNA-seq data from 14 genotypes of Lolium perenne to generate a reference transcriptome for gene discovery and translational research, to reveal the transcriptome-wide distribution and density of SNPs in an outbreeding crop and to illustrate the effect of polymorphisms on the assembly procedure. The results presented here illustrate that constructing a non-redundant reference sequence is essential for comparative genomics, orthology-based annotation and candidate gene selection but also for read mapping and subsequent polymorphism discovery and/or read count-based gene expression analysis.
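The selection step described in the abstract (using tBLASTn similarity to collect, for each model-species protein, the allelic and fragmented contigs from all genotypes before gene-by-gene CAP3 clustering) might be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the tBLASTn results are available as standard 12-column tabular rows (BLAST `-outfmt 6`) with proteins as queries and contigs as subjects. The E-value cutoff of `1e-5`, the function names, and the contig and protein identifiers are all hypothetical.

```python
from collections import defaultdict


def group_contigs_by_protein(blast_lines, max_evalue=1e-5):
    """Map each model-species protein to the contigs that hit it.

    Each row is expected in BLAST tabular format (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The max_evalue cutoff is an assumed value, not taken from the paper.
    """
    groups = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        protein, contig, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            groups[protein].add(contig)
    return dict(groups)


def gene_fasta(contig_ids, sequences):
    """Build one FASTA text per gene, ready to hand to a clustering tool."""
    return "".join(f">{cid}\n{sequences[cid]}\n" for cid in sorted(contig_ids))


# Hypothetical hits: contigs from two genotypes (G01, G02) against two proteins.
example_hits = [
    "\t".join(["protA", "G01_contig12", "91.3", "210", "18", "0", "1", "210", "35", "664", "3e-80", "250"]),
    "\t".join(["protA", "G02_contig7", "89.0", "120", "13", "0", "90", "209", "2", "361", "1e-40", "140"]),
    "\t".join(["protA", "G02_contig99", "35.0", "40", "26", "0", "5", "44", "10", "129", "0.5", "30"]),
    "\t".join(["protB", "G01_contig3", "95.0", "300", "15", "0", "1", "300", "100", "999", "0.0", "600"]),
]
groups = group_contigs_by_protein(example_hits)
```

In this sketch, each per-protein FASTA produced by `gene_fasta` would then be clustered separately (for example with CAP3), so that alleles and fragments of the same putative orthologue collapse into one consensus transcript.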
