Journal: BMC Genomics

Efficient assembly and annotation of the transcriptome of catfish by RNA-Seq analysis of a doubled haploid homozygote


Abstract

Background: Once a whole genome has been sequenced, thorough annotation that associates genome sequences with biological meaning is essential. Genome annotation depends on the availability of transcript information as well as orthology information. In teleost fish, genome annotation is seriously hindered by genome duplication. Because of gene duplications, orthologies cannot be established simply by homology comparisons. Rather, intensive phylogenetic analysis or structural analysis of orthologies is required to identify genes. Both phylogenetic analysis and orthology analysis require full-length transcripts, and generating large numbers of full-length transcripts with traditional transcript sequencing is very difficult and extremely costly.

Results: In this work, we took advantage of a doubled haploid catfish, which has two identical sets of chromosomes and therefore, in theory, no allelic variation. As such, transcript sequences generated by next-generation sequencing can be favorably assembled into full-length transcripts. Deep sequencing of the doubled haploid channel catfish transcriptome was performed on the Illumina HiSeq 2000 platform, yielding over 300 million high-quality trimmed reads totaling 27 Gbp. Assembly of these reads generated 370,798 non-redundant transcript-derived contigs. Functional annotation of the assembly allowed identification of 25,144 unique protein-encoding genes. A total of 2,659 unique genes were identified as putative duplicated genes in the catfish genome because the assemblies of the corresponding transcripts harbored PSVs or MSVs (in the form of pseudo-SNPs in the assembly). Of the 25,144 contigs with unique protein hits, around 20,000 contigs matched 50% of the length of reference proteins, and over 14,000 transcripts were identified as full-length with complete open reading frames. Characterization of the consensus sequences surrounding the start codon and the stop codon confirmed the correct assembly of the full-length transcripts.

Conclusions: The large set of transcripts assembled in this study is the most comprehensive set of genome resources ever developed from catfish. It will provide much-needed resources for functional genome research in catfish, serving as a reference transcriptome for genome annotation, analysis of gene duplication, analysis of gene family structures, and digital gene expression analysis. The putative set of duplicated genes provides a starting point for genome-scale analysis of gene duplication in the catfish genome, and should be a valuable resource for comparative genome analysis, genome evolution, and genome function studies.
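
The abstract does not say how "full-length with complete open reading frames" was determined, so the following is only an illustrative sketch of one simple criterion, not the authors' pipeline. It assumes the standard genetic code, scans the forward strand only, and treats a contig as carrying a complete ORF when some reading frame contains an ATG followed in-frame by a stop codon. The function name `longest_complete_orf` and the `min_codons` threshold are hypothetical.

```python
from typing import Optional, Tuple

# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq: str, min_codons: int = 30) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Returns None when no frame holds an ORF of at least `min_codons` codons
    (counting the start and stop codons).
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                long_enough = (end - start) // 3 >= min_codons
                if long_enough and (best is None or end - start > best[1] - best[0]):
                    best = (start, end)
                start = None  # resume the search after this stop codon
    return best


if __name__ == "__main__":
    # Toy contig: 2 nt of 5' UTR, ATG, four GCT codons, TAA, 2 nt of 3' UTR.
    contig = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
    print(longest_complete_orf(contig, min_codons=3))
```

A contig whose only ORF runs off either end (no in-frame stop, or no ATG) returns `None` under this criterion, which is the sense in which it would not count as full-length here. A real analysis would also scan the reverse complement and compare against reference proteins.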
