Journal: BMC Genomics

Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs



Abstract

Background: Apicomplexan parasites are causative agents of various diseases, including malaria, and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexan parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, and to validate the importance of physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexan parasites.

Results: In this study, we used a total of 61,056 5'-end-single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNAs with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of previously unidentified transcripts. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the level of complete full-length transcripts. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified, and confirmed by RT-PCR, 140 previously unidentified transcripts located in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes.

Conclusion: Our data indicate that the current gene models are likely still incomplete and leave much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the genome annotations of apicomplexan parasites.
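The exon-level comparison summarized in the Results can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not describe. It assumes, for illustration only, that annotated exons and aligned cDNA blocks are already available as coordinate intervals on the same contig and strand, and that an exon counts as "supported" if any cDNA block overlaps it by at least one base. The names `overlaps`, `unsupported_exon_fraction`, and `models_with_unsupported_exon` are hypothetical, as are the coordinates in the usage lines.

```python
from typing import Dict, List, Tuple

# (start, end), 1-based inclusive; all intervals assumed to share contig and strand
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def unsupported_exon_fraction(
    gene_models: Dict[str, List[Interval]],
    cdna_blocks: List[Interval],
) -> float:
    """Fraction of annotated exons that no cDNA alignment block overlaps."""
    exons = [exon for exon_list in gene_models.values() for exon in exon_list]
    if not exons:
        return 0.0
    unsupported = [
        exon for exon in exons
        if not any(overlaps(exon, block) for block in cdna_blocks)
    ]
    return len(unsupported) / len(exons)


def models_with_unsupported_exon(
    gene_models: Dict[str, List[Interval]],
    cdna_blocks: List[Interval],
) -> List[str]:
    """Gene model IDs (sorted) that contain at least one unsupported exon."""
    flagged = []
    for gene_id, exon_list in gene_models.items():
        if any(
            not any(overlaps(exon, block) for block in cdna_blocks)
            for exon in exon_list
        ):
            flagged.append(gene_id)
    return sorted(flagged)


# Hypothetical toy data: two gene models and two aligned cDNA blocks.
toy_models = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1100)],
}
toy_cdna_blocks = [(150, 200), (300, 380)]

toy_fraction = unsupported_exon_fraction(toy_models, toy_cdna_blocks)
toy_flagged = models_with_unsupported_exon(toy_models, toy_cdna_blocks)
```

A real analysis would likely restrict the comparison to gene models that have some cDNA coverage, since 5'-end single-pass reads cover only part of each transcript, and would also compare splice-site positions rather than simple overlap. The sketch ignores both refinements.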
