Quality scores for 32,000 genomes

Standards in Genomic Sciences

Abstract

Background: More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October 2013). We examined all microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank, as well as in three other major public databases, and assigned quality scores to more than 30,000 prokaryotic genome sequences.

Results: Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition, and the presence of a set of 102 genes conserved in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. Although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes contain nearly all of the 102 essential genes.

Conclusions: The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data are either unavailable or not applicable. The scores highlighted organisms for which commonly used tools do not perform well; this information can be used to improve those tools and to serve a broad group of users as more diverse organisms are sequenced. Unexpectedly, a comparison of predicted tRNAs across 15,000 high-quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost nonexistent, with the exception of one arginine codon (CGU). This has been noted previously in the literature for a few genomes, but not at the depth found here.
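To make the idea of combining four category checks into one score and screening at a threshold more concrete, here is a minimal sketch. The abstract does not give the actual weighting, so this assumes each category has already been reduced to a sub-score in [0, 1] and that the overall score is their equal-weight mean; `GenomeChecks`, `quality_score`, `conserved_fraction`, and `passes_screen` are hypothetical names, not the authors' implementation. Only the 0.8 cutoff and the count of 102 conserved genes come from the text.

```python
from dataclasses import dataclass


@dataclass
class GenomeChecks:
    """Hypothetical per-genome results; each field is a sub-score in [0, 1]."""
    assembly_completeness: float  # e.g. 1.0 for a closed assembly, lower when fragmented
    full_length_rrna: float       # share of expected full-length rRNA genes found
    trna_composition: float       # share of expected tRNA types predicted
    conserved_genes: float        # share of the 102 conserved genes detected


def conserved_fraction(found: int, total: int = 102) -> float:
    """Convert a count of detected conserved genes into a [0, 1] sub-score."""
    if not 0 <= found <= total:
        raise ValueError(f"found={found} outside 0..{total}")
    return found / total


def quality_score(c: GenomeChecks) -> float:
    """Equal-weight mean of the four sub-scores (ASSUMED weighting)."""
    parts = (c.assembly_completeness, c.full_length_rrna,
             c.trna_composition, c.conserved_genes)
    for p in parts:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"sub-score out of range: {p}")
    return sum(parts) / len(parts)


def passes_screen(c: GenomeChecks, threshold: float = 0.8) -> bool:
    """Keep genomes whose score is at or above the threshold."""
    return quality_score(c) >= threshold


if __name__ == "__main__":
    closed = GenomeChecks(1.0, 1.0, 1.0, conserved_fraction(100))
    fragmented = GenomeChecks(0.5, 0.0, 0.9, 0.95)
    print(passes_screen(closed), passes_screen(fragmented))
```

A simple mean keeps every category equally influential. A real scheme might weight assembly completeness differently or treat a missing rRNA operon as a hard failure, so the weighting is the first thing to revisit against the full paper.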
