Source: Bioscience and Microflora (U.S. National Institutes of Health literature collection)

DFAST and DAGA: web-based integrated genome annotation tools and resources


Abstract

Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii, whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at .
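The ANI-based species check described in the abstract can be illustrated with a minimal Python sketch. It assumes pairwise ANI values (in percent) have already been computed by some external tool, and applies the 95% threshold cited above. The function names, the single-linkage grouping, and the example values are all hypothetical and chosen for illustration only; they are not the method used by DFAST or DAGA.

```python
from itertools import combinations

# Percent ANI threshold cited in the abstract for differentiating species.
ANI_SPECIES_THRESHOLD = 95.0


def same_species(ani_percent, threshold=ANI_SPECIES_THRESHOLD):
    """Return True when the ANI value meets or exceeds the threshold."""
    return ani_percent >= threshold


def group_by_ani(genomes, pairwise_ani, threshold=ANI_SPECIES_THRESHOLD):
    """Group genomes by single linkage (an assumption of this sketch):
    any two genomes with ANI >= threshold end up in the same group."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # Accept the pair in either key order; skip pairs with no ANI value.
        ani = pairwise_ani.get((a, b), pairwise_ani.get((b, a)))
        if ani is not None and same_species(ani, threshold):
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Made-up ANI values for illustration only (not taken from DAGA).
example_ani = {
    ("strain_A", "strain_B"): 98.7,
    ("strain_A", "strain_C"): 93.9,
    ("strain_B", "strain_C"): 94.2,
}
groups = group_by_ani(["strain_A", "strain_B", "strain_C"], example_ani)
print(groups)
```

With these made-up values, strain_A and strain_B fall into one group and strain_C into another. This is the kind of split that would flag an intraspecific subgroup when all three genomes carry the same species label.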
