Journal: OMICS: A Journal of Integrative Biology

Habitat-Lite: A GSC case study based on free text terms for environmental metadata



Abstract

There is an urgent need to capture metadata on the rapidly growing number of genomic, metagenomic and related sequences, such as 16S ribosomal genes. This need is a major focus within the Genomic Standards Consortium (GSC), and Habitat is a key metadata descriptor in the proposed "Minimum Information about a Genome Sequence" (MIGS) specification. The goal of the work described here is to provide a lightweight, easy-to-use (small) set of terms ("Habitat-Lite") that captures high-level information about habitat while preserving a mapping to the recently launched Environment Ontology (EnvO). Our motivation for building Habitat-Lite is to meet the needs of multiple users, such as annotators curating these data, database providers hosting the data, and biologists and bioinformaticians alike who need to search and employ such data in comparative analyses. Here, we report a case study based on semiautomated identification of terms from GenBank and GOLD. We estimate that the terms in the initial version of Habitat-Lite would provide useful labels for over 60% of the kinds of information found in the GenBank isolation_source field, and around 85% of the terms in the GOLD habitat field. We present a revised version of Habitat-Lite defined within EnvO through a new category, EnvO-Lite-GSC. We invite feedback from the community on its further development, with the aim of providing a minimum list of terms that captures high-level habitat information and supplies the classification bins needed for future studies.
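The coverage estimates in the abstract (over 60% for the GenBank isolation_source field, around 85% for the GOLD habitat field) are fractions of free-text field contents that a small term set can label. The following is a minimal, hypothetical sketch of that kind of calculation. The bin names, keyword lists, first-match substring rule, and sample strings are all illustrative assumptions; they are not the actual Habitat-Lite or EnvO terms, and this is not the semiautomated procedure the authors used.

```python
"""Toy sketch: label free-text habitat descriptions with a small set of
bins and report the fraction of inputs that receive a label.

All bins, keywords, and samples here are hypothetical illustrations.
"""
from typing import Dict, List, Optional

# Hypothetical "lite" vocabulary: bin name -> keywords suggesting that bin.
HABITAT_BINS: Dict[str, List[str]] = {
    "soil": ["soil", "rhizosphere", "compost"],
    "aquatic": ["seawater", "ocean", "lake", "river", "freshwater"],
    "host-associated": ["gut", "feces", "blood", "sputum", "skin"],
}


def label(text: str) -> Optional[str]:
    """Return the first bin (in dict order) with a keyword in the text, else None."""
    lowered = text.lower()
    for bin_name, keywords in HABITAT_BINS.items():
        if any(keyword in lowered for keyword in keywords):
            return bin_name
    return None


def coverage(values: List[str]) -> float:
    """Fraction of free-text values that receive a bin label (0.0 if empty)."""
    if not values:
        return 0.0
    labeled = sum(1 for value in values if label(value) is not None)
    return labeled / len(values)


if __name__ == "__main__":
    # Hypothetical isolation_source-style strings, not real GenBank records.
    samples = [
        "Rhizosphere soil of wheat",
        "Pacific Ocean surface water",
        "human gut",
        "laboratory strain",
    ]
    for s in samples:
        print(f"{s!r} -> {label(s)}")
    print(coverage(samples))  # prints 0.75 (3 of 4 samples labeled)
```

Plain substring matching is used only to keep the sketch short; it can mislabel text (for example, a keyword embedded inside an unrelated word), which is one reason a curated mapping to ontology terms is preferable in practice.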
