Database: The Journal of Biological Databases and Curation

Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes


Abstract

The last 20 years of advancement in sequencing technologies have led to the sequencing of thousands of microbial genomes, creating mountains of genetic data. While the efficiency of data generation improves almost daily, establishing meaningful relationships between taxonomic and genetic entities on this scale requires a structured and integrative approach. Currently, knowledge is distributed across a fragmented landscape of resources, from government-funded institutions such as the National Center for Biotechnology Information (NCBI) and UniProt, to topic-focused databases such as the ODB3 database of prokaryotic operons, to the supplemental tables of primary publications. A major drawback of large-scale, expert-curated databases is the expense of maintaining and extending them over time. No entity apart from a major institution with stable long-term funding can consider this, and even their scope is limited given the magnitude of microbial data being generated daily. Wikidata is an openly editable, semantic web compatible framework for knowledge representation. It is a project of the Wikimedia Foundation and offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of information about microbial genomics. We are developing a microbe-specific data model, based on Wikidata's semantic web compatibility, that represents bacterial species, strains, and the genes and gene products that define them. Currently, we have loaded 43,694 gene and 37,966 protein items for 21 species of bacteria, including the human pathogenic bacterium Chlamydia trachomatis. Using this pathogen as an example, we explore complex interactions between the pathogen, its host, associated genes, other microbes, disease and drugs using the Wikidata SPARQL endpoint. In our next phase of development, we will add another 99 bacterial genomes and their genes and gene products, totaling ∼900,000 additional entities. This aggregation of knowledge will be a platform for community-driven collaboration, allowing the networking of microbial genetic data through the sharing of knowledge by both data and domain experts.
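As an illustration of the kind of lookup the Wikidata SPARQL endpoint supports, the following is a minimal sketch and not the authors' own query. It assumes that gene items are typed with `P31` (instance of) `Q7187` (gene), that they link to a strain or species through `P703` (found in taxon), and that strains reach their species through `P171` (parent taxon); the abstract does not spell out this modeling, so treat it as an assumption. The function names `build_gene_query` and `run_query` and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_gene_query(taxon_name, limit=10):
    """Build a SPARQL query listing gene items found in a taxon or its subtaxa.

    Assumed modeling (not confirmed by the abstract): genes point to a
    strain or species via P703, and strains reach the species via P171.
    """
    # Escape backslashes and double quotes so the name stays a valid
    # SPARQL string literal.
    safe_name = taxon_name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?gene ?geneLabel ?taxon WHERE {{
  ?species wdt:P225 "{safe_name}" .   # P225: taxon name
  ?taxon wdt:P171* ?species .         # P171: parent taxon, zero or more hops
  ?gene wdt:P31 wd:Q7187 ;            # P31: instance of, Q7187: gene
        wdt:P703 ?taxon .             # P703: found in taxon
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query):
    """Send the query to the endpoint and return the list of result bindings."""
    url = WIKIDATA_SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url,
        headers={
            # Hypothetical identifier; a descriptive User-Agent is good practice.
            "User-Agent": "microbial-genome-example/0.1 (illustrative script)",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling `run_query(build_gene_query("Chlamydia trachomatis"))` would send the query over the network. Keeping query construction separate from the HTTP call lets the query text be inspected or pasted into the Wikidata query interface before anything is sent.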