Database: The Journal of Biological Databases and Curation

Moving the mountain: analysis of the effort required to transform comparative anatomy into computable anatomy



Abstract

The diverse phenotypes of living organisms have been described for centuries, and though they may be digitized, they are not readily available in a computable form. Using over 100 morphological studies, the Phenoscape project has demonstrated that by annotating characters with community ontology terms, links between novel species anatomy and the genes that may underlie them can be made. But given the enormity of the legacy literature, how can this largely unexploited wealth of descriptive data be rendered amenable to large-scale computation? To identify the bottlenecks, we quantified the time involved in the major aspects of phenotype curation as we annotated characters from the vertebrate phylogenetic systematics literature. This involves attaching fully computable logical expressions consisting of ontology terms to the descriptions in character-by-taxon matrices. The workflow consists of: (i) data preparation, (ii) phenotype annotation, (iii) ontology development and (iv) curation team discussions and software development feedback. Our results showed that the completion of this work required two person-years by a team of two post-docs, a lead data curator, and students. Manual data preparation required close to 13% of the effort. This part in particular could be reduced substantially with better community data practices, such as depositing fully populated matrices in public repositories. Phenotype annotation required ∼40% of the effort. We are working to make this more efficient with Natural Language Processing tools. Ontology development (40%), however, remains a highly manual task requiring domain (anatomical) expertise and use of specialized software. The large overhead required for data preparation and ontology development contributed to a low annotation rate of approximately two characters per hour, compared with 14 characters per hour when activity was restricted to character annotation. 
Unlocking the potential of the vast stores of morphological descriptions requires better tools for efficiently processing natural language, and better community practices towards a born-digital morphology.
Database URL:
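
The effort figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is only illustrative: it uses the rounded percentages and rates exactly as stated above, and it assumes (the abstract does not say so) that the share left over after data preparation, phenotype annotation and ontology development belongs to the fourth workflow item, curation team discussions and software development feedback. All variable names are hypothetical.

```python
# Figures as reported in the abstract; all are approximate ("close to", "~").
TOTAL_PERSON_YEARS = 2

reported_share_pct = {
    "data preparation": 13,
    "phenotype annotation": 40,
    "ontology development": 40,
}

# Assumption (not stated in the abstract): whatever is not accounted for by
# the three reported shares is attributed to workflow item (iv).
remainder_pct = 100 - sum(reported_share_pct.values())
share_pct = dict(reported_share_pct)
share_pct["discussions and software feedback (assumed)"] = remainder_pct

# Convert each percentage share into person-years of the two-person-year total.
person_years = {
    task: TOTAL_PERSON_YEARS * pct / 100 for task, pct in share_pct.items()
}

# Annotation rates from the abstract, in characters per hour.
OVERALL_RATE = 2            # whole workflow, including overhead
ANNOTATION_ONLY_RATE = 14   # activity restricted to character annotation
rate_ratio = ANNOTATION_ONLY_RATE / OVERALL_RATE

for task, years in person_years.items():
    print(f"{task}: {share_pct[task]}% of effort, about {years:.2f} person-years")
print(f"annotation-only rate is {rate_ratio:.0f}x the overall rate")
```

Because the published percentages are rounded, the derived person-year values should be read as rough magnitudes rather than measurements.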
