Systematic Biology

Toward Synthesizing Our Knowledge of Morphology: Using Ontologies and Machine Reasoning to Extract Presence/Absence Evolutionary Phenotypes across Studies

Abstract

The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in non-computer-readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, that is, cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here, for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, provided that the data are semantically annotated. Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships.
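To make the idea of inferred presence/absence states and conflicting cells more concrete, the following is a minimal toy sketch in Python. It is not the Phenoscape pipeline, which relies on reasoning over formal ontologies. The three-term part_of hierarchy, the taxon names, and the two propagation rules (a present part implies a present whole; an absent whole implies absent parts) are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
from itertools import product

# Hypothetical toy ontology: structure -> the structure it is part_of.
PART_OF = {"digit": "autopod", "autopod": "limb"}


def ancestors(entity):
    """Yield every structure that `entity` is transitively part of."""
    while entity in PART_OF:
        entity = PART_OF[entity]
        yield entity


def descendants(entity):
    """Yield every structure that is transitively part of `entity`."""
    for child, parent in PART_OF.items():
        if parent == entity:
            yield child
            yield from descendants(child)


def infer(asserted):
    """Map each (taxon, entity) cell to the set of states supporting it.

    Assumed rules: presence propagates up to wholes,
    absence propagates down to parts.
    """
    cells = {}
    for (taxon, entity), state in asserted.items():
        cells.setdefault((taxon, entity), set()).add(state)
        related = ancestors(entity) if state == "present" else descendants(entity)
        for other in related:
            cells.setdefault((taxon, other), set()).add(state)
    return cells


def missing_fraction(cells, taxa, entities):
    """Fraction of taxon-by-character cells that carry no state at all."""
    filled = sum((t, e) in cells for t, e in product(taxa, entities))
    return 1 - filled / (len(taxa) * len(entities))


# Hypothetical assertions, as if drawn from separate studies.
asserted = {
    ("Taxon A", "digit"): "present",
    ("Taxon B", "limb"): "absent",
    ("Taxon C", "autopod"): "absent",
    ("Taxon C", "digit"): "present",
}
taxa = ["Taxon A", "Taxon B", "Taxon C"]
entities = ["digit", "autopod", "limb"]

cells = infer(asserted)
# A conflict is a cell for which both presence and absence are indicated.
conflicts = sorted(cell for cell, states in cells.items() if len(states) > 1)
before = missing_fraction(asserted, taxa, entities)
after = missing_fraction(cells, taxa, entities)
```

In this toy matrix the four asserted states fill four of nine cells, and propagation fills the rest, which mirrors in miniature how inferred states reduce missing data. The two conflicting cells for "Taxon C" arise because one assertion marks the autopod absent while another marks a digit present.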