
Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation


Abstract

Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
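The hybrid workflow described in the abstract (a model proposes annotations, the user approves them with one click, and approved annotations feed back into the model) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `AnnotationSuggester`, its methods, the word-overlap scoring, and the term labels are all hypothetical stand-ins chosen for brevity, and the labels are plain strings rather than real BAO identifiers.

```python
from collections import Counter, defaultdict


class AnnotationSuggester:
    """Toy stand-in for a text model: scores terms by word overlap.

    Hypothetical illustration only; the paper's natural language
    processing models are not reproduced here.
    """

    def __init__(self):
        # term -> Counter of words seen in assay texts confirmed for that term
        self.word_counts = defaultdict(Counter)

    def confirm(self, text, term):
        """Record a user-approved annotation and fold it back into the model."""
        self.word_counts[term].update(text.lower().split())

    def suggest(self, text, top_n=3):
        """Return up to top_n (term, score) pairs, best first.

        The score is the fraction of words in `text` that were previously
        seen with the term, so it lies between 0 and 1.
        """
        words = text.lower().split()
        scores = []
        for term, counts in self.word_counts.items():
            hits = sum(1 for w in words if w in counts)
            if hits:
                scores.append((term, hits / len(words)))
        # Sort by descending score, then alphabetically for stable output.
        return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top_n]

    def search(self, query, vocabulary):
        """Fallback for terms outside the training domain: substring search."""
        return sorted(t for t in vocabulary if query.lower() in t.lower())


if __name__ == "__main__":
    suggester = AnnotationSuggester()
    # Assumed example data: two previously confirmed annotations.
    suggester.confirm("luciferase reporter gene assay", "reporter gene assay")
    suggester.confirm("radioligand binding assay", "binding assay")

    # The model ranks suggestions; the user would approve the top one.
    print(suggester.suggest("luciferase reporter assay in cells"))

    # A term the model has never seen is found by search, then confirmed,
    # which makes it available for future suggestions.
    vocabulary = ["binding assay", "reporter gene assay", "cell viability assay"]
    print(suggester.search("viability", vocabulary))
    suggester.confirm("cell viability measured by ATP content", "cell viability assay")
```

The design point this sketch tries to capture is that each confirmation is cheap for the user and also serves as new training data, so the set of terms the model can suggest grows with use.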
