
Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data


Abstract

Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind words or expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, they require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods for reducing the dimensionality of the ontology representation. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a dedicated method.
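
To make the idea described in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a CONTES-style normalizer: concepts are encoded by ancestor-indicator vectors, that ontology representation is reduced in dimensionality (truncated SVD is used here purely as an example of such a method), a supervised linear mapping is fitted from term embeddings to the reduced concept space, and a new mention is normalized to its nearest concept. The toy ontology, term names, and random stand-in embeddings below are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import Ridge

# Toy ontology: each concept is a binary vector over all concepts,
# with 1 for itself and for each of its ancestors.
concepts = ["entity", "bacterium", "habitat", "soil", "gut"]
ancestors = {
    "entity": [], "bacterium": ["entity"], "habitat": ["entity"],
    "soil": ["habitat", "entity"], "gut": ["habitat", "entity"],
}
idx = {c: i for i, c in enumerate(concepts)}
C = np.zeros((len(concepts), len(concepts)))
for c in concepts:
    C[idx[c], idx[c]] = 1.0
    for a in ancestors[c]:
        C[idx[c], idx[a]] = 1.0

# Dimensionality reduction of the ontology representation.
svd = TruncatedSVD(n_components=3, random_state=0)
C_red = svd.fit_transform(C)

# Small labelled training set: term embeddings (random stand-ins for
# real word vectors) paired with the concept each term normalizes to.
rng = np.random.default_rng(0)
train_terms = {"dirt": "soil", "intestine": "gut", "microbe": "bacterium"}
X = rng.normal(size=(len(train_terms), 50))          # term embeddings
Y = np.stack([C_red[idx[c]] for c in train_terms.values()])

# Supervised linear mapping from the embedding space to the ontology space.
reg = Ridge(alpha=1.0).fit(X, Y)

# Normalize a new mention: project its embedding, pick the closest concept.
mention_vec = rng.normal(size=(1, 50))               # embedding of a new mention
proj = reg.predict(mention_vec)
sims = (C_red @ proj.T).ravel() / (
    np.linalg.norm(C_red, axis=1) * np.linalg.norm(proj) + 1e-12)
print("predicted concept:", concepts[int(np.argmax(sims))])
```

In this sketch the choice of dimensionality and of the regression penalty are the kinds of parameters the abstract proposes to calibrate; handling of out-of-vocabulary mentions (whose embeddings are missing rather than random) would require the additional method the abstract refers to.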
