Published in: Expert Systems with Applications

A hybrid model for finding abbreviation-definition pairs from biomedical abstracts using heuristics-based sequence labeling and perceptron linear classifier



Abstract

Automatic extraction of abbreviations and their definitions from free-format text is a useful task in text mining. Previous work on automatic abbreviation/definition extraction from text has followed either a heuristics-based or a machine learning approach. This paper proposes a hybrid model that identifies abbreviation-definition pairs in biomedical text. The proposed system works in two stages: (i) to identify abbreviation-definition pairs, pattern matching is performed through heuristics-based sequence labeling, using three mapping strategies: Predecessor Term Mapping, Word Level Mapping, and Character Level Mapping; (ii) to validate each identified abbreviation-definition pair, an ANN-based approach, a single-layer neural network (perceptron), is used. PubMed biomedical abstracts serve as the data source for finding abbreviation-definition pairs, and system performance is analyzed across six different entities in these abstracts. Experimental results show that the model achieves a precision of 96.2%, a recall of 92.4%, and an F1 score of 94.6%. To cross-validate system performance, the proposed model is also evaluated on two corpora, AB3P and BioADI; these outcomes are discussed in the results section.
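The abstract does not spell out the mapping rules of stage (i), so the following is only a minimal sketch of how a predecessor-term window combined with character-level mapping might detect candidates of the form "long form (SF)". It swaps in the well-known right-to-left character-matching heuristic of Schwartz and Hearst (2003) and their window of min(|SF| + 5, 2 * |SF|) preceding words, where |SF| is the number of characters in the short form. The function names (`find_pairs`, `char_level_match`, `is_plausible_abbrev`), the regex, and the short-form filter are hypothetical illustrations, not the authors' implementation, and Word Level Mapping is not shown.

```python
from __future__ import annotations

import re

# Hypothetical short-form pattern: 2-10 characters inside parentheses.
PAREN = re.compile(r"\(([^()]{2,10})\)")


def is_plausible_abbrev(sf: str) -> bool:
    """Assumed filter: contains a letter, starts alphanumeric, at most two words."""
    return any(c.isalpha() for c in sf) and sf[0].isalnum() and len(sf.split()) <= 2


def char_level_match(sf: str, words: list[str]) -> str | None:
    """Right-to-left character mapping in the style of Schwartz & Hearst (2003).

    Every alphanumeric character of the short form must appear, in order, in the
    candidate long form, and the first short-form character must start a word.
    """
    text = " ".join(words)
    s, l = len(sf) - 1, len(text) - 1
    while s >= 0:
        c = sf[s].lower()
        if not c.isalnum():
            s -= 1
            continue
        # Scan left for c; for the first short-form character, also require a word start.
        while l >= 0 and (
            text[l].lower() != c or (s == 0 and l > 0 and text[l - 1].isalnum())
        ):
            l -= 1
        if l < 0:
            return None  # some short-form character could not be mapped
        s -= 1
        l -= 1
    # Extend back to the beginning of the word holding the first matched character.
    start = text.rfind(" ", 0, l + 1) + 1
    return text[start:]


def find_pairs(sentence: str) -> list[tuple[str, str]]:
    """Return (short form, long form) candidates for the pattern 'long form (SF)'."""
    pairs = []
    for m in PAREN.finditer(sentence):
        sf = m.group(1).strip()
        if not is_plausible_abbrev(sf):
            continue
        # Predecessor-term window: at most min(|SF| + 5, 2 * |SF|) preceding words.
        n = min(len(sf) + 5, 2 * len(sf))
        words = sentence[: m.start()].split()[-n:]
        lf = char_level_match(sf, words)
        if lf:
            pairs.append((sf, lf))
    return pairs


print(find_pairs("Levels of tumor necrosis factor (TNF) were high."))
# [('TNF', 'tumor necrosis factor')]
```

Matching from the right lets the heuristic trim leading words such as "Levels of" from the window, while the word-start constraint on the first character keeps it from anchoring inside a word.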
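Stage (ii) of the abstract validates each candidate pair with a single-layer perceptron. The abstract does not list the features or the training setup, so the sketch below invents two illustrative features (`initial_cover`, `length_ratio`), a learning rate of 0.1, and a five-example toy training set; all of these are assumptions made for demonstration. Only the classic perceptron learning rule itself (step activation, weight update proportional to the error) is standard.

```python
from __future__ import annotations


def features(sf: str, lf: str) -> list[float]:
    """Hypothetical feature vector for a candidate (short form, long form) pair."""
    letters = [c.lower() for c in sf if c.isalpha()]
    words = lf.split()
    initials = {w[0].lower() for w in words}
    n = max(len(letters), 1)
    # Fraction of short-form letters that also begin a word of the long form.
    initial_cover = sum(c in initials for c in letters) / n
    # Long-form words per short-form letter, capped at 3 and scaled to [0, 1].
    length_ratio = min(len(words) / n, 3.0) / 3.0
    return [initial_cover, length_ratio]


class Perceptron:
    """Single-layer perceptron with a step activation (Rosenblatt update rule)."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: list[float]) -> int:
        activation = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if activation > 0 else 0

    def fit(self, X: list[list[float]], y: list[int], epochs: int = 20) -> Perceptron:
        for _ in range(epochs):
            for x, target in zip(X, y):
                error = target - self.predict(x)  # -1, 0, or +1
                if error:
                    self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]
                    self.b += self.lr * error
        return self


# Toy, hand-made examples (not from the paper): 1 = valid pair, 0 = spurious pair.
TRAIN = [
    ("TNF", "tumor necrosis factor"),
    ("PCR", "polymerase chain reaction"),
    ("BMI", "body mass index"),
    ("PCR", "was measured"),
    ("BMI", "in this cohort"),
]
LABELS = [1, 1, 1, 0, 0]

clf = Perceptron(n_features=2).fit([features(sf, lf) for sf, lf in TRAIN], LABELS)
print(clf.predict(features("ICU", "intensive care unit")))  # 1 (accepted)
print(clf.predict(features("ICU", "was admitted")))  # 0 (rejected)
```

Because a perceptron is a linear classifier, its training is guaranteed to converge only when the classes are linearly separable in the chosen feature space. With these two toy features, a single-word long form such as ("ECG", "electrocardiogram") is rejected by the trained model, which illustrates why a practical validator would need a richer feature set.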
