Journal of Biomedical Semantics

FlexiTerm: a flexible term recognition method

Abstract

Background

The increasing amount of textual information in biomedicine requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating its semantic interpretation. Dictionary look-up approaches may not always be suitable for dynamic domains such as biomedicine or for newly emerging types of media such as patient blogs, the main obstacles being the use of non-standardised terminology and a high degree of term variation.

Results

In this paper, we describe FlexiTerm, a method for automatic term recognition from a domain-specific corpus, and evaluate its performance against five manually annotated corpora. FlexiTerm performs term recognition in two steps: linguistic filtering is used to select term candidates, followed by the calculation of termhood, a frequency-based measure used as evidence to qualify a candidate as a term. To improve the quality of termhood calculation, which may be affected by term variation, FlexiTerm uses a range of methods to neutralise the main sources of variation in biomedical terms. It manages syntactic variation by processing candidates using a bag-of-words approach. Orthographic and morphological variations are dealt with using stemming in combination with lexical and phonetic similarity measures. The method was evaluated on five biomedical corpora. The highest values for precision (94.56%), recall (71.31%) and F-measure (81.31%) were achieved on a corpus of clinical notes.

Conclusions

FlexiTerm is an open-source software tool for automatic term recognition. It incorporates a simple term variant normalisation method. The method proved to be more robust than the baseline on less formally structured texts, such as those found in patient blogs or medical notes. The software can be downloaded freely at http://www.cs.cf.ac.uk/flexiterm.
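The minimal Python sketch below illustrates the two-step idea described in the abstract: variant normalisation followed by frequency-based termhood. It is not FlexiTerm's implementation, and it rests on several assumptions:

- The linguistic filter has already produced the candidate strings.
- A toy plural stripper stands in for a real stemmer.
- `difflib.SequenceMatcher` stands in for the lexical similarity measure, and phonetic matching is omitted.
- The C-value measure (Frantzi et al.) serves as the frequency-based termhood score.

The function names, the stop-word list and the 0.9 similarity threshold are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical stop-word list; a real filter would use a fuller one.
STOPWORDS = {"of", "the", "in", "and", "for", "with"}


def crude_stem(token: str) -> str:
    """Toy stand-in for a real stemmer: strip a plural 's' (but not 'ss')."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def canonical(stem: str, vocabulary: list[str], threshold: float = 0.9) -> str:
    """Map a stem to an earlier, lexically similar stem, e.g. 'hemoglobin'
    to a previously seen 'haemoglobin'; otherwise register it as new."""
    for known in vocabulary:
        if SequenceMatcher(None, stem, known).ratio() >= threshold:
            return known
    vocabulary.append(stem)
    return stem


def normalise(candidate: str, vocabulary: list[str]) -> tuple[str, ...]:
    """Reduce a candidate to a sorted set of canonical stems (word order ignored)."""
    tokens = re.findall(r"[a-z0-9]+", candidate.lower())
    stems = (crude_stem(t) for t in tokens if t not in STOPWORDS)
    return tuple(sorted({canonical(s, vocabulary) for s in stems}))


def c_value(candidates: list[str]) -> dict[tuple[str, ...], float]:
    """Score multi-word candidates with C-value, pooling the counts of their variants."""
    vocabulary: list[str] = []
    freq = Counter(normalise(c, vocabulary) for c in candidates)
    scores: dict[tuple[str, ...], float] = {}
    for term, f in freq.items():
        if len(term) < 2:  # C-value targets multi-word terms (log2(1) = 0)
            continue
        # Longer candidates whose word set contains this one (nested occurrences).
        containers = [g for g in freq if len(g) > len(term) and set(term) <= set(g)]
        if containers:
            nested = sum(freq[g] for g in containers) / len(containers)
            scores[term] = math.log2(len(term)) * (f - nested)
        else:
            scores[term] = math.log2(len(term)) * f
    return scores


if __name__ == "__main__":
    notes = [
        "blood pressure", "pressure of blood", "blood pressures",
        "haemoglobin level", "hemoglobin levels", "high blood pressure",
    ]
    for term, score in sorted(c_value(notes).items(), key=lambda kv: -kv[1]):
        print(" ".join(term), round(score, 2))
    # blood pressure 2.0
    # haemoglobin level 2.0
    # blood high pressure 1.58
```

Each candidate is reduced to an unordered set of normalised stems. As a result, "blood pressure", "pressure of blood" and "blood pressures" are counted as a single term. The similarity check also merges the spelling variants "haemoglobin" and "hemoglobin". This pooling of variant counts before scoring is the kind of neutralisation of term variation that the abstract describes.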