Journal: Journal of Pathology Informatics

Extracting laboratory test information from biomedical text


Abstract

Background: No previous study has reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates a pathology informatics question: how accurately can such information be extracted from text with current tools and techniques, especially machine learning and symbolic NLP methods? The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices.

Methods: The authors developed a symbolic information extraction (SIE) system to extract device- and test-specific information about four types of laboratory test entities: specimens, analytes, units of measure, and detection limits. They compared the performance of SIE with that of three prominent machine-learning-based NLP systems, LingPipe, GATE, and BANNER, which implement three distinct supervised machine learning methods: hidden Markov models, support vector machines, and conditional random fields, respectively.

Results: The machine learning systems recognized laboratory test entities with moderately high recall but low precision. Their recall was relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when the lexical morphology of the entity was distinctive (as in units of measure). Yet SIE outperformed them by statistically significant margins in both precision and F-measure when extracting specimen, analyte, and detection limit information. Its high recall performance was statistically significant for analyte information extraction.

Conclusions: Despite its shortcomings relative to machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure.
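To make the evaluation terms in the abstract concrete, the following is a minimal sketch of a toy rule-based extractor for detection limits, scored with precision, recall, and F-measure against a context-free baseline. It is not the authors' SIE system: the regular expressions, function names, sample sentence, and gold annotation are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical context-free pattern: any number followed by a "x/y" style unit.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-zµ]+/[A-Za-z]+)")

# Hypothetical context-aware rule: the quantity must follow a detection-limit cue.
# This is an illustrative rule, not one taken from the SIE system.
LIMIT = re.compile(
    r"(?:detection limit|limit of detection)\D{0,20}?"
    r"(\d+(?:\.\d+)?)\s*([A-Za-zµ]+/[A-Za-z]+)",
    re.IGNORECASE,
)


def extract_all_quantities(text):
    """Baseline: return every (value, unit) pair, regardless of context."""
    return {(m.group(1), m.group(2)) for m in QUANTITY.finditer(text)}


def extract_detection_limits(text):
    """Toy symbolic rule: return only (value, unit) pairs that follow a cue phrase."""
    return {(m.group(1), m.group(2)) for m in LIMIT.finditer(text)}


def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and balanced F-measure over sets of items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up sentence and gold annotation, not drawn from the FDA corpus.
sample = (
    "The limit of detection is 0.5 ng/mL in serum. "
    "Calibrators range from 2 ng/mL to 50 ng/mL."
)
gold = {("0.5", "ng/mL")}

for name, extractor in [
    ("baseline", extract_all_quantities),
    ("cue-based rule", extract_detection_limits),
]:
    p, r, f = precision_recall_f1(extractor(sample), gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this made-up sentence, both extractors find the true detection limit, but the baseline also returns the calibrator values, which lowers its precision. This mirrors, in miniature, the abstract's point about discerning relevancy among information of the same type.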
