Conference: IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology

SPOT the Drug! An Unsupervised Pattern Matching Method to Extract Drug Names from Very Large Clinical Corpora

Abstract

Although structured electronic health records are becoming more prevalent, much information about patient health is still recorded only in unstructured text. "Understanding" these texts has been a focus of natural language processing research for many years, with some remarkable successes. Knowing the drugs patients take is critical not only for understanding patient health (e.g., for drug-drug or drug-enzyme interactions), but also for secondary uses, such as research on treatment effectiveness. Several drug dictionaries have been curated, such as RxNorm or the FDA's Orange Book, with a focus on prescription drugs. Developing these dictionaries is a challenge, but keeping them up to date in the face of a rapidly advancing field is even more challenging. Discovering other, new adverse drug interactions often requires examining a large number of patient histories, which calls for algorithms that identify pharmacological substances both accurately and quickly. We propose a new algorithm, SPOT, which identifies drug names in a large corpus that can be used as new dictionary entries, where a "drug" is defined as a substance intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease. We present precision and recall values for SPOT, measured against a manually annotated gold-standard corpus. SPOT is language and syntax independent, can be run efficiently to keep dictionaries up to date, and also suggests words and phrases that may be misspellings or uncatalogued synonyms of a known drug. We show how SPOT's lack of reliance on NLP tools makes it robust in analyzing clinical medical text. SPOT is a generalized bootstrapping algorithm, seeded with a known dictionary, that automatically extracts the context within which each drug is mentioned. We define three features of such context: support, confidence and prevalence. We present the performance tradeoffs depending on the thresholds chosen for these features.
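
The abstract does not define support, confidence, or prevalence, so the following is only a rough sketch of how one dictionary-seeded bootstrapping pass might look, not the authors' implementation. Everything in it is an assumption made for illustration: the function name `spot_sketch`, the use of a (left token, right token) pair as the "context", whitespace tokenization, single-token drug names, and in particular the three feature definitions (prevalence as the number of times a context occurs, support as the number of distinct seed terms seen in it, confidence as the share of its occurrences filled by seed terms).

```python
from collections import defaultdict


def spot_sketch(sentences, seeds, min_support=2, min_confidence=0.5, min_prevalence=2):
    """One hypothetical bootstrapping pass: learn contexts from seeds, propose new terms."""
    seeds = {s.lower() for s in seeds}

    # Map each (left token, right token) context to every token seen in that slot.
    fills = defaultdict(list)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            fills[(tokens[i - 1], tokens[i + 1])].append(tokens[i])

    # Keep contexts that pass all three thresholds; their non-seed fills are candidates.
    candidates = defaultdict(set)
    for context, seen in fills.items():
        prevalence = len(seen)                                   # ASSUMED definition
        support = len({t for t in seen if t in seeds})           # ASSUMED definition
        confidence = sum(t in seeds for t in seen) / prevalence  # ASSUMED definition
        if (support >= min_support
                and confidence >= min_confidence
                and prevalence >= min_prevalence):
            for token in seen:
                if token not in seeds:
                    candidates[token].add(context)
    return dict(candidates)


# Toy corpus, invented for this example only.
corpus = [
    "patient takes aspirin daily",
    "patient takes metformin daily",
    "patient takes warfarin daily",
    "patient denies pain today",
]
seed_dictionary = {"aspirin", "metformin"}
proposed = spot_sketch(corpus, seed_dictionary)
print(sorted(proposed))
```

In this toy run the context ("takes", "daily") is learned from the two seed drugs and then proposes "warfarin" as a candidate dictionary entry. Raising the thresholds would trade recall for precision, which is the kind of tradeoff the abstract refers to. A full bootstrapping loop would add accepted candidates to the seed set and repeat.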
