Journal of Digital Imaging

Unsupervised Topic Modeling in a Large Free Text Radiology Report Repository



Abstract

Radiology report narratives contain a large amount of information about the patient’s health and the radiologist’s interpretation of medical findings. Most of this critical information is entered in free text format, even when structured radiology report templates are used. The narrative varies in its use of terminology and language among different radiologists and organizations. The free text format and the subtlety and variations of natural language hinder the extraction of reusable information from radiology reports for decision support, quality improvement, and biomedical research. Therefore, as the first step to organize and extract the information content in a large multi-institutional free text radiology report repository, we have designed and developed an unsupervised machine learning approach to capture the main concepts in a radiology report repository and partition the reports based on their main foci. In this approach, radiology reports are modeled in a vector space and compared to each other through a cosine similarity measure. This similarity is used to cluster radiology reports and identify the repository’s underlying topics. We applied our approach to a repository of 1,899,482 radiology reports from three major healthcare organizations. Our method identified 19 major radiology report topics in the repository and clustered the reports according to these topics. Our results were verified by a domain expert radiologist; they successfully explain the repository’s primary topics and extract the corresponding reports. The results of our system provide a target-based corpus and framework for information extraction and retrieval systems for radiology reports.
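The abstract describes the pipeline only at a high level: reports are mapped into a vector space, compared by cosine similarity, and clustered. The following is a minimal sketch of that idea, not the authors' implementation. The TF-IDF weighting, the greedy "leader" clustering step, the 0.3 threshold, and the four toy report strings are all assumptions made for illustration; the abstract does not say which term weighting or clustering algorithm was used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document to a sparse {term: weight} vector (smoothed TF-IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(vectors, threshold):
    """Greedy one-pass clustering (an assumed stand-in, not the paper's method).

    Each vector joins the cluster whose first member (the "leader") is most
    similar to it, provided that similarity reaches the threshold; otherwise
    it starts a new cluster. Returns clusters as lists of document indices.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, vectors[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


# Hypothetical toy reports, invented for this example.
reports = [
    "Chest radiograph shows clear lungs, no pneumothorax.",
    "Chest radiograph shows no pneumothorax, lungs clear.",
    "MRI of the knee shows meniscal tear.",
    "Knee MRI demonstrates meniscal tear.",
]
vectors = tfidf_vectors(reports)
clusters = leader_cluster(vectors, threshold=0.3)
print(clusters)  # [[0, 1], [2, 3]]
```

On this toy input the two chest reports and the two knee reports land in separate groups. A repository of nearly two million reports would need sparse matrix operations and a scalable clustering method rather than this pairwise loop.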
