Journal of Digital Imaging: the official journal of the Society for Computer Applications in Radiology

Unsupervised Topic Modeling in a Large Free Text Radiology Report Repository


Abstract

Radiology report narratives contain a large amount of information about the patient's health and the radiologist's interpretation of medical findings. Most of this critical information is entered as free text, even when structured radiology report templates are used. The narratives vary in terminology and language use across radiologists and organizations. The free-text format and the subtlety and variation of natural language hinder the extraction of reusable information from radiology reports for decision support, quality improvement, and biomedical research. Therefore, as a first step toward organizing and extracting the information content of a large multi-institutional free-text radiology report repository, we designed and developed an unsupervised machine learning approach that captures the main concepts in the repository and partitions the reports by their main foci. In this approach, radiology reports are modeled in a vector space and compared with each other using a cosine similarity measure. This similarity is used to cluster the reports and identify the repository's underlying topics. We applied our approach to a repository of 1,899,482 radiology reports from three major healthcare organizations. Our method identified 19 major radiology report topics in the repository and clustered the reports according to these topics. The results were verified by a domain expert radiologist; they explain the repository's primary topics and extract the corresponding reports. The results of our system provide a target-based corpus and framework for information extraction and retrieval systems for radiology reports.
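The pipeline described in the abstract (vector-space representation, cosine similarity, clustering) can be illustrated with a minimal Python sketch. The abstract does not state the term weighting or the clustering algorithm, so the smoothed TF-IDF weights, the greedy "leader" clustering, the 0.3 threshold, and the four toy reports below are all assumptions made for illustration, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Represent each document as a sparse {term: weight} dictionary.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF, so a term found in every document keeps a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def leader_cluster(vectors, threshold=0.3):
    # Greedy single-pass clustering (an assumed stand-in algorithm):
    # each report joins the most similar existing cluster leader if the
    # similarity reaches the threshold, otherwise it starts a new cluster.
    leaders = []  # index of the first report in each cluster
    labels = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c, lead in enumerate(leaders):
            sim = cosine(vec, vectors[lead])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            leaders.append(i)
            best = len(leaders) - 1
        labels.append(best)
    return labels


# Hypothetical toy reports, not taken from the study's repository.
reports = [
    "Chest radiograph shows clear lungs without pleural effusion.",
    "Chest radiograph shows small left pleural effusion.",
    "MRI of the brain shows no acute infarct or hemorrhage.",
    "MRI of the brain shows a chronic infarct, no hemorrhage.",
]
vectors = tfidf_vectors(reports)
labels = leader_cluster(vectors)
print(labels)  # [0, 0, 1, 1]
```

On this toy input the two chest reports and the two brain reports fall into separate clusters. A repository of nearly two million reports would need a scalable clustering method and careful preprocessing, which this sketch does not attempt.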
