Using phrases and document metadata to improve topic modeling of clinical reports

Abstract

Probabilistic topic models provide an unsupervised method for analyzing unstructured text and could be integrated into automatic summarization systems for clinical text. Clinical documents are accompanied by metadata from the patient’s medical history and frequently contain multi-word concepts; both can be valuable for accurately interpreting the text. Existing methods have addressed these problems individually. We present a unified model for free-text clinical documents that integrates contextual patient-level and document-level data and discovers multi-word concepts. In the proposed model, phrases are represented by chained n-grams, and a Dirichlet hyper-parameter is weighted by both document-level and patient-level context. This method and three other latent Dirichlet allocation (LDA) models were fit to a large collection of clinical reports. Example topics illustrate the output of the new model, and the quality of the representations is evaluated using empirical log likelihood. The proposed model created informative prior probabilities from patient and document information and captured phrases representing various clinical concepts. The representation learned by the proposed model had a significantly higher empirical log likelihood than those of the compared methods. Integrating document metadata and capturing phrases in clinical text greatly improves the topic representation of clinical documents. The resulting clinically informative topics may serve as the basis for an automatic summarization system for clinical reports.
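The abstract says only that the Dirichlet hyper-parameter is weighted by document-level and patient-level context; it does not give the functional form. As an illustration, the sketch below assumes a log-linear weighting in the style of Dirichlet-multinomial regression (DMR), where each topic's prior weight for a document is the exponential of a linear function of that document's metadata features. This is an assumed formulation, not necessarily the authors' model, and the function names, feature names, and weight values are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical representation: metadata features as name -> value maps.
Features = Dict[str, float]


def document_alpha(doc_features: Features,
                   patient_features: Features,
                   topic_weights: List[Features]) -> List[float]:
    """Per-document Dirichlet parameters, one entry per topic.

    Assumed DMR-style log-linear form (an illustration, not the paper's):
        alpha[k] = exp(sum_f x[f] * lambda_k[f])
    where x concatenates an intercept with document- and patient-level
    features, and lambda_k holds topic k's feature weights.
    """
    # Build one feature vector from both context levels, prefixed by source.
    x: Features = {"intercept": 1.0}
    x.update({"doc:" + name: value for name, value in doc_features.items()})
    x.update({"patient:" + name: value
              for name, value in patient_features.items()})

    alphas = []
    for weights in topic_weights:
        # Features a topic has no weight for contribute nothing to its score.
        score = sum(value * weights.get(name, 0.0) for name, value in x.items())
        alphas.append(math.exp(score))  # exp keeps every alpha_k positive
    return alphas


def expected_topic_proportions(alpha: List[float]) -> List[float]:
    """Mean of a Dirichlet(alpha) distribution: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Two hypothetical topics: one favoured by radiology reports,
# one favoured by a patient history of stroke.
weights = [
    {"intercept": math.log(0.1), "doc:type=radiology": 1.5},
    {"intercept": math.log(0.1), "patient:dx=stroke": 1.5},
]
alpha = document_alpha({"type=radiology": 1.0}, {}, weights)
props = expected_topic_proportions(alpha)
print([round(p, 3) for p in props])
```

Here a radiology report with no stroke history receives a larger prior mass on the first topic before any words are observed. Because the exponential keeps every entry positive, any real-valued weights give a valid Dirichlet; in DMR the weights would be learned jointly with the topics rather than set by hand as in this toy example.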