Using phrases and document metadata to improve topic modeling of clinical reports

Abstract

Probabilistic topic models provide an unsupervised method for analyzing unstructured text and have the potential to be integrated into clinical automatic summarization systems. Clinical documents are accompanied by metadata in a patient’s medical history and frequently contain multi-word concepts that can be valuable for accurately interpreting the included text. While existing methods have attempted to address these problems individually, we present a unified model for free-text clinical documents that integrates contextual patient- and document-level data and discovers multi-word concepts. In the proposed model, phrases are represented by chained n-grams, and a Dirichlet hyper-parameter is weighted by both document-level and patient-level context. This method and three other latent Dirichlet allocation (LDA) models were fit to a large collection of clinical reports. Example topics illustrate the output of the new model, and the quality of the representations is evaluated using empirical log likelihood. The proposed model was able to create informative prior probabilities based on patient and document information, and it captured phrases that represented various clinical concepts. The representation produced by the proposed model had a significantly higher empirical log likelihood than those of the compared methods. Integrating document metadata and capturing phrases in clinical text greatly improves the topic representation of clinical documents. The resulting clinically informative topics may effectively serve as the basis for an automatic summarization system for clinical reports.
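
The abstract states only that the Dirichlet hyper-parameter is weighted by document-level and patient-level context. One common way to make a topic prior depend on metadata is a log-linear form borrowed from Dirichlet-multinomial regression; the sketch below uses that swapped-in technique, which is not necessarily the authors' formulation. Each document d gets its own prior alpha_{d,k} = exp(lambda_k . x_d), where x_d is a hypothetical feature vector concatenating patient- and document-level metadata and lambda_k is a per-topic weight vector. The function names `metadata_prior` and `topic_conditional` are illustrative only. The second function shows where such a prior would enter a standard collapsed Gibbs update for LDA; phrase discovery via chained n-grams is not modeled here.

```python
import math
from typing import List, Sequence


def metadata_prior(features: Sequence[float],
                   weights: Sequence[Sequence[float]]) -> List[float]:
    """Document-specific Dirichlet hyperparameters alpha_{d,k} = exp(lambda_k . x_d).

    features: x_d, a HYPOTHETICAL encoding that concatenates patient-level and
              document-level metadata (e.g. an intercept plus one-hot report type).
    weights:  lambda, one weight vector per topic (K vectors of len(features)).
    exp() keeps every alpha strictly positive, as a Dirichlet prior requires.
    """
    return [math.exp(sum(w * x for w, x in zip(lam, features))) for lam in weights]


def topic_conditional(n_dk: Sequence[int], alpha_d: Sequence[float],
                      n_kw: Sequence[int], n_k: Sequence[int],
                      beta: float, vocab_size: int) -> List[float]:
    """Normalized collapsed-Gibbs probabilities p(z = k | rest) for one token.

    n_dk:    topic counts in this document (current token excluded)
    alpha_d: metadata-derived prior for this document (replaces a shared alpha)
    n_kw:    counts of this token's word type in each topic (current token excluded)
    n_k:     total token counts per topic (current token excluded)
    """
    scores = [(n_dk[k] + alpha_d[k]) * (n_kw[k] + beta) / (n_k[k] + vocab_size * beta)
              for k in range(len(alpha_d))]
    total = sum(scores)
    return [s / total for s in scores]
```

For example, `metadata_prior([1.0, 1.0, 0.0], [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])`, with a hypothetical intercept, "radiology report" flag and "discharge summary" flag, gives the first topic a prior of e (about 2.718) and the second a prior of 1.0, so a radiology report starts out favoring topic 0 before any words are observed.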

Bibliographic details

Journal: other
Authors: William Speier, Michael K. Ong, Corey W. Arnold
Volume: 61 (year and issue not listed)
Pages: 260–266
Total pages: 20
Original format: PDF
Keywords: Topic modeling; LDA; n-grams; document metadata

Similar articles

1. Improving topic modeling through homophily for legal documents [J]. Kazuki Ashihara, Cheikh Brahim El Vaigh, Chenhui Chu. Applied Network Science, 2020, issue 1.

2. Evaluation of family history information within clinical documents and adequacy of HL7 clinical statement and clinical genomics family history models for its representation: a case report [J]. Melton GB, Raman N, Chen ES. Journal of the American Medical Informatics Association, 2010, issue 3.

3. Biomechanical Modeling to Improve Coronary Artery Bifurcation Stenting: Expert Review Document on Techniques and Clinical Implementation [J]. Antoniadis Antonios P., Mortier Peter, Kassab Ghassan. JACC. Cardiovascular Interventions, 2015, issue 10.

4. Modeling multiword phrases with constrained phrase trees for improved topic modeling of conversational speech [C]. Hazen Timothy J., Richardson Fred. 2012 IEEE Workshop on Spoken Language Technology, 2012.

5. Effective Classification of Clinical Reports: Natural Language Processing-Based and Topic Modeling-Based Approaches [D]. Sarioglu, Efsun Selin. 2014.

6. Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon [O]. Yang Huang, Henry J. Lowe, Dan Klein. 2005.

7. Modeling multiword phrases with constrained phrase trees for improved topic modeling of conversational speech [O]. Timothy J. Hazen, Fred Richardson. 2012.
