Phrase Based Topic Modeling for Semantic Information Processing in Biomedicine

Abstract

Given that unstructured data is increasing exponentially every day, extracting and understanding the information, themes, and relationships in large collections of documents is increasingly important to researchers in many disciplines, including biomedicine. Latent Dirichlet Allocation (LDA) is an unsupervised topic modeling technique based on the “bag-of-words” assumption that has been applied extensively to unveil hidden semantic themes within large sets of textual documents. Recently, it was extended using the “bag-of-n-grams” paradigm to account for word order. In this paper, we present an alternative phrase-based LDA model that moves from a bag-of-words or bag-of-n-grams paradigm to a “bag-of-key-phrases” setting by applying a key phrase extraction technique, the C-value method, to further explore latent themes. We evaluate our approach with a phrase intrusion user study and demonstrate that our model can help LDA generate better and more interpretable topics than those generated using the bag-of-n-grams approach. Because topic models are essentially statistical tools, an important problem in topic modeling is visualizing and interacting with the models in order to understand and extract new information from a collection. To evaluate our phrase-based modeling approach in this context, we incorporate it into an open source interactive topic browser. Qualitative evaluations of this browser with biomedical experts demonstrate that our approach can help biomedical researchers gain a better and faster understanding of their document collections.
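As a rough illustration of the pipeline the abstract describes, the sketch below scores candidate phrases with the C-value measure as it is commonly defined (log2 of the phrase length times the phrase frequency, discounted by the mean frequency of the longer candidates that contain the phrase) and then reduces a tokenized document to counts of key phrases. The function names, the greedy longest-match step, and the toy frequencies are assumptions made for illustration and are not taken from the paper; candidate extraction (for example, a part-of-speech filter) is assumed to have happened already.

```python
import math
from collections import Counter


def _contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(candidate_freq):
    """Score multi-word candidate phrases with the C-value measure.

    `candidate_freq` maps each candidate (a tuple of words) to its corpus
    frequency. The frequency of a nested phrase is assumed to include its
    occurrences inside longer candidates.
    """
    scores = {}
    for phrase, freq in candidate_freq.items():
        # Longer candidates that contain this phrase.
        containers = [
            other for other in candidate_freq
            if len(other) > len(phrase) and _contains(other, phrase)
        ]
        if containers:
            # Discount by the mean frequency of the containing candidates.
            mean_nested = sum(candidate_freq[c] for c in containers) / len(containers)
            adjusted = freq - mean_nested
        else:
            adjusted = freq
        scores[phrase] = math.log2(len(phrase)) * adjusted
    return scores


def to_key_phrase_bag(tokens, key_phrases):
    """Count key phrases in a token list using greedy longest-match.

    Words that are not part of any key phrase are dropped (an assumption
    of this sketch, not necessarily the paper's choice).
    """
    max_len = max((len(p) for p in key_phrases), default=0)
    bag = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in key_phrases:
                bag[" ".join(span)] += 1
                i += n
                break
        else:
            i += 1  # no key phrase starts here
    return bag


# Toy frequencies, invented for illustration only.
toy_freq = {
    ("lung", "cancer"): 10,
    ("cell", "lung", "cancer"): 4,
    ("small", "cell", "lung", "cancer"): 4,
}
toy_scores = c_value(toy_freq)
toy_key_phrases = {p for p, s in toy_scores.items() if s > 0}
toy_bag = to_key_phrase_bag(
    "patients with small cell lung cancer and lung cancer history".split(),
    toy_key_phrases,
)
```

In this toy case "cell lung cancer" scores zero because it never occurs outside "small cell lung cancer", so it is not kept as a key phrase. Per-document counts such as `toy_bag` are the kind of input an LDA implementation would take in place of word counts.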

Bibliographic details

Journal: other
Authors: Zhiguo Yu; Todd R Johnson; Ramakanth Kavuluru
Year: 2013
Pages: 440–445
Total pages: 13
Original format: PDF

Similar literature

1. Finding Semantically Valid and Relevant Topics by Association-Based Topic Selection Model [J]. Yang Gao, Yuefeng Li, Raymond Y. K. Lau. ACM transactions on intelligent systems. 2018, No. 1
2. A Phrase Topic Model Based on Distributed Representation [J]. Jialin Ma, Jieyi Cheng, Lin Zhang. Computers, Materials & Continua. 2020, No. 1
3. A Phrase Topic Model Based on Distributed Representation [J]. Journal of neurosurgical sciences. 2020, No. 1
4. Phrase Based Topic Modeling for Semantic Information Processing in Biomedicine [C]. Yu Zhiguo, Johnson Todd R., Kavuluru Ramakanth. International Conference on Machine Learning and Applications. 2013
5. Processing coordinated verb phrases: The relevance of lexical-semantic, conceptual, and contextual information towards establishing verbal parallelism [D]. Tutunjian, Damon A. 2010
6. Using phrases and document metadata to improve topic modeling of clinical reports [O]. William Speier, Michael K. Ong, Corey W. Arnold
7. Modeling multiword phrases with constrained phrase trees for improved topic modeling of conversational speech [O]. Timothy J. Hazen, Fred Richardson. 2012
