

Impact of Context on Keyword Identification and Use in Biomedical Literature Mining



Abstract

This paper reviews the use of two statistical metrics for automatically identifying important keywords associated with a concept, such as a gene, by mining the scientific literature. Starting with a subset of MEDLINE® abstracts whose titles contain the name or a synonym of a gene, the metrics contrast the prevalence of specific words in these documents against a broader "background set" of abstracts. If a word occurs substantially more often in the gene's document subset than in the background set, which acts as the reference, the word is viewed as capturing some specific attribute of the gene. Keywords identified in this way may be used as gene features in clustering algorithms. Because the background set is the reference against which keyword prevalence is contrasted, the authors hypothesize that different background document sets can lead to somewhat different sets of keywords being identified as specific to a gene. Two background sets are discussed that serve two somewhat different purposes: characterizing the function of a gene, and clustering a set of genes based on their shared functional similarities. Experimental results that reveal the significance of the choice of background set are presented.
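The abstract does not give the formulas for its two metrics; the keyword list names TF-IDF and Z-score. The Python sketch below illustrates only the z-score idea, under assumptions that are not taken from the paper: a word's prevalence is its document frequency (the fraction of abstracts containing it), the background rate is add-one smoothed, and a one-sample proportion z-score above a hypothetical threshold of 2.0 marks a keyword. The toy abstracts, the gene name `geneX`, and all function names are hypothetical. Scoring the same gene subset against two different background sets shows how the choice of reference can change the keyword set.

```python
import math
import re
from collections import Counter


def doc_frequencies(docs):
    """Count how many documents contain each word at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(re.findall(r"[a-z0-9]+", doc.lower())))
    return df


def keyword_zscores(gene_docs, background_docs):
    """One-sample proportion z-score of each word's document frequency in
    gene_docs, using the background set's rate as the reference (an assumed
    formulation, not necessarily the paper's)."""
    n_g, n_b = len(gene_docs), len(background_docs)
    df_g = doc_frequencies(gene_docs)
    df_b = doc_frequencies(background_docs)
    scores = {}
    for word, count in df_g.items():
        p_g = count / n_g
        # Add-one smoothing keeps the background rate strictly in (0, 1),
        # so words unseen in the background do not cause division by zero.
        p_b = (df_b[word] + 1) / (n_b + 2)
        se = math.sqrt(p_b * (1 - p_b) / n_g)
        scores[word] = (p_g - p_b) / se
    return scores


def top_keywords(gene_docs, background_docs, threshold=2.0, exclude=()):
    """Words whose z-score exceeds a hypothetical threshold, best first."""
    scores = keyword_zscores(gene_docs, background_docs)
    hits = [w for w, z in scores.items() if z > threshold and w not in exclude]
    return sorted(hits, key=lambda w: (-scores[w], w))


# Hypothetical toy abstracts whose titles mention the gene "geneX".
gene_docs = [
    "geneX regulates apoptosis in tumor cells",
    "geneX kinase activity controls apoptosis",
    "apoptosis and kinase signaling involve geneX",
]
# Hypothetical background 1: broad, mostly unrelated biomedical abstracts.
general_bg = [
    "patients were treated in the clinic",
    "a survey of hospital care",
    "tumor cells grow in culture",
    "blood pressure in adults",
]
# Hypothetical background 2: abstracts about other genes.
other_genes_bg = [
    "geneY kinase activity in tumor cells",
    "geneZ kinase signaling in tumor cells",
    "geneW regulates kinase activity",
    "geneV controls cell signaling",
]

if __name__ == "__main__":
    # The gene name is excluded: every abstract was selected by it.
    print(top_keywords(gene_docs, general_bg, exclude={"genex"}))
    print(top_keywords(gene_docs, other_genes_bg, exclude={"genex"}))
```

Against the general background, both `apoptosis` and `kinase` pass the threshold. Against abstracts about other genes, `kinase` is just as common in the reference set, so it drops out and only `apoptosis` remains. A TF-IDF variant would swap in a different scoring function while keeping the same gene-subset-versus-background structure.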


Bibliographic details

Source: Future Technologies Conference, 2019 (proceedings volume: xiv, 1174 pages); paper length: 12 pages
Authors: Venu G. Dasigi; Orlando Karam; Sailaja Pydimarri
Original format: PDF
Chinese Library Classification: TP393-532
Keywords: Literature mining; Automatic keyword identification; TF-IDF; Z-score; Background set; Features; Clustering

Similar documents

1. The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature [J]. Arzucan Özgür, Junguk Hur, Yongqun He. BioData Mining, 2016, no. 1.

2. Mining and modeling linkage information from citation context for improving biomedical literature retrieval [J]. Xiaoshi Yin, Jimmy Xiangji Huang, Zhoujun Li. Information Processing & Management, 2011, no. 1.

3. Identification of most influential co-occurring gene suites for gastrointestinal cancer using biomedical literature mining and graph-based influence maximization [J]. Charles C. N. Wang, Jennifer Jin, Jan-Gowth Chang, et al. BMC Medical Informatics and Decision Making, 2020, no. 1.

4. Impact of Context on Keyword Identification and Use in Biomedical Literature Mining [C]. Venu G. Dasigi, Orlando Karam, Sailaja Pydimarri. Future Technologies Conference, 2019.

5. Text Mining of Mutations and Their Impact from Biomedical Literature [D]. A. S. M. Ashique Mahmood. 2018.

6. The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature [O]. Arzucan Özgür, Junguk Hur, Yongqun He. 2016.

7. The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature [O]. 2016.

8. Text Mining the Biomedical Literature [R]. R. N. Kostoff. 2007.
