Conference: Future Technologies Conference

Impact of Context on Keyword Identification and Use in Biomedical Literature Mining


Abstract

This paper reviews the use of two statistical metrics for automatically identifying, by mining scientific literature, the important keywords associated with a concept such as a gene. Starting with a subset of MEDLINE® abstracts that contain the name or synonyms of a gene in their titles, the metrics contrast the prevalence of specific words in these documents against a broader "background set" of abstracts. If a word occurs substantially more often in the document subset associated with a gene than in the background set that acts as a reference, the word is viewed as capturing some specific attribute of the gene. The keywords identified in this way may be used as gene features in clustering algorithms. Since the background set is the reference against which keyword prevalence is contrasted, the authors hypothesize that different background document sets can lead to somewhat different sets of keywords being identified as specific to a gene. Two background sets are discussed, each useful for a somewhat different purpose: characterizing the function of a gene, and clustering a set of genes based on their shared functional similarities. Experimental results that reveal the significance of the choice of background set are presented.
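The contrast between a gene-specific subset and a background set can be illustrated with a small sketch. The abstract does not name the two metrics, so the score used here, a smoothed log ratio of document-frequency rates, is a stand-in chosen for illustration and not the authors' method. The function names, the tokenizer, the smoothing constant, and the toy abstracts are all hypothetical.

```python
import math
import re
from collections import Counter


def document_frequency(abstracts):
    """Count, for each word, how many abstracts contain it at least once."""
    counts = Counter()
    for text in abstracts:
        counts.update(set(re.findall(r"[a-z0-9]+", text.lower())))
    return counts


def keyword_scores(gene_abstracts, background_abstracts, smoothing=1.0):
    """Score each word in the gene subset by the log ratio of its smoothed
    document-frequency rate in the subset to its rate in the background set.
    This is an illustrative stand-in, not one of the paper's two metrics."""
    gene_df = document_frequency(gene_abstracts)
    background_df = document_frequency(background_abstracts)
    n_gene = len(gene_abstracts)
    n_background = len(background_abstracts)
    scores = {}
    for word, count in gene_df.items():
        gene_rate = (count + smoothing) / (n_gene + 2 * smoothing)
        background_rate = (background_df[word] + smoothing) / (
            n_background + 2 * smoothing
        )
        scores[word] = math.log(gene_rate / background_rate)
    return scores


def top_keywords(gene_abstracts, background_abstracts, k=5):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = keyword_scores(gene_abstracts, background_abstracts)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Invented toy abstracts, for illustration only.
gene_docs = [
    "tp53 regulates apoptosis in tumor cells",
    "tp53 mutation disrupts apoptosis",
    "loss of tp53 in tumor cells",
]
general_background = [
    "protein expression in yeast cells",
    "gene expression in plant cells",
    "tumor imaging in patients",
    "mutation rates in bacteria",
]
cancer_background = [
    "tumor growth in mouse cells",
    "tumor suppressor mutation screening",
    "tumor cells and mutation burden",
    "kinase signalling in tumor cells",
]

if __name__ == "__main__":
    print(top_keywords(gene_docs, general_background))
    print(top_keywords(gene_docs, cancer_background))
```

In this toy setup, a word such as "tumor" scores above zero against the general background but below zero against the cancer-focused one, which is the kind of background dependence the authors hypothesize. A positive score means the word is relatively more common in the gene subset than in the reference.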
