
Text mining biomedical literature for improving MEDLINE retrieval.

Abstract

A major problem in biomedical informatics is how best to present information retrieval results. This dissertation developed an approach that presents users with reduced sets of relevant citations together with topic labels. A text mining system was designed to group the retrieved citations, rank the citations in each cluster, and generate a set of keywords and MeSH terms that describe the common theme of each cluster.

A series of follow-up studies was conducted to improve the performance of the system. A spectral analysis clustering method for information retrieval systems was proposed, based on content similarity network techniques. The new approach organizes search results into categories automatically. Our experimental results demonstrated that the method performs well in real-world applications.

Automated concept recognition for each cluster is one of the important tasks in our text mining system. The system can extract keywords, key MeSH terms, and key noun phrases. Within each cluster, the extraction of keywords and key MeSH terms is based on modeling the document-term matrix as a weighted bipartite graph, and a mutual reinforcement principle is used to rank the terms. Our new key noun-phrase extraction method is based on context-free grammar rules extracted from the input documents. An existing algorithm, Sequitur, is used to construct the context-free grammar rules, which re-represent a sequence as a hierarchical structure. Noun phrases are extracted from these grammar rules; the key noun phrases are identified from the highest-frequency rules, without extracting all of the rules. The experimental results showed that our key noun-phrase extraction method is effective in identifying key concepts from documents and outperforms current widely used methods.

We also explored ranking MEDLINE citations with an existing web ranking algorithm, HITS (Hyperlink-Induced Topic Search), and further extended HITS to a supervised HITS for ranking citations. Our results showed that the supervised HITS algorithm significantly outperforms the HITS algorithm (p < 0.01). Compared with HITS, supervised HITS improved citation ranking by 15% to more than 59% in almost all the cases we tested. Furthermore, MeSH terms outperform text words in ranking citations, especially when HITS was applied (p < 0.01).
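The mutual reinforcement principle named in the abstract can be illustrated with a small sketch. The code below is not the dissertation's implementation; it is a minimal example that assumes the scores are obtained by power iteration on a weighted document-term matrix, where a term scores highly if it occurs in highly scored citations and a citation scores highly if it contains highly scored terms. The hub and authority updates of the standard HITS algorithm have the same form. The function name `mutual_reinforcement`, the toy terms, and the weights are all hypothetical, and the supervised HITS extension is not modeled because the abstract does not describe it.

```python
from math import sqrt


def _normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mutual_reinforcement(weights, iterations=50):
    """Rank documents and terms on a weighted bipartite graph.

    weights[i][j] is the (hypothetical) weight of term j in document i.
    Returns (doc_scores, term_scores), each scaled to unit length.
    """
    n_docs = len(weights)
    n_terms = len(weights[0]) if n_docs else 0
    doc_scores = _normalize([1.0] * n_docs)
    term_scores = _normalize([1.0] * n_terms)
    for _ in range(iterations):
        # A term scores highly when it occurs in highly scored documents.
        term_scores = _normalize([
            sum(weights[i][j] * doc_scores[i] for i in range(n_docs))
            for j in range(n_terms)
        ])
        # A document scores highly when it contains highly scored terms.
        doc_scores = _normalize([
            sum(weights[i][j] * term_scores[j] for j in range(n_terms))
            for i in range(n_docs)
        ])
    return doc_scores, term_scores


# Toy document-term matrix: 3 citations x 4 terms (made-up weights).
terms = ["protein", "kinase", "pathway", "mouse"]
weights = [
    [2, 1, 1, 0],
    [1, 2, 1, 0],
    [0, 0, 1, 1],
]
doc_scores, term_scores = mutual_reinforcement(weights)
ranked_terms = sorted(zip(terms, term_scores), key=lambda p: p[1], reverse=True)
print(ranked_terms)
```

In this toy input, the term that appears only in the weakly connected third citation ends up ranked last, which is the behavior the mutual reinforcement idea is meant to produce.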

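Bibliographic record

  • Author

    Lin, Yongjing

  • Author affiliation

    The University of Texas at Dallas

  • Degree-granting institution: The University of Texas at Dallas
  • Subject: Engineering, Biomedical; Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pagination: 179 p.
  • Total pages: 179
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Rehabilitation medicine
  • Date added to database: 2022-08-17 11:39:08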
