Text mining biomedical literature for improving MEDLINE retrieval.

Abstract

A major problem in biomedical informatics is how best to present information retrieval results. This dissertation develops an approach that presents users with reduced sets of relevant citations together with topic labels. A text mining system is designed to group the retrieved citations, rank the citations in each cluster, and generate a set of keywords and MeSH terms that describe the common theme of each cluster.

A series of follow-up studies was conducted to improve the performance of the system. A spectral analysis clustering method was proposed, based on content similarity network techniques for information retrieval systems. The new approach organizes the search results into categories. Our experimental results demonstrated that the presented method performs well in real-world applications.

Automated concept recognition for each cluster is one of the important tasks in our text mining system. The system can perform keyword, key MeSH term, and key noun-phrase extraction. Within each cluster, the extraction of keywords and key MeSH terms is based on modeling the document-term matrix as a weighted bipartite graph. A mutual reinforcement principle is used to rank the terms. Our new key noun-phrase extraction method is based on context-free grammar rules extracted from the input documents. An existing algorithm called Sequitur is used to construct the context-free grammar rules, which re-represent a sequence as a hierarchical structure. Noun phrases are extracted from the grammar rules. The key noun phrases are identified from the top-frequency rules, without extracting all the grammar rules. The experimental results showed that our key noun-phrase extraction method is effective in identifying key concepts from documents and outperforms current widely used methods.

We also explored ranking MEDLINE citations with an existing web ranking algorithm, HITS (Hyperlink-Induced Topic Search). We further extended HITS to a supervised HITS to rank citations. Our results showed that the supervised HITS algorithm significantly outperforms the HITS algorithm (p < 0.01). Compared with HITS, supervised HITS can improve citation ranking from 15% to more than 59% in almost all the cases we tested. Furthermore, MeSH terms outperform text words in ranking citations, especially when HITS was applied (p < 0.01).
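
The mutual reinforcement idea mentioned in the abstract (and the closely related HITS update) can be illustrated with a small sketch. The abstract does not give the dissertation's weighting scheme, normalization, or stopping rule, so everything below is an assumption: the function name `mutual_reinforcement`, the toy weight matrix, and the fixed iteration count are hypothetical, and the code shows only the generic HITS-style update on a weighted document-term bipartite graph, not the author's implementation.

```python
from math import sqrt


def mutual_reinforcement(weights, iterations=50):
    """Score documents and terms on a weighted bipartite graph.

    weights[d][t] is the (assumed) weight of term t in document d.
    Terms score highly when they occur in high-scoring documents,
    and documents score highly when they contain high-scoring terms.
    """
    n_docs = len(weights)
    n_terms = len(weights[0])
    doc_scores = [1.0] * n_docs
    term_scores = [1.0] * n_terms

    for _ in range(iterations):
        # Update term scores from the current document scores.
        term_scores = [
            sum(weights[d][t] * doc_scores[d] for d in range(n_docs))
            for t in range(n_terms)
        ]
        norm = sqrt(sum(s * s for s in term_scores)) or 1.0
        term_scores = [s / norm for s in term_scores]

        # Update document scores from the new term scores.
        doc_scores = [
            sum(weights[d][t] * term_scores[t] for t in range(n_terms))
            for d in range(n_docs)
        ]
        norm = sqrt(sum(s * s for s in doc_scores)) or 1.0
        doc_scores = [s / norm for s in doc_scores]

    return doc_scores, term_scores


# Toy cluster: three citations, three candidate terms (made-up weights).
terms = ["gene", "protein", "trial"]
weights = [
    [2.0, 1.0, 0.0],  # citation 0
    [1.0, 2.0, 0.0],  # citation 1
    [0.0, 0.0, 1.0],  # citation 2
]

doc_scores, term_scores = mutual_reinforcement(weights)
ranked_terms = [
    term for _, term in sorted(zip(term_scores, terms), key=lambda p: -p[0])
]
print(ranked_terms)
```

In this toy input, "gene" and "protein" reinforce each other through the two citations that share them, while "trial" appears only in an isolated citation and drifts to the bottom of the ranking. A supervised variant such as the one the abstract describes would need additional label information that is not specified here.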

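Bibliographic record

Author: Lin, Yongjing
Author affiliation: The University of Texas at Dallas
Degree-granting institution: The University of Texas at Dallas
Discipline: Engineering Biomedical; Computer Science
Degree: Ph.D.
Year: 2008
Pages: 179 p.
Total pages: 179
Original format: PDF
Language: English (eng)
Chinese Library Classification: Rehabilitation medicine
Keywords: (none listed)
Date indexed: 2022-08-17 11:39:08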
