Facilitating knowledge discovery by integrating bottom-up and top-down knowledge sources: A text mining approach

Abstract

This dissertation aims to discover synergistic combinations of top-down (ontologies), interactive (relevance feedback), and bottom-up (machine learning) knowledge encoding techniques for text mining. The strength of machine learning techniques lies in their coverage and efficiency, because they can discover new knowledge without human intervention. Their output, however, is often imprecise and irrelevant. Human knowledge, encoded top-down or interactively, may remedy this. The research question addressed is whether hybrid systems can make knowledge discovery more precise and relevant. Three different combinations are evaluated. The first study investigates an ontology, the Unified Medical Language System (UMLS), combined with an automatically created thesaurus to dynamically adjust the thesaurus' output. The augmented thesaurus was added to a medical meta-search portal as a keyword suggester and compared with the unmodified thesaurus and with UMLS. Users preferred the hybrid approach. Thus, the combination of the ontology with the thesaurus was better than either component alone. The second study investigates implicit relevance feedback combined with genetic algorithms designed to adjust user queries for online searching. These were compared with pure relevance feedback algorithms. Users were divided into groups based on their overall performance. The genetic algorithm significantly helped low achievers but hindered high achievers. Thus, the interactively elicited knowledge from relevance feedback was judged insufficient to guide machine learning for all users. The final study investigates ontologies combined with two natural language processing techniques: a shallow parser and an automatically created thesaurus. Both capture relations between phrases in biomedical text. Qualified researchers found all terms to be precise; however, terms that belonged to ontologies were more relevant. Parser relations were all precise. Thesaurus relations were less precise, but precision improved for relations whose terms were represented in ontologies. Thus, this integration of ontologies with natural language processing provided good results. In general, it was concluded that top-down encoded knowledge could be effectively integrated with bottom-up encoded knowledge for knowledge discovery in text. This is particularly relevant to business fields, which are text and knowledge intensive. In the future, it will be worthwhile to extend the parser and to test similar hybrid approaches for data mining.

Bibliographic details

Author: Leroy, Gondy A.
Year: 2003
Original format: PDF
Language: en_US

Similar literature

1. An information fusion approach to integrate image annotation and text mining methods for geographic knowledge discovery [J]. Chung-Hong Lee, Shih-Hao Wang. Expert Systems with Applications. 2012, no. 10.
2. Combining bottom-up and top-down approaches for knowledge discovery: Comment on "Towards a unified approach in the modeling of fibrosis: A review with research perspectives" by Martine Ben Amar and Carlo Bianca [J]. Chiacchio Ferdinando, Motta Santo. Physics of Life Reviews. 2016.
3. A structured approach to explore knowledge flows through technology-based business methods by integrating patent citation analysis and text mining [J]. No Hyun Joung, An Yoonjung, Park Yongtae. Technological Forecasting and Social Change. 2015, no. auga.
4. Knowledge Discovery in Academic Registrar Data Bases using Source Mining: Data and Text [C]. Ma. Teresa Rios-Quezada, Francisco J. Cantu-Ortiz. 12th Americas Conference on Information Systems (AMCIS 2006), vol. 3. 2006.
5. Facilitating knowledge discovery by integrating bottom-up and top-down knowledge sources: A text mining approach [D]. Leroy, Gondy A. 2003.
6. Creating a Thesaurus from Text: A Bottom-Up Approach to Organizing Medical Knowledge [O]. Stuart J. Nelson, Thom Kuhn, Daniel Radzinski. 1998.
7. Text Mining to Facilitate Domain Knowledge Discovery [O]. Chengbin Wang, Xiaogang Ma. 2020.
