Facilitating knowledge discovery by integrating bottom-up and top-down knowledge sources: A text mining approach

Abstract

This dissertation aims to discover synergistic combinations of top-down (ontologies), interactive (relevance feedback), and bottom-up (machine learning) knowledge encoding techniques for text mining. The strength of machine learning techniques lies in their coverage and efficiency: they can discover new knowledge without human intervention. Their output, however, is often imprecise and irrelevant. Human knowledge, encoded top-down or interactively, may remedy this. The research question is whether hybrid systems can make knowledge discovery more precise and relevant. Three different combinations are evaluated.

The first study investigates an ontology, the Unified Medical Language System (UMLS), combined with an automatically created thesaurus to dynamically adjust the thesaurus' output. The augmented thesaurus was added to a medical meta-search portal as a keyword suggester and compared with the unmodified thesaurus and with UMLS. Users preferred the hybrid approach. Thus, the combination of the ontology with the thesaurus was better than either component alone.

The second study investigates implicit relevance feedback combined with genetic algorithms designed to adjust user queries for online searching. These were compared with pure relevance feedback algorithms. Users were divided into groups based on their overall performance. The genetic algorithm significantly helped low achievers but hindered high achievers. Thus, the interactively elicited knowledge from relevance feedback was judged insufficient to guide machine learning for all users.

The final study investigates ontologies combined with two natural language processing techniques: a shallow parser and an automatically created thesaurus. Both capture relations between phrases in biomedical text. Qualified researchers found all terms to be precise; however, terms that belonged to ontologies were more relevant. Parser relations were all precise. Thesaurus relations were less precise, but precision improved for relations whose terms were represented in ontologies. Thus, this integration of ontologies with natural language processing provided good results.

In general, it was concluded that top-down encoded knowledge could be effectively integrated with bottom-up encoded knowledge for knowledge discovery in text. This is particularly relevant to business fields, which are text- and knowledge-intensive. In the future, it will be worthwhile to extend the parser and to test similar hybrid approaches for data mining.

Record details

  • Author: Leroy, Gondy A.
  • Year: 2003
  • Original format: PDF
  • Language: en_US
