Automated methods of auditing and using terminology/ontology knowledge bases for natural language processing.


Abstract

Because humans communicate in natural language, narrative information plays a critical role in storing and disseminating knowledge. In a knowledge-intensive domain such as biomedicine, digesting the huge amount of text in clinical reports, research literature, and consumer websites is extremely demanding. Biomedical natural language processing (BioNLP) is an informatics specialty that aims to automatically analyze and restructure biomedical text into a more digestible size and format so that it can be easily post-processed by humans or by other automated programs. To handle the comprehensive lexical and semantic knowledge in biomedicine, BioNLP systems need to incorporate domain-specific terminology/ontology knowledge bases. In addition, using standardized lexical/semantic entities benefits interoperability between BioNLP systems and associated applications. However, two major issues hinder the optimal use of terminology/ontology for BioNLP. First, existing terminology/ontology knowledge bases are not customized for NLP purposes and contain problematic content. Second, automated solutions for improving and using the knowledge bases are still inadequate, which limits their use in BioNLP.

To address these issues, the dissertation proposes corresponding solutions, both to improve terminology/ontology for BioNLP purposes and to demonstrate the feasibility of using terminology/ontology in BioNLP applications. For the first task, two automatic classifiers were developed to reclassify and audit the semantic classification of terminology concepts. The classifiers use empirical language features and complement other auditing methods that apply ontological principles. For the second task, we developed unsupervised methods that use terminology/ontology for word sense disambiguation (WSD). The methods can help reduce the labor of manual annotation and help sample representative evaluation instances for WSD research. Promising results were achieved in both tasks, and we have made the reclassified concepts available to the community as a public database. The results also enhanced our understanding of biomedical terminology/ontology knowledge bases and pointed out interesting directions for future research. The methods developed in the dissertation can be generalized to other fields and should promote the use of standardized terminology/ontology in biomedicine and healthcare.

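Bibliographic record

  • Author

    Fan, Jung-Wei

  • Author affiliation

    Columbia University

  • Degree-granting institution: Columbia University
  • Subject: Biology, Bioinformatics; Artificial Intelligence
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 120 p.
  • Total pages: 120
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: (not listed)
  • Keywords: (not listed)

  • Date added to database: 2022-08-17 11:37:59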
