
A hybrid approach for ontology-based information extraction

Abstract

Information extraction (IE) is the process of automatically transforming written natural language (i.e., text) into structured information, such as a knowledge base. Because natural language is inherently ambiguous, however, this transformation is highly complex. Moreover, as IE moves from the analysis of scientific documents to the analysis of Internet text, we can no longer fully rely on the assumption that the content of the text is correct. Unlike scientific documents, which are peer reviewed, Internet content is not verified for quality or correctness.

Thus, two main issues affect the IE process: the complexity of the extraction process and the quality of the data.

In this dissertation, we propose an improved ontology-based IE (OBIE) that addresses these issues of accuracy and content quality. Using a hybrid strategy that combines aspects of IE that are usually considered opposites, or that are not considered at all, we aim to improve IE through more accurate extraction and a new capability: semantic error detection. Our approach builds on OBIE, a subarea of IE that reduces extraction complexity by incorporating domain knowledge, in the form of the domain's concepts and relationships, to guide the extraction process.

We address the complexity of extraction by combining information extractors that have different implementations. Integrating different types of implementation into one extraction system yields more accurate extraction. For each concept or relationship in the ontology, we can either select the best implementation for extracting it or combine both implementations under an ensemble learning scheme. In tandem, we address the quality of information by determining its semantic correctness with respect to domain knowledge. We define two methods for semantic error detection: predefining the types of errors expected in the text, or applying logical reasoning to the text.

This dissertation includes both published and unpublished coauthored material.
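As a rough illustration of the two ways of combining extractors mentioned above (per-concept selection versus an ensemble), here is a minimal Python sketch. Everything in it is hypothetical. The two toy extractors (one regular-expression based, one dictionary lookup) stand in for extractors with different implementations. The concepts `City` and `Year` and the tiny labeled dev set are invented. The ensemble is plain vote counting, which is only one simple scheme and not necessarily the one used in the dissertation.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Fact = Tuple[str, str]                   # (concept, text span)
Extractor = Callable[[str], Set[Fact]]

def pattern_extractor(sentence: str) -> Set[Fact]:
    """Hypothetical rule-based extractor built from regular expressions."""
    found = {("Year", m) for m in re.findall(r"\b(?:18|19|20)\d\d\b", sentence)}
    found |= {("City", m) for m in re.findall(r"\bin ([A-Z][a-z]+)", sentence)}
    return found

CITIES = {"Eugene", "Portland"}          # hypothetical gazetteer

def lookup_extractor(sentence: str) -> Set[Fact]:
    """Hypothetical dictionary-lookup extractor with different strengths."""
    tokens = re.findall(r"\w+", sentence)
    found = {("City", t) for t in tokens if t in CITIES}
    found |= {("Year", t) for t in tokens if t.isdigit() and len(t) == 4}
    return found

def f1(pred: set, gold: set) -> float:
    """F1 score of a predicted set against a gold set."""
    tp = len(pred & gold)
    if tp == 0:
        return 1.0 if not pred and not gold else 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

def select_per_concept(extractors: Dict[str, Extractor], concepts: List[str],
                       dev: List[Tuple[str, Set[Fact]]]) -> Dict[str, str]:
    """For each concept, keep the extractor with the best F1 on labeled dev data."""
    choice = {}
    for concept in concepts:
        def score(name: str) -> float:
            pred = {(i, f) for i, (s, _) in enumerate(dev)
                    for f in extractors[name](s) if f[0] == concept}
            gold = {(i, f) for i, (_, g) in enumerate(dev)
                    for f in g if f[0] == concept}
            return f1(pred, gold)
        choice[concept] = max(extractors, key=score)
    return choice

def extract_selected(sentence: str, extractors: Dict[str, Extractor],
                     choice: Dict[str, str]) -> Set[Fact]:
    """Run only the chosen extractor for each concept."""
    return {f for concept, name in choice.items()
            for f in extractors[name](sentence) if f[0] == concept}

def extract_vote(sentence: str, extractors: Dict[str, Extractor],
                 min_votes: int) -> Set[Fact]:
    """Ensemble: keep facts proposed by at least `min_votes` extractors."""
    votes = Counter(f for ex in extractors.values() for f in ex(sentence))
    return {f for f, n in votes.items() if n >= min_votes}

extractors = {"pattern": pattern_extractor, "lookup": lookup_extractor}
dev = [("She moved to Eugene in 2015.", {("City", "Eugene"), ("Year", "2015")}),
       ("The lab in Portland has 1200 members.", {("City", "Portland")})]
choice = select_per_concept(extractors, ["City", "Year"], dev)
print(choice)  # {'City': 'lookup', 'Year': 'pattern'}
print(extract_selected("He lived in Portland in 1999.", extractors, choice))
print(extract_vote("The lab in Portland has 1200 members.", extractors, min_votes=2))
```

On this toy data the lookup extractor wins for `City`, because it also catches "Eugene". The pattern extractor wins for `Year`, because it does not mistake "1200" for a year. Requiring both extractors to agree (`min_votes=2`) filters out the same false positive.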
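The two semantic error detection methods described in the abstract can be contrasted with a toy sketch. All of the following is hypothetical: the subject-predicate-object triples, the small subclass hierarchy, the disjointness axiom, and the list of expected errors. A real OBIE system would more likely use an OWL ontology and a description-logic reasoner than this hand-rolled check.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) extracted from text

# Hypothetical toy domain knowledge, not taken from the dissertation.
SUBCLASS_OF: Dict[str, str] = {
    "Whale": "Mammal", "Salmon": "Fish", "Mammal": "Animal", "Fish": "Animal",
}
DISJOINT = {frozenset({"Mammal", "Fish"})}   # nothing is both a Mammal and a Fish

# Method 1: errors anticipated in advance and listed explicitly.
EXPECTED_ERRORS: Set[Triple] = {("Whale", "isA", "Fish")}

def is_expected_error(t: Triple) -> bool:
    """Flag a statement only if it matches a predefined error."""
    return t in EXPECTED_ERRORS

def superclasses(cls: str) -> Set[str]:
    """cls plus every class reachable by following SUBCLASS_OF upward."""
    seen: Set[str] = set()
    current: Optional[str] = cls
    while current is not None and current not in seen:
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return seen

# Method 2: derive the error from the ontology instead of listing it.
def is_inconsistent(t: Triple) -> bool:
    """'X isA Y' is an error if it makes X inherit two disjoint classes."""
    subject, predicate, obj = t
    if predicate != "isA":
        return False
    inherited = superclasses(subject) | superclasses(obj)
    return any(pair <= inherited for pair in DISJOINT)

for t in [("Whale", "isA", "Fish"), ("Salmon", "isA", "Mammal"), ("Whale", "isA", "Animal")]:
    print(t, is_expected_error(t), is_inconsistent(t))
```

In this toy, the reasoning-based check also flags "Salmon isA Mammal", which was never listed as an expected error. The predefined list, by contrast, needs no axioms at all. The actual trade-offs studied in the dissertation may differ.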

Bibliographic details

  • Author: Gutierrez, Fernando
  • Affiliation: University of Oregon
  • Degree-granting institution: University of Oregon
  • Subjects: Computer science; Artificial intelligence
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 132
  • Original format: PDF
  • Language: English
