
Methods for open information extraction and sense disambiguation on natural language text


Abstract

Natural language text has been the main and most comprehensive way of expressing and storing knowledge. A long-standing goal in computer science is to develop systems that automatically understand textual data, making this knowledge accessible to computers and humans alike. We conceive automatic text understanding as a bottom-up approach, in which a series of interleaved tasks build upon each other. Each task achieves more understanding of the text than the previous one. In this regard, we present three methods that aim to contribute to the primary stages of this setting.

Our first contribution, ClausIE, is an open information extraction method intended to recognize textual expressions of potential facts in text (e.g., “Dante wrote the Divine Comedy”) and represent them in a structure amenable to computers [(“Dante”, “wrote”, “the Divine Comedy”)]. Unlike previous approaches, ClausIE separates the recognition of the information from its representation, treating the former as universal (i.e., domain-independent) and the latter as application-dependent. ClausIE is a principled method that relies on properties of the English language and thereby avoids the use of manually or automatically generated training data.

Once the information in text has been correctly identified, probably the most important element in a structured fact is the relation that links its arguments, a relation whose main component is usually a verbal phrase. Our second contribution, Werdy, is a word entry recognition and disambiguation method. It aims to recognize words or multi-word expressions (e.g., “Divine Comedy” is a multi-word expression) in a fact and disambiguate verbs (e.g., what does “write” mean?). Werdy is also an unsupervised approach, mainly relying on the syntactic and semantic relation established between a verb sense and its arguments.

The other key components in a structured fact are the named entities (e.g., “Dante”) that often appear in the arguments. FINET, our last contribution, is a named entity typing method. It aims to understand the types or classes of those named entities (e.g., “Dante” refers to a writer). FINET is focused on typing named entities in short inputs (like facts). Unlike previous systems, it is designed to find the types that match the entity mention context (e.g., the fact in which it appears). It uses the most comprehensive type system of any entity typing method to date, with more than 16k classes for persons, organizations and locations.

These contributions are intended to constitute constructive building blocks for deeper understanding tasks in a bottom-up automatic text understanding setting.
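As a rough illustration of the bottom-up setting described in the abstract, the running example can be written as a small data structure that each stage enriches. This is only a sketch under stated assumptions: the class `Fact`, its field names, and the sense label are hypothetical and are not the output format of ClausIE, Werdy, or FINET.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Fact:
    """Hypothetical container for a (subject, relation, object) fact."""
    subject: str
    relation: str
    obj: str
    # Slot for a disambiguated verb sense (the kind of output Werdy targets).
    relation_sense: Optional[str] = None
    # Slot for entity types per mention (the kind of output FINET targets).
    entity_types: Dict[str, List[str]] = field(default_factory=dict)

    def as_triple(self) -> Tuple[str, str, str]:
        """Return the plain triple form used in the abstract's example."""
        return (self.subject, self.relation, self.obj)


# Stage 1: an extracted fact, using the example from the abstract.
fact = Fact("Dante", "wrote", "the Divine Comedy")

# Stage 2: attach a verb sense; this label is an illustrative placeholder.
fact.relation_sense = "write (illustrative sense label)"

# Stage 3: attach the entity type mentioned in the abstract.
fact.entity_types["Dante"] = ["writer"]

print(fact.as_triple())
print(fact.entity_types)
```

Keeping the triple separate from the later annotations mirrors the abstract's point that each task adds understanding on top of the previous one.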

Bibliographic details

  • Author

    Del Corro Luciano;

  • Affiliation
  • Year 2015
  • Total pages
  • Original format PDF
  • Language eng
  • CLC classification
