Methods for open information extraction and sense disambiguation on natural language text

Abstract

Natural language text has been the main and most comprehensive way of expressing and storing knowledge. A long-standing goal in computer science is to develop systems that automatically understand textual data, making this knowledge accessible to computers and humans alike. We conceive automatic text understanding as a bottom-up approach, in which a series of interleaved tasks build upon each other. Each task achieves more understanding of the text than the previous one. In this regard, we present three methods that aim to contribute to the primary stages of this setting.

Our first contribution, ClausIE, is an open information extraction method intended to recognize textual expressions of potential facts in text (e.g., “Dante wrote the Divine Comedy”) and represent them in a structure amenable to computers, e.g., (“Dante”, “wrote”, “the Divine Comedy”). Unlike previous approaches, ClausIE separates the recognition of the information from its representation, treating the former as universal (i.e., domain-independent) and the latter as application-dependent. ClausIE is a principled method that relies on properties of the English language and thereby avoids the use of manually or automatically generated training data.

Once the information in text has been correctly identified, probably the most important element in a structured fact is the relation that links its arguments, a relation whose main component is usually a verbal phrase. Our second contribution, Werdy, is a word entry recognition and disambiguation method. It aims to recognize words or multi-word expressions in a fact (e.g., “Divine Comedy” is a multi-word expression) and to disambiguate verbs (e.g., what does “write” mean?). Werdy is also an unsupervised approach, relying mainly on the syntactic and semantic relation established between a verb sense and its arguments.

The other key components in a structured fact are the named entities (e.g., “Dante”) that often appear in the arguments. FINET, our last contribution, is a named entity typing method. It aims to understand the types or classes of those named entities (e.g., “Dante” refers to a writer). FINET focuses on typing named entities in short inputs (such as facts). Unlike previous systems, it is designed to find the types that match the context of the entity mention (e.g., the fact in which it appears). It uses the most comprehensive type system of any entity typing method to date, with more than 16k classes for persons, organizations and locations.

These contributions are intended to constitute constructive building blocks for deeper understanding tasks in a bottom-up automatic text understanding setting.

Bibliographic details

Author: Del Corro, Luciano
Year: 2015
Original format: PDF
Language: English (eng)

Similar literature

1. A Survey of Word-sense Disambiguation Effective Techniques and Methods for Indian Languages [J]. Shallu, Vishal Gupta. Journal of Emerging Technologies in Web Intelligence. 2013, No. 4
2. New Techniques for Disambiguation in Natural Language and Their Application to Biological Text [J]. Ginter Filip, Boberg Jorma, Järvinen Jouni. Journal of Machine Learning Research. 2004, June issue
3. Bioinformatic Workflow Extraction from Scientific Texts based on Word Sense Disambiguation [J]. Ahmed Halioui, Petko Valtchev, Abdoulaye Baniré Diallo. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2018, No. 6
4. Syntactic analyzer using morphological process for a given text in natural language for Sense Disambiguation [C]. Dhopavkar Gauri, Kshirsagar Manali. 2014 5th International Conference- Confluence The Next Generation Information Technology Summit. 2014
5. Entity Extraction and Disambiguation in Short Text Using Wikipedia and Semantic User Profiles [D]. Zendejas, Ignacio. 2014
6. Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts [O]. Laura Plaza, Antonio J Jimeno-Yepes, Alberto Díaz. 2011
7. A Word Sense Disambiguation Approach for Converting Natural Language Text into a Common Semantic Description [O]. Francisco Tacoa, Hiroshi Uchida, Mitsuru Ishizuka. 2010
