ISWC 2010: International Semantic Web Conference

Supporting Natural Language Processing with Background Knowledge: Coreference Resolution Case



Abstract

Systems based on statistical and machine learning methods have been shown to be extremely effective and scalable for the analysis of large amounts of textual data. In recent years, however, it has become evident that one of the most important directions for improving natural language processing (NLP) tasks such as word sense disambiguation, coreference resolution, relation extraction, and other knowledge-extraction tasks is the exploitation of semantics. While in the past the unavailability of rich and complete semantic descriptions seriously limited their applicability, the Semantic Web has now made available a large amount of logically encoded information (e.g. ontologies, RDF(S) data, linked data, etc.), which constitutes a valuable source of semantics. However, web semantics cannot be easily plugged into machine learning systems. The objective of this paper is therefore to define a reference methodology for combining semantic information available on the web, in the form of logical theories, with statistical methods for NLP. The major problems we have to solve to implement our methodology concern (i) the selection of correct and minimal knowledge from the large amount available on the web, (ii) the representation of uncertain knowledge, and (iii) the resolution and encoding of the rules that combine knowledge retrieved from Semantic Web sources with semantics in the text. To evaluate the appropriateness of our approach, we present an application of the methodology to the problem of intra-document coreference resolution, and we show by means of experiments on a standard dataset how the injection of knowledge improves performance on this task.
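To make the idea of "knowledge injection" in the abstract more concrete, the following is a minimal sketch of how background knowledge from a Linked Data source could be turned into a feature for a mention-pair coreference classifier. The DBpedia SPARQL endpoint, the naive label-to-URI mapping, and the single type-compatibility feature are illustrative assumptions for this sketch, not the authors' actual method.

```python
# A minimal sketch (not the paper's implementation): query a Linked Data
# source for the rdf:type assertions of two mentions and derive a boolean
# "semantic type compatibility" feature that a statistical mention-pair
# coreference classifier could consume alongside its usual surface features.
from SPARQLWrapper import SPARQLWrapper, JSON

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed Semantic Web source


def dbpedia_types(mention: str) -> set:
    """Fetch rdf:type URIs for a mention, using a naive label-to-resource guess."""
    # Naive mapping: "Barack Obama" -> dbpedia resource "Barack_Obama".
    resource = "http://dbpedia.org/resource/" + mention.strip().replace(" ", "_")
    sparql = SPARQLWrapper(DBPEDIA_ENDPOINT)
    sparql.setQuery(f"""
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        SELECT DISTINCT ?t WHERE {{ <{resource}> rdf:type ?t }}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return {b["t"]["value"] for b in results["results"]["bindings"]}


def type_compatibility_feature(mention_a: str, mention_b: str) -> int:
    """1 if the two mentions share at least one ontology type, else 0."""
    shared = dbpedia_types(mention_a) & dbpedia_types(mention_b)
    return int(bool(shared))


if __name__ == "__main__":
    # The feature value would be appended to the mention pair's feature
    # vector before training or decoding the statistical model.
    print(type_compatibility_feature("Barack Obama", "United States"))
```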
