Journal: Information Processing & Management

Feature-enriched matrix factorization for relation extraction


Abstract

Relation extraction aims at finding meaningful relationships between two named entities within unstructured textual content. In this paper, we define the problem of information extraction as a matrix completion problem, where we employ the notion of universal schemas formed as a collection of patterns derived from open information extraction systems, together with additional features derived from grammatical clause patterns and statistical topic models. One of the challenges with earlier work that employs matrix completion methods is that such approaches require a sufficient number of observed relation instances to be able to make predictions. In practice, however, there is often insufficient explicit evidence supporting each relation type that could be used within the matrix model. Hence, existing work suffers from low recall. We extend the state of the art by proposing novel ways of integrating two sets of features, i.e., topic models and grammatical clause structures, to alleviate the low recall problem. More specifically, we propose that it is possible to (1) employ grammatical clause information from textual sentences as an implicit indication of relation type and argument similarity, on the basis that similar relation types and arguments are likely to be observed within similar grammatical structures; and (2) benefit from statistical topic models to determine the similarity between relation types and arguments, based on their co-occurrence within the same topics. We have performed extensive experiments on both gold standard and silver standard datasets. The experiments show that our approach addresses the low recall problem in existing methods, with an improvement of 21% on recall and 8% on F-measure over the state-of-the-art baseline.
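
To make the matrix completion framing concrete, the following is a minimal sketch and not the authors' model. It assumes a tiny binary matrix whose rows are entity pairs and whose columns are relation types plus side-feature columns; the column names, the toy data, the `factorize` helper, and the choice of a logistic loss trained by plain gradient descent are all illustrative assumptions. Two relation cells are hidden from training so that their scores can only come from the shared clause and topic columns.

```python
import numpy as np

def factorize(observed, mask, rank=2, epochs=1000, lr=0.05, reg=0.01, seed=0):
    """Fit low-rank factors to the unmasked cells of a 0/1 matrix (logistic loss)."""
    rng = np.random.default_rng(seed)
    rows = rng.normal(scale=0.1, size=(observed.shape[0], rank))
    cols = rng.normal(scale=0.1, size=(observed.shape[1], rank))
    losses = []
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(rows @ cols.T)))
        cell_loss = -(observed * np.log(probs + 1e-9)
                      + (1 - observed) * np.log(1 - probs + 1e-9))
        losses.append(float((mask * cell_loss).sum()))
        # Gradient of the summed log-loss with respect to each logit;
        # masked (held-out) cells contribute nothing.
        err = mask * (probs - observed)
        grad_rows = err @ cols + reg * rows
        grad_cols = err.T @ rows + reg * cols
        rows -= lr * grad_rows
        cols -= lr * grad_cols
    return rows, cols, losses

# Hypothetical columns: two relation types, two clause patterns, two topics.
columns = ["rel:born_in", "rel:works_for",
           "clause:A", "clause:B", "topic:A", "topic:B"]
observed = np.array([
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],  # row 4: no relation evidence, only side features
    [0, 0, 0, 1, 0, 1],  # row 5: no relation evidence, only side features
], dtype=float)

# Hide the relation cells we want to predict. Every other zero is treated
# as a negative example, which is a simplifying assumption of this sketch.
mask = np.ones_like(observed)
mask[4, 0] = 0.0
mask[5, 1] = 0.0

rows, cols, losses = factorize(observed, mask)
scores = 1.0 / (1.0 + np.exp(-(rows @ cols.T)))
print("row 4 score for rel:born_in:", scores[4, 0])
print("row 5 score for rel:born_in:", scores[5, 0])
```

In this toy setting, row 4 has no observed relation instance at all, so any score it receives for `rel:born_in` is driven by the clause and topic columns it shares with rows 0 and 1. That is the intuition behind using such features to raise recall when explicit evidence is sparse.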
