International World Wide Web Conference (WWW 09)

StatSnowball: a Statistical Approach to Extracting Entity Relationships

Abstract

Traditional relation extraction methods require pre-specified relations and relation-specific, human-tagged examples. Bootstrapping systems significantly reduce the number of training examples needed, but they usually rely on heuristics to combine a set of strict hard rules, which limits their ability to generalize and results in low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which identifies various types of relations without requiring them to be specified in advance. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), a bootstrapping system that can perform both traditional relation extraction and Open IE. StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in the maximum likelihood sense. An MLN is a general model and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an ℓ1-norm penalized maximum likelihood estimation problem, which rests on well-founded theory and has efficient solvers. We extensively evaluate StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that, starting from a small number of seeds, StatSnowball achieves significantly higher recall over the iterations without sacrificing high precision, and that the joint inference of MLNs improves performance. Finally, StatSnowball is efficient, and we have built a working entity relation search engine called Renlifang on top of it.
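The abstract states that StatSnowball selects patterns by solving an ℓ1-norm penalized maximum likelihood estimation. The sketch below illustrates only that idea, on a much simpler model than the paper's: a plain logistic regression with one weight per extraction pattern (not a full Markov logic network), fitted by proximal gradient descent (ISTA). The pattern strings, the toy data, the helper names, and the penalty `lam=0.05` are all hypothetical; a pattern whose weight is driven exactly to zero is treated as dropped.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink toward zero, snap to exactly 0 inside [-t, t].
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_logistic_fit(X, y, lam, step=0.5, iters=500):
    """Hypothetical helper: minimize mean negative log-likelihood + lam * ||w||_1
    with proximal gradient descent (ISTA). The bias b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual p - y is the derivative of the per-example NLL w.r.t. the score.
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += r * xi[j] / n
            grad_b += r / n
        # Gradient step on the smooth likelihood term, then the L1 prox step.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Hypothetical toy data: each row records which of three extraction patterns
# matched a candidate (husband, wife) tuple; y marks whether the tuple is correct.
PATTERNS = ["X, wife of Y", "X met Y", "X and Y"]
X = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0],
     [0, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = l1_logistic_fit(X, y, lam=0.05)
selected = [p for p, wj in zip(PATTERNS, w) if wj != 0.0]
print(selected)
```

Because soft-thresholding sets small weights exactly to zero, the penalty performs selection directly rather than requiring a separate hard-rule heuristic: raising `lam` prunes more patterns and lowering it keeps more. In this toy data, "X met Y" and "X and Y" match correct and incorrect tuples equally often, so they should end with zero weight while "X, wife of Y" keeps a positive one.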