Journal: Artificial Intelligence

Analysis of a probabilistic model of redundancy in unsupervised information extraction


Abstract

Unsupervised Information Extraction (UIE) is the task of extracting knowledge from text without the use of hand-labeled training examples. Because UIE systems do not require human intervention, they can recursively discover new relations, attributes, and instances in a scalable manner. When applied to massive corpora such as the Web, UIE systems present an approach to a primary challenge in artificial intelligence: the automatic accumulation of massive bodies of knowledge.

A fundamental problem for a UIE system is assessing the probability that its extracted information is correct. In massive corpora such as the Web, the same extraction is found repeatedly in different documents. How does this redundancy impact the probability of correctness?

We present a combinatorial "balls-and-urns" model, called Urns, that computes the impact of sample size, redundancy, and corroboration from multiple distinct extraction rules on the probability that an extraction is correct. We describe methods for estimating Urns's parameters in practice and demonstrate experimentally that for UIE the model's log likelihoods are 15 times better, on average, than those obtained by methods used in previous work. We illustrate the generality of the redundancy model by detailing multiple applications beyond UIE in which Urns has been effective. We also provide a theoretical foundation for Urns's performance, including a theorem showing that PAC Learnability in Urns is guaranteed without hand-labeled data, under certain assumptions.
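To make the balls-and-urns intuition concrete, the following is a minimal sketch of a single-urn calculation, not the paper's implementation. It assumes that each extraction is a draw with replacement from an urn of labeled balls, that the number of balls carrying each correct and each erroneous label is known, and that every label is a priori equally likely to be the one observed. The function name `prob_correct` and its parameters are hypothetical; the abstract does not specify them. Under these assumptions, Bayes' rule gives the probability that a label seen `k` times in `n` draws is correct.

```python
from fractions import Fraction


def prob_correct(k, n, correct_counts, error_counts):
    """Posterior probability that a label seen k times in n draws is correct.

    correct_counts and error_counts are hypothetical inputs: one entry per
    distinct label, giving how many balls in the urn carry that label.
    Draws are assumed to be made with replacement.
    """
    s = sum(correct_counts) + sum(error_counts)  # total balls in the urn

    def likelihood(r):
        # Probability of seeing a label with r balls exactly k times in n
        # draws, up to the binomial coefficient, which cancels in the ratio.
        p = Fraction(r, s)
        return p ** k * (1 - p) ** (n - k)

    correct_mass = sum(likelihood(r) for r in correct_counts)
    error_mass = sum(likelihood(r) for r in error_counts)
    return correct_mass / (correct_mass + error_mass)


# Illustrative urn: 2 correct labels with 5 balls each,
# 10 erroneous labels with 1 ball each.
seen_once = prob_correct(1, 10, [5, 5], [1] * 10)
seen_three_times = prob_correct(3, 10, [5, 5], [1] * 10)
print(float(seen_once), float(seen_three_times))
```

In this toy urn, correct labels are repeated more often than errors, so a label observed three times receives a higher posterior than one observed once. That is the redundancy effect the abstract describes; the sample size `n` and the urn composition control how strong the effect is.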
