International Conference on Computational Linguistics

Cascading Use of Soft and Hard Matching Pattern Rules for Weakly Supervised Information Extraction

Abstract

Current rule induction techniques based on hard matching (i.e., strict slot-by-slot matching) tend to fare poorly in extracting information from natural language texts, which often exhibit great variation. The reason is that hard matching techniques result in relatively high precision but low recall. To tackle this problem, we take advantage of the newly proposed soft pattern rules, which offer high recall through the use of probabilistic matching. We propose a bootstrapping framework in which soft and hard matching pattern rules are combined in a cascading manner to realize a weakly supervised rule induction scheme. The system starts with a small set of hand-tagged instances. At each iteration, we first generate soft pattern rules and use them to tag new training instances automatically. We then apply hard pattern rule induction to the overall tagged data to generate more precise rules, which are used to tag the data again. The process can be repeated until satisfactory results are obtained. Our experimental results show that our bootstrapping scheme with two cascaded learners approaches the performance of a fully supervised information extraction system while using far fewer hand-tagged instances.
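The cascade described in the abstract can be illustrated with a toy sketch. This is not the authors' system: it assumes each candidate is reduced to a hypothetical `(left_token, right_token)` context pair, it stands in for the soft pattern model with simple per-slot token frequencies, and it treats a hard rule as an exact context pair seen at least `min_support` times. All function names, thresholds, and data are invented for illustration.

```python
from collections import Counter

# Toy candidate: (left_token, right_token), the words around a possible slot filler.
# Every name, threshold, and data item below is a hypothetical illustration.

def train_soft(tagged):
    """Estimate per-slot token probabilities from the tagged contexts."""
    n = len(tagged)
    left = Counter(l for l, _ in tagged)
    right = Counter(r for _, r in tagged)
    return ({t: c / n for t, c in left.items()},
            {t: c / n for t, c in right.items()})

def soft_score(model, cand):
    """Probabilistic match: average the slot probabilities, so a partial match still scores."""
    left_p, right_p = model
    l, r = cand
    return 0.5 * (left_p.get(l, 0.0) + right_p.get(r, 0.0))

def induce_hard(tagged, min_support):
    """Hard rules: exact context pairs that occur at least min_support times."""
    counts = Counter(tagged)
    return {ctx for ctx, c in counts.items() if c >= min_support}

def bootstrap(seeds, unlabeled, iterations=2, soft_threshold=0.25, min_support=2):
    """Cascade a soft (high-recall) tagger and a hard (high-precision) rule learner."""
    tagged = list(seeds)
    hard_rules = set()
    for _ in range(iterations):
        # Step 1: soft rules tag new training instances automatically.
        soft = train_soft(tagged)
        soft_tagged = [c for c in unlabeled if soft_score(soft, c) >= soft_threshold]
        # Step 2: hard rule induction over the overall tagged data.
        hard_rules = induce_hard(list(seeds) + soft_tagged, min_support)
        # Step 3: the more precise hard rules tag the data again.
        tagged = list(seeds) + [c for c in unlabeled if c in hard_rules]
    return hard_rules, tagged

seeds = [("located", "in"), ("based", "in")]
unlabeled = [("located", "in"), ("located", "at"), ("located", "at"),
             ("headquartered", "in"), ("sold", "to"), ("sold", "to")]
rules, tagged = bootstrap(seeds, unlabeled)
print(sorted(rules))
```

In this toy run the soft step admits partially matching contexts such as `("headquartered", "in")`, while the hard step keeps only contexts with enough exact support, which mirrors the recall and precision roles the abstract assigns to the two learners.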