Conference: ACM SIGMOD International Conference on Management of Data

I4E: Interactive Investigation of Iterative Information Extraction


Abstract

Information extraction systems are increasingly used to mine structured information from unstructured text documents. A commonly used unsupervised technique is to build iterative information extraction (IIE) systems that learn task-specific rules, called patterns, to generate the desired tuples. The output of such a system often contains unexpected results, which may be due to an incorrect pattern, an incorrect tuple, or both. In such scenarios, users and developers of the extraction system could benefit greatly from an investigation tool that helps them quickly reason about and repair the output. In this paper, we develop an approach for interactive post-extraction investigation of IIE systems. We formalize three important phases of this investigation: explaining the IIE result, diagnosing the influential and problematic components, and repairing the output of the information extraction system. We show how to characterize the execution of an IIE system and build a suite of algorithms to answer questions pertaining to each of these phases. We experimentally evaluate our approach across several domains on a Web corpus of about 500 million documents. We show that our approach effectively enables post-extraction investigation while maximizing the gain from user and developer interaction.
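To make the pattern/tuple loop and the explain and repair phases concrete, here is a toy sketch in Python. It is not the paper's algorithm. It assumes that a "pattern" is simply the literal text found between two entities in a sentence, and the names `run_iie`, `explain`, and `repair`, as well as the tiny corpus, are hypothetical and introduced only for illustration.

```python
import re
from collections import defaultdict


def run_iie(corpus, seeds, iterations=2):
    """Toy bootstrapping loop (assumption: pattern = text between two entities)."""
    seeds = set(seeds)
    tuples = set(seeds)
    patterns = set()
    tuple_sources = defaultdict(set)    # tuple -> patterns that extracted it
    pattern_sources = defaultdict(set)  # pattern -> tuples that induced it
    for _ in range(iterations):
        # Step 1: induce patterns from every sentence that mentions a known tuple.
        for a, b in list(tuples):
            for sentence in corpus:
                m = re.search(re.escape(a) + r"(.+?)" + re.escape(b), sentence)
                if m:
                    patterns.add(m.group(1))
                    pattern_sources[m.group(1)].add((a, b))
        # Step 2: apply every pattern to the corpus to extract new tuples.
        for middle in list(patterns):
            regex = re.compile(r"(\w+)" + re.escape(middle) + r"(\w+)")
            for sentence in corpus:
                for a, b in regex.findall(sentence):
                    if (a, b) not in seeds:
                        tuple_sources[(a, b)].add(middle)
                    tuples.add((a, b))
    return tuples, tuple_sources, pattern_sources


def explain(tup, tuple_sources, pattern_sources):
    """Explain: map each pattern that produced `tup` to the tuples that induced it."""
    return {p: sorted(pattern_sources[p]) for p in sorted(tuple_sources.get(tup, ()))}


def repair(tuples, tuple_sources, bad_pattern, seeds):
    """Repair (one step, no cascading): drop tuples supported only by `bad_pattern`."""
    kept = set()
    for t in tuples:
        support = tuple_sources.get(t, set()) - {bad_pattern}
        if t in seeds or support:
            kept.add(t)
    return kept


# Hypothetical corpus for a capital-of relation; " is in " is an overly general pattern.
corpus = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "Paris is in France",
    "Lyon is in France",
]
seeds = {("Paris", "France")}

tuples, tuple_sources, pattern_sources = run_iie(corpus, seeds)
explanation = explain(("Lyon", "France"), tuple_sources, pattern_sources)
repaired = repair(tuples, tuple_sources, " is in ", seeds)
print(explanation)
print(sorted(repaired))
```

In this sketch the seed tuple induces the overly general pattern " is in ", which in turn extracts the incorrect tuple ("Lyon", "France"). The provenance maps let a user trace that tuple back to the pattern and discard everything the pattern alone supports. A fuller treatment would also need to rank components by influence and propagate a repair through later iterations.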
