Towards Efficient Named-Entity Rule Induction for Customizability

Abstract

Generic rule-based systems for Information Extraction (IE) have been shown to work reasonably well out of the box and to achieve state-of-the-art accuracy with further domain customization. However, it is generally recognized that manually building and customizing rules is a complex and labor-intensive process. In this paper, we discuss an approach that facilitates building customizable rules for Named-Entity Recognition (NER) tasks via rule induction in the Annotation Query Language (AQL). Given a set of basic features and an annotated document collection, our goal is to generate an initial set of rules with reasonable accuracy that are interpretable and thus can be easily refined by a human developer. We present an efficient rule induction process, modeled on a four-stage manual rule development process, and present promising initial results with our system. We also propose a simple notion of extractor complexity as a first step toward quantifying the interpretability of an extractor, and study the effect of induction bias and customization of basic features on the accuracy and complexity of induced rules. We demonstrate through experiments that the induced rules have good accuracy and low complexity according to our complexity measure.