Towards Efficient Named-Entity Rule Induction for Customizability

Abstract

Generic rule-based systems for Information Extraction (IE) have been shown to work reasonably well out-of-the-box, and achieve state-of-the-art accuracy with further domain customization. However, it is generally recognized that manually building and customizing rules is a complex and labor-intensive process. In this paper, we discuss an approach that facilitates the process of building customizable rules for Named-Entity Recognition (NER) tasks via rule induction, in the Annotation Query Language (AQL). Given a set of basic features and an annotated document collection, our goal is to generate an initial set of rules with reasonable accuracy that are interpretable and can thus be easily refined by a human developer. We present an efficient rule induction process, modeled on a four-stage manual rule development process, and present promising initial results with our system. We also propose a simple notion of extractor complexity as a first step to quantify the interpretability of an extractor, and study the effect of induction bias and customization of basic features on the accuracy and complexity of induced rules. We demonstrate through experiments that the induced rules have good accuracy and low complexity according to our complexity measure.
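The trade-off between accuracy and complexity mentioned in the abstract can be illustrated with a small Python sketch. The abstract does not define the complexity measure, so everything here is an assumption made for illustration: the `Rule` class, the feature names, and the choice to count feature references as "complexity" are all hypothetical, and the rules are plain Python objects rather than AQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: an entity type plus the basic features it combines."""
    target: str
    features: tuple[str, ...]


def extractor_complexity(rules):
    """Toy complexity (an assumption, not the paper's definition):
    the total number of basic-feature references across all rules."""
    return sum(len(rule.features) for rule in rules)


def f1_score(predicted, gold):
    """Exact-match F1 between predicted and gold (start, end, type) spans."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative rule set; the feature names are made up for this sketch.
rules = [
    Rule("Person", ("CapsWord", "FirstNameDict")),
    Rule("Organization", ("CapsWord", "OrgSuffixDict", "NotPerson")),
]

# Illustrative spans: one of the two predictions matches the gold annotations.
predicted = {(0, 10, "Person"), (15, 22, "Organization")}
gold = {(0, 10, "Person"), (30, 35, "Location")}

complexity = extractor_complexity(rules)
f1 = f1_score(predicted, gold)
```

Under these assumptions, a smaller rule set that reaches a similar F1 would count as the more interpretable extractor.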


Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing; Conference on Computational Natural Language Learning | 2012 | pp. 128-138 | 11 pages
Authors
Ajay Nagesh; Ganesh Ramakrishnan; Laura Chiticariu; Rajasekar Krishnamurthy; Ankush Dharkar; Pushpak Bhattacharyya
Original format: PDF

