Conference: Bangalore Annual Compute Conference

A rapid application development framework for rule-based named-entity extraction


Abstract

Named Entity Recognition and Classification (NERC) consists of identifying and labeling specific pieces of information, such as proper names, in free-form textual data. There are three primary approaches to named-entity extraction: hand-crafted rule-based, machine-learning-based, and hybrid. Rule-based approaches define heuristics in the form of regular expressions or linguistic patterns and use dictionaries and lexicons to extract named entities. Rule-based approaches have proven quite successful, but one of their limitations is that they require a domain expert to manually define and encode the rules. Hand-engineering rules is time-consuming and tedious, and the resulting rule sets cannot be easily ported to other domains and languages and become hard to maintain. Machine-learning-based approaches try to overcome these limitations by automatically learning rules or inducing a model rather than relying on a human expert to define the rules. In this work, we present our research on overcoming the limitations of the rule-based approach by building a rapid application development framework that expedites rule-building and makes the rules easy to maintain and apply to other domains. We describe a framework that enables a business user to easily define and maintain rules and lexicons.
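
As an illustration of the rule-based approach described in the abstract, the following is a minimal Python sketch that combines regular-expression rules with lexicon lookups. It is not the framework presented in the paper; the names `Entity`, `REGEX_RULES`, `LEXICON`, and `extract_entities`, as well as the example rules and lexicon entries, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # matched surface string
    label: str  # entity class assigned by the rule
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Hypothetical pattern rules: entity label -> compiled regular expression.
REGEX_RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Hypothetical lexicon: entity label -> list of known surface forms.
LEXICON = {
    "CITY": ["Bangalore", "New Delhi"],
    "ORG": ["Acme Corp"],
}


def extract_entities(text, regex_rules=REGEX_RULES, lexicon=LEXICON):
    """Apply every regex rule and lexicon entry to the text.

    Returns all matches sorted by start offset. Overlapping matches
    are kept as-is; a real system would need a conflict policy.
    """
    found = []
    # Pattern-based rules.
    for label, pattern in regex_rules.items():
        for m in pattern.finditer(text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    # Dictionary-based rules: whole-word, case-insensitive lookup.
    for label, terms in lexicon.items():
        for term in terms:
            term_pattern = re.compile(
                r"\b" + re.escape(term) + r"\b", re.IGNORECASE
            )
            for m in term_pattern.finditer(text):
                found.append(Entity(m.group(), label, m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)


if __name__ == "__main__":
    sample = (
        "Acme Corp opened an office in Bangalore on 2015-01-08; "
        "contact info@acme.example."
    )
    for entity in extract_entities(sample):
        print(entity.label, entity.text)
```

In this sketch the rules and lexicons are plain data, separate from the matching logic, so they can be edited without touching the code. That separation is one simple way to approximate the maintainability goal the abstract describes, under the assumptions stated above.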
