Journal: Computational Intelligence

An innovative hybrid approach for extracting named entities from unstructured text data


Abstract

Named entity recognition (NER) is a core part of information extraction: it automatically detects entities in natural language text and classifies them into predefined categories, such as the names of persons, organizations, and locations. The output of the NER task is crucial for many applications, including relation extraction, textual entailment, machine translation, and information retrieval. The literature shows that machine learning and deep learning approaches are the most widely used techniques for NER. For entity extraction, however, these approaches require a domain-specific annotated data set. Our goal is to develop a hybrid NER system composed of rule-based, deep learning, and clustering-based approaches, which extracts generic entities (such as person, location, and organization) from natural language texts in domains that lack data sets labeled with generic named entities. The proposed approach exploits the advantages of the deep learning and clustering approaches separately, each in combination with a knowledge-based approach applied through a postprocessing module. We evaluated the proposed methodology on court cases (judgments) as a use case, since they contain generic named entities in forms that are rare or absent in open-source NER data sets. We also evaluated our hybrid models on two benchmark data sets, namely Computational Natural Language Learning (CoNLL) 2003 and Open Knowledge Extraction (OKE) 2016. The experimental results on the benchmark data sets show that our hybrid models achieved substantially better F-scores than other competitive systems.
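
For readers unfamiliar with the metric, the following is a minimal sketch of how an entity-level F-score can be computed. The abstract does not state the matching criterion the authors used; exact span-and-type matching, as is common for CoNLL 2003 style evaluation, is assumed here. The `Entity` tuple layout, the `f_score` function, and the sample data are all hypothetical and serve only as illustration.

```python
from typing import Set, Tuple

# Assumed representation: (start token index, end token index, entity type).
Entity = Tuple[int, int, str]


def f_score(gold: Set[Entity], predicted: Set[Entity], beta: float = 1.0) -> float:
    """Entity-level F-score; a prediction counts only if span and type both match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Made-up example: the ORG entity is mislabeled as LOC, so it is a miss.
gold = {(0, 1, "PER"), (5, 6, "ORG"), (9, 9, "LOC")}
predicted = {(0, 1, "PER"), (5, 6, "LOC"), (9, 9, "LOC")}
score = f_score(gold, predicted)
```

In this made-up example two of three predictions match two of three gold entities, so precision and recall are both 2/3 and the F-score is 2/3 as well.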
