Journal: Data & Knowledge Engineering

Combining data-driven systems for improving Named Entity Recognition

Abstract

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of text. All these tasks benefit greatly from a Named Entity Recognizer (NER) in the preprocessing stage. This paper proposes a completely automatic NER system. The NER task involves not only identifying proper names (Named Entities) in natural language text, but also classifying them into a set of predefined categories, such as persons, organizations (companies, government organizations, committees, etc.), locations (cities, countries, rivers, etc.) and miscellaneous (movie titles, sport events, etc.). Throughout the paper, we examine the differences between the language models learned by different data-driven classifiers confronted with the same NLP task, as well as ways to exploit these differences to achieve higher accuracy than the best individual classifier. Three machine learning classifiers (Hidden Markov Model, Maximum Entropy and Memory Based Learning) are trained on the same corpus to perform the NER task. Their outputs are compared and then combined using voting strategies. The system has been evaluated extensively and compared with other systems within the framework of two specialized scientific competitions for NER, CoNLL-2002 and HAREM-2005. Finally, this paper describes the integration of our NER system into different NLP applications, specifically Geographic Information Retrieval and Conceptual Modelling.
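The abstract does not say which voting strategies are used, so the following is only a minimal sketch of one plausible scheme: token-level (optionally weighted) majority voting over the tag sequences produced by several classifiers. The function name `vote`, the classifier keys (`hmm`, `maxent`, `mbl`), the `fallback` tie-breaking rule and the example tags are all illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def vote(tag_sequences: Dict[str, Sequence[str]],
         weights: Optional[Dict[str, float]] = None,
         fallback: Optional[str] = None) -> List[str]:
    """Combine per-token tags from several classifiers by (weighted) voting.

    tag_sequences maps a classifier name to its tag sequence for one sentence.
    weights (hypothetical) maps a classifier name to its vote weight; the
    default is one vote each. fallback (hypothetical) names the classifier
    whose tag wins a tie, e.g. the best individual system.
    """
    names = list(tag_sequences)
    lengths = {len(tag_sequences[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")
    if weights is None:
        weights = {name: 1.0 for name in names}

    combined = []
    for i in range(lengths.pop()):
        # Accumulate the vote mass each candidate tag receives at position i.
        scores = defaultdict(float)
        for name in names:
            scores[tag_sequences[name][i]] += weights[name]
        best = max(scores.values())
        winners = [tag for tag, score in scores.items() if score == best]
        if (len(winners) > 1 and fallback is not None
                and tag_sequences[fallback][i] in winners):
            # Tie: trust the designated fallback classifier.
            combined.append(tag_sequences[fallback][i])
        else:
            # Unique winner, or a tie resolved by classifier order.
            combined.append(winners[0])
    return combined


# Illustrative tags for the tokens: Maria lives in Alicante .
outputs = {
    "hmm":    ["B-PER", "O", "O", "B-LOC", "O"],
    "maxent": ["B-PER", "O", "O", "B-ORG", "O"],
    "mbl":    ["O",     "O", "O", "B-LOC", "O"],
}
print(vote(outputs))  # ['B-PER', 'O', 'O', 'B-LOC', 'O']
```

Because each token is decided independently, a scheme like this can produce tag sequences that are not well-formed (for example an `I-` tag with no preceding `B-` tag), so a real system would probably need a repair step or would vote over whole entities instead.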