Conference: International conference on computational linguistics

Differential Evolution based Feature Selection and Classifier Ensemble for Named Entity Recognition

Abstract

In this paper, we propose a differential evolution (DE) based two-stage evolutionary approach for named entity recognition (NER). The first stage addresses the problem of relevant feature selection for NER within the frameworks of two popular machine learning algorithms, namely Conditional Random Field (CRF) and Support Vector Machine (SVM). The solutions in the final best population provide a diverse set of classifiers; some are effective with respect to recall, whereas others are effective with respect to precision. In the second stage, we propose a novel classifier ensemble technique for combining these classifiers. The approach is very general and can be applied to any classification problem. Currently, we evaluate the proposed algorithm for NER in three popular Indian languages, namely Bengali, Hindi and Telugu. To maintain the domain-independence property, the features are selected and developed mostly without using any deep domain knowledge and/or language-dependent resources. Experimental results show that the proposed two-stage technique attains final F-measure values of 88.89%, 88.09% and 76.63% for Bengali, Hindi and Telugu, respectively. The key contributions of this work are two-fold: (i) the proposal of differential evolution (DE) based feature selection and classifier ensemble methods that can be applied to any classification problem; and (ii) scope for the development of language-independent NER systems in a resource-poor scenario.
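The abstract does not specify the DE variant, the chromosome encoding, or the fitness function, so the following is only a minimal sketch of how DE-based feature selection could look, not the authors' implementation. It assumes the common DE/rand/1/bin scheme, real-valued genes in [0, 1] that are thresholded at 0.5 to switch a feature on or off, and a toy fitness function (`toy_fitness`, with a hypothetical `RELEVANT` feature set) standing in for the F-measure of a CRF or SVM trained on the selected features. All names and parameter values here are illustrative assumptions.

```python
import random

# Hypothetical stand-in for "train a CRF/SVM on the selected features and
# return its F-measure": features 0, 2 and 5 are assumed to be the useful ones.
RELEVANT = {0, 2, 5}


def toy_fitness(mask):
    """F-measure of the selected feature set against the assumed RELEVANT set."""
    chosen = {i for i, on in enumerate(mask) if on}
    hits = len(chosen & RELEVANT)
    if hits == 0:
        return 0.0
    precision = hits / len(chosen)
    recall = hits / len(RELEVANT)
    return 2 * precision * recall / (precision + recall)


def decode(vector):
    """Map a real-valued chromosome to a boolean feature mask (gene >= 0.5 means 'on')."""
    return [gene >= 0.5 for gene in vector]


def de_feature_selection(fitness, n_features, pop_size=20, generations=50,
                         f_scale=0.5, cr=0.9, seed=0):
    """DE/rand/1/bin sketch; returns the final population as (mask, score) pairs.

    Requires pop_size >= 4 so that three distinct partners can be drawn.
    """
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    scores = [fitness(decode(v)) for v in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_features)  # guarantees at least one mutated gene
            trial = []
            for j in range(n_features):
                if rng.random() < cr or j == j_rand:
                    gene = pop[a][j] + f_scale * (pop[b][j] - pop[c][j])
                    gene = min(1.0, max(0.0, gene))  # clip back into [0, 1]
                else:
                    gene = pop[i][j]
                trial.append(gene)
            trial_score = fitness(decode(trial))
            # Greedy replacement: keep the trial only if it is at least as good.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score

    return [(decode(v), s) for v, s in zip(pop, scores)]


final_population = de_feature_selection(toy_fitness, n_features=8)
best_mask, best_score = max(final_population, key=lambda pair: pair[1])
print("best feature mask:", best_mask)
print("best toy F-measure:", best_score)
```

The function returns the whole final population rather than only the best individual because the abstract describes the second stage as combining the classifiers from that population. How they are combined is not stated in the abstract, so the ensemble step is left out of this sketch.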
