Journal: Expert Systems with Applications
A multiobjective simulated annealing approach for classifier ensemble: Named entity recognition in Indian languages as case studies

Abstract

In this paper, we propose a simulated annealing (SA) based multiobjective optimization (MOO) approach for building a classifier ensemble. Several different versions of the objective functions are explored. We hypothesize that the reliability of each classifier's predictions differs across the output classes. Thus, in an ensemble system, it is necessary to determine an appropriate vote weight for each output class in each classifier. Diverse classification methods, namely Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM), are used to build different models depending on the various representations of the available features. One of the most important characteristics of our system is that the features are selected and developed mostly without using any deep domain knowledge or language-dependent resources. The proposed technique is evaluated for Named Entity Recognition (NER) in three resource-poor Indian languages, namely Bengali, Hindi and Telugu. The evaluation yields recall, precision and F-measure values of 93.95%, 95.15% and 94.55% for Bengali; 93.35%, 92.25% and 92.80% for Hindi; and 84.02%, 96.56% and 89.85% for Telugu. Experiments also suggest that the classifier ensemble identified by the proposed MOO based approach, when optimizing the F-measure values of named entity (NE) boundary detection, outperforms all the individual models, two conventional baseline models and three other MOO based ensembles.
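
The per-class, per-classifier vote weighting described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the function names `weighted_vote` and `f_measure`, the labels, and the weight values are hypothetical, and the simulated annealing search that the paper uses to find the weights is not shown. The sketch only combines predictions once weights are given, and computes the balanced F-measure as the harmonic mean of precision and recall.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one label per classifier into a single ensemble label.

    predictions: list of labels, predictions[m] is classifier m's output.
    weights: list of dicts, weights[m][label] is the vote weight of
             classifier m when it predicts `label` (hypothetical values).
    """
    scores = defaultdict(float)
    for m, label in enumerate(predictions):
        # A classifier only votes for the class it predicted, with the
        # weight assigned to that (classifier, class) pair.
        scores[label] += weights[m].get(label, 0.0)
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(scores), key=lambda label: scores[label])


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative weights for three classifiers (say ME, CRF, SVM);
# the numbers are made up, not taken from the paper.
example_weights = [
    {"PER": 0.9, "LOC": 0.2},  # classifier 0: reliable on PER
    {"PER": 0.4, "LOC": 0.3},  # classifier 1
    {"PER": 0.5, "LOC": 0.4},  # classifier 2
]
example_predictions = ["PER", "LOC", "LOC"]
example_label = weighted_vote(example_predictions, example_weights)
```

In this example a plain majority vote would pick `LOC`, while the weighted vote picks `PER` (score 0.9 against 0.3 + 0.4 = 0.7), because the first classifier is assumed to be more reliable on that class. Applying `f_measure` to the reported Bengali precision and recall (95.15 and 93.95) gives about 94.55, which matches the reported F-measure.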