Conference: Pattern Recognition and Machine Intelligence

A Hidden Markov Model Based Named Entity Recognition System: Bengali and Hindi as Case Studies



Abstract

Named Entity Recognition (NER) plays an important role in almost all Natural Language Processing (NLP) application areas, including information retrieval, machine translation, question answering systems and automatic summarization. This paper reports the development of a statistical NER system based on a Hidden Markov Model (HMM). The system is initially developed for Bengali using a tagged Bengali news corpus built from the web archive of a leading Bengali newspaper. It is trained on a corpus of 150,000 wordforms, initially tagged with an HMM-based part-of-speech (POS) tagger. Evaluation with 10-fold cross-validation yields average Recall, Precision and F-Score values of 90.2%, 79.48% and 84.5%, respectively. The same HMM-based NER system is then trained and tested on Hindi data to demonstrate its language independence. With 27,151 Hindi wordforms, 10-fold cross-validation gives average Recall, Precision and F-Score values of 82.5%, 74.6% and 78.35%, respectively.
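The abstract does not give the model's parameterisation, so the following is only a rough illustration of how an HMM tagger assigns named-entity tags. It assumes a generic first-order (bigram) HMM estimated by counting, with add-one smoothing and Viterbi decoding. The paper's actual tag set, features, smoothing and unknown-word handling may differ. The tags `PER`/`O`, the toy romanized sentences and the helper names (`train_hmm`, `log_prob`, `viterbi`, `f_score`) are hypothetical. The last lines check that the reported F-Scores are the harmonic mean of the reported Recall and Precision.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical pseudo-tag marking the sentence start


def train_hmm(tagged_sentences):
    """Count tag-bigram transitions and tag-to-word emissions from
    sentences given as lists of (word, tag) pairs."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    tags, vocab = set(), set()
    for sent in tagged_sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(tags), vocab


def log_prob(table, given, outcome, n_outcomes):
    """Add-one (Laplace) smoothed log P(outcome | given); unseen pairs get
    a small non-zero probability instead of zero."""
    row = table.get(given, {})
    total = sum(row.values())
    return math.log((row.get(outcome, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # +1 reserves probability mass for unseen words
    # best[i][t] = (log score of best path ending in tag t at position i, previous tag)
    best = [{t: (log_prob(trans, START, t, n_tags)
                 + log_prob(emit, t, words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max((best[i - 1][p][0] + log_prob(trans, p, t, n_tags), p)
                              for p in tags)
            column[t] = (score + log_prob(emit, t, words[i], n_words), prev)
        best.append(column)
    # Backtrace from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def f_score(recall, precision):
    """Balanced F-Score: harmonic mean of Recall and Precision."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (romanized tokens; PER = person name, O = other).
train = [
    [("Rahul", "PER"), ("went", "O"), ("home", "O")],
    [("Sita", "PER"), ("went", "O"), ("home", "O")],
]
model = train_hmm(train)
print(viterbi(["Amit", "went", "home"], *model))  # unseen name -> ['PER', 'O', 'O']
print(round(f_score(90.2, 79.48), 2))  # Bengali figures from the abstract -> 84.5
print(round(f_score(82.5, 74.6), 2))   # Hindi figures from the abstract -> 78.35
```

In this toy run the unseen word `Amit` is tagged `PER` mainly because of the smoothed start-of-sentence transition statistics. A practical system would need richer handling of unknown words, which the abstract does not describe.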
