International Conference on Engineering and Emerging Technologies

Maximum Entropy based Urdu Named Entity Recognition


Abstract

Urdu is widely spoken and is the national language of Pakistan. It draws heavily on other languages and is therefore known as “Lashkari Zuban” (a mixture of different languages). We performed experiments on Urdu Named Entity Recognition (NER) using a model-based approach. NER is the task of identifying and classifying named entities in a given text, such as names, places, organizations, and times/dates. The task plays an important role in automated systems, information extraction, machine learning, and artificial intelligence. Much work has been done for European languages, but for South Asian languages the task is still at a developmental stage. We chose Urdu because it is our national language, yet it poses many challenges: it has very limited resources and it is a free-structured language. Our experiments apply a maximum entropy model to the IJCNLP-08 dataset, which is tagged in the IOB (Inside, Outside, Beginning) scheme. Precision, recall, and F-measure are used to evaluate the model.
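To make the evaluation setup concrete, the following is a minimal Python sketch of how precision, recall, and F-measure could be computed over IOB-tagged sequences. It is an illustration, not the authors' evaluation script: the abstract does not say whether scoring is done per entity or per token, so entity-level exact matching is assumed here, and the function names and the `PER`/`LOC` labels are hypothetical rather than the tag set of the IJCNLP-08 dataset.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); an illustrative representation
Span = Tuple[int, int, str]


def iob_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from IOB tags such as B-PER, I-PER, O."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            # Assumption: an I- tag with no matching open span starts a new entity.
            start, etype = i, label
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only if span and type both match."""
    gold_spans, pred_spans = iob_to_spans(gold), iob_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical five-token sentence: the person span matches, the location does not.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(precision_recall_f1(gold, pred))
```

In this toy case one of the two predicted entities matches one of the two gold entities, so precision, recall, and F-measure all come out at 0.5. Token-level scoring would give different numbers for the same tags.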
