Conference: International Conference on Computer and Information Technology

Bangla Parts-of-Speech tagging using Bangla stemmer and rule based analyzer

Abstract

Parts-of-Speech (POS) tagging plays a vital role in many Natural Language Processing (NLP) applications, such as machine translation, spell checking, information retrieval, speech processing, and emotion analysis. Bangla is a highly inflectional language in which a single word gives rise to many variants. Although a few POS taggers exist for Bangla, very few of them exploit suffixes to identify the tags of words. We therefore propose an automated POS tagging system for Bangla based on word suffixes. Our system uses its own stemming technique to retrieve the minimal possible root word and applies rules according to the different forms of suffixes. In addition, we incorporate a Bangla vocabulary of more than 45,000 words with their default tags and a pattern-based verb dataset. These resources help improve the tagging efficiency of the Bangla POS tagger. We evaluate the proposed system on a Bangla text corpus. The results show that the proposed Bangla POS tagger outperforms the known related tagging systems.
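The pipeline described in the abstract (strip a suffix, look the root up in a vocabulary of default tags, fall back to a suffix rule) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the three-word lexicon, the three suffixes, the tag names, and the functions `stem` and `tag` are all hypothetical stand-ins for the paper's 45,000-word vocabulary, rule set, and verb dataset, none of which appear in the abstract.

```python
# Toy lexicon: root word -> default tag.
# Hypothetical entries; the paper's vocabulary is not reproduced here.
LEXICON = {
    "ছেলে": "NOUN",    # boy
    "মানুষ": "NOUN",   # human
    "সুন্দর": "ADJ",   # beautiful
}

# Illustrative suffix -> fallback tag rules (assumed, not the paper's rules).
SUFFIX_RULES = {
    "দের": "NOUN",  # plural genitive/objective marker
    "রা": "NOUN",   # plural marker
    "কে": "NOUN",   # objective case marker
}


def stem(word):
    """Strip the longest matching suffix; return (root, suffix)."""
    for suffix in sorted(SUFFIX_RULES, key=len, reverse=True):
        # Require a non-empty root after stripping.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def tag(word):
    """Tag one word: lexicon first, then stemmed root, then suffix rule."""
    if word in LEXICON:
        return LEXICON[word]
    root, suffix = stem(word)
    if root in LEXICON:
        return LEXICON[root]
    if suffix:
        return SUFFIX_RULES[suffix]
    return "UNK"


if __name__ == "__main__":
    for w in ["ছেলেরা", "মানুষদের", "সুন্দর"]:
        print(w, tag(w))
```

Trying the longest suffix first is a common design choice for inflectional languages, since a shorter suffix may be a tail of a longer one; whether the paper orders its rules this way is not stated in the abstract.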