Journal: ACM Transactions on Asian Language Information Processing

Boosting Neural POS Tagger for Farsi Using Morphological Information

Abstract

Farsi (Persian) is a low-resource language that suffers from data sparsity and a lack of efficient processing tools. Part-of-speech (POS) taggers, which are widely used across natural language processing tasks, are among the most important of these tools. Despite recent work on Farsi tagging, there is still room for improvement: the best accuracy reported so far is 96%, rising to 96.9% in special cases. The main weakness of existing taggers is their poor handling of out-of-vocabulary (OOV) words. To address both accuracy and OOV words, we developed a neural network-based POS tagger (NPT) that performs efficiently on Farsi. Despite using less data, NPT outperforms state-of-the-art systems, reaching an accuracy of 97.4%, and its performance is strongly influenced by morphological features. We carry out a shallow morphological analysis and show a considerable improvement over the baseline configuration.
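The abstract attributes much of NPT's accuracy, particularly on OOV words, to shallow morphological features, but it does not list them. The sketch below is a hypothetical illustration of what such features could look like, not the authors' feature set. It extracts character prefixes and suffixes plus a few flags for common Farsi affixes. The function name `shallow_morph_features`, the affix lists, and the zero-width non-joiner (ZWNJ) normalization are all assumptions made for this example.

```python
from typing import Dict

# Hypothetical, illustrative affix lists (not taken from the paper).
# A real system would derive these from a lexicon or from training data.
VERB_PREFIXES = ("نمی", "می")  # negated progressive / progressive verb prefixes
# Ordered longest-first so "ترین" (superlative) wins over "تر" (comparative).
NOMINAL_SUFFIXES = ("ترین", "تر", "ها", "ان")

ZWNJ = "\u200c"  # zero-width non-joiner, often placed between Farsi stems and affixes


def shallow_morph_features(token: str, max_affix: int = 3) -> Dict[str, object]:
    """Return shallow morphological features for one token (illustrative only)."""
    word = token.replace(ZWNJ, "")  # drop ZWNJ so affix matching sees a plain string
    feats: Dict[str, object] = {}
    # Character prefixes/suffixes give an OOV word some signal about its tag.
    for n in range(1, max_affix + 1):
        feats[f"prefix_{n}"] = word[:n]
        feats[f"suffix_{n}"] = word[-n:]
    # Heuristic flags; these are noisy (e.g. the noun "میز" also starts with "می").
    feats["has_verb_prefix"] = word.startswith(VERB_PREFIXES)
    feats["nominal_suffix"] = next(
        (s for s in NOMINAL_SUFFIXES if word.endswith(s)), ""
    )
    feats["has_zwnj"] = ZWNJ in token
    feats["is_digit"] = word.isdigit()  # True for Persian digits such as "۱۲۳" too
    return feats
```

In a neural tagger, features like these would typically be embedded and concatenated with the word embedding, so a word unseen in training still carries affix information; how NPT actually combines them is not stated in this abstract.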