Journal: Literary & Linguistic Computing

FarsiTag: A part-of-speech tagging system for Persian

Abstract

FarsiTag is a tagging system that assigns the most probable part-of-speech (POS) tag to each Persian word in a text, using a set of linguistic rules to select the best tag. This study reports how FarsiTag, a robust tagging system, was designed and implemented for Persian texts. A POS-tagged English-Persian parallel corpus of about 5,000,000 words was also developed as a by-product of the tagger. An experiment was conducted to evaluate the system's performance on unrestricted Persian texts. The highest error rates occur in the medical and religious genres, and the lowest in scientific texts. The total error rate across all domains is 1.4%, giving an overall system accuracy of 98.6%, which is very promising for a language like Persian.
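
The reported figures are related by accuracy = 1 - error rate (1 - 0.014 = 0.986). As a minimal sketch of how per-genre error rates and an overall accuracy of this kind could be computed, the snippet below compares gold tags with predicted tags. The function name `error_rates`, the tag labels, and the toy records are all hypothetical; they are not taken from FarsiTag or its evaluation data.

```python
from collections import defaultdict


def error_rates(records):
    """Compute per-genre and overall tagging error rates.

    records: non-empty iterable of (genre, gold_tag, predicted_tag) triples.
    Returns (per_genre_error, overall_error), both as fractions in [0, 1].
    """
    totals = defaultdict(int)   # tokens seen per genre
    errors = defaultdict(int)   # mistagged tokens per genre
    for genre, gold, predicted in records:
        totals[genre] += 1
        if gold != predicted:
            errors[genre] += 1
    per_genre = {g: errors[g] / totals[g] for g in totals}
    overall_error = sum(errors.values()) / sum(totals.values())
    return per_genre, overall_error


# Made-up toy records for illustration only (not results from the paper).
sample = [
    ("scientific", "N", "N"),
    ("scientific", "V", "V"),
    ("scientific", "ADJ", "ADJ"),
    ("medical", "N", "ADJ"),
    ("medical", "V", "V"),
    ("religious", "N", "N"),
    ("religious", "ADJ", "N"),
    ("religious", "P", "P"),
]

per_genre, overall_error = error_rates(sample)
accuracy = 1 - overall_error  # accuracy is the complement of the error rate
```

Note that the overall error here is token-weighted: genres with more tokens contribute more to the total. The abstract does not say how the published 1.4% was aggregated across genres.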