The International Arab Journal of Information Technology

Prediction of Part of Speech Tags for Punjabi using Support Vector Machines

Abstract

Part-of-speech (POS) tagging is the task of assigning the appropriate POS, or lexical category, to each word in a natural-language sentence. In this paper, we address the automatic annotation of POS tags for Punjabi. We collected a corpus of around 27,000 words drawn from stories, essays, day-to-day conversations, poems, and other texts, and divided it into files of different sizes for training and testing. Our approach uses Support Vector Machines (SVMs) to tag Punjabi sentences; to the best of our knowledge, SVMs have not previously been applied to tagging Punjabi text. The results show that the SVM-based tagger outperforms the existing Punjabi taggers. In the existing Punjabi POS taggers, tagging accuracy is lower for unknown words than for known words, whereas our proposed tagger achieves high accuracy on unknown and ambiguous words. Our tagger reaches an average accuracy of 89.86%, which is higher than that of the existing approaches.
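The abstract does not describe the feature set, tagset, or training procedure. The following is therefore only a minimal sketch of SVM-based POS tagging in general, not the authors' tagger. It assumes one binary linear SVM per tag (one-vs-rest), trained with Pegasos-style subgradient steps on the L2-regularised hinge loss. Several elements are hypothetical: the window-plus-suffix feature template, the simplified NN/VB/AUX tags, the four romanised toy sentences, and the names `features`, `OneVsRestLinearSVM`, `tag_sentence` and `evaluate`. Accuracy is reported separately for known and unknown words, which mirrors the comparison drawn in the abstract.

```python
from collections import defaultdict


def features(sent, i):
    """Hypothetical feature template for token i: the word, its neighbours, and suffixes."""
    word = sent[i]
    prev_word = sent[i - 1] if i > 0 else "<S>"
    next_word = sent[i + 1] if i + 1 < len(sent) else "</S>"
    return {"bias", "w=" + word, "prev=" + prev_word, "next=" + next_word,
            "suf2=" + word[-2:], "suf3=" + word[-3:]}


class OneVsRestLinearSVM:
    """One binary linear SVM per tag, trained with Pegasos-style subgradient steps
    on the L2-regularised hinge loss (cycling through the tokens in order)."""

    def __init__(self, lam=1e-4, epochs=60):
        self.lam = lam        # regularisation strength (lambda)
        self.epochs = epochs  # passes over the training tokens
        self.w = {}           # tag -> sparse weight vector {feature: weight}

    def fit(self, X, Y):
        for tag in sorted(set(Y)):
            wt = defaultdict(float)
            t = 0
            for _ in range(self.epochs):
                for feats, gold in zip(X, Y):
                    t += 1
                    eta = 1.0 / (self.lam * t)   # Pegasos step size
                    y = 1.0 if gold == tag else -1.0
                    margin = y * sum(wt[f] for f in feats)
                    for f in wt:                 # gradient step on (lambda/2)*||w||^2
                        wt[f] *= 1.0 - eta * self.lam
                    if margin < 1.0:             # hinge loss active: move towards y
                        for f in feats:
                            wt[f] += eta * y
            self.w[tag] = wt
        return self

    def score(self, tag, feats):
        weights = self.w[tag]
        return sum(weights.get(f, 0.0) for f in feats)

    def predict(self, feats):
        # The tag whose binary classifier gives the highest decision value wins.
        return max(self.w, key=lambda tag: self.score(tag, feats))


def build_dataset(sents, tag_seqs):
    X, Y = [], []
    for sent, tags in zip(sents, tag_seqs):
        for i, tag in enumerate(tags):
            X.append(features(sent, i))
            Y.append(tag)
    return X, Y


def tag_sentence(model, sent):
    return [model.predict(features(sent, i)) for i in range(len(sent))]


def evaluate(model, sents, gold_tags, vocab):
    """Token accuracy split into known (seen in training) and unknown words."""
    counts = {"known": [0, 0], "unknown": [0, 0]}  # bucket -> [correct, total]
    for sent, gold in zip(sents, gold_tags):
        for word, pred, g in zip(sent, tag_sentence(model, sent), gold):
            bucket = "known" if word in vocab else "unknown"
            counts[bucket][0] += int(pred == g)
            counts[bucket][1] += 1
    return {b: (c / n if n else None) for b, (c, n) in counts.items()}


# Hypothetical romanised toy corpus with a simplified tagset.
TRAIN_SENTS = [["munda", "ghar", "janda", "hai"], ["kuri", "kitaab", "parhdi", "hai"],
               ["munda", "kitaab", "parhda", "hai"], ["kuri", "ghar", "jandi", "hai"]]
TRAIN_TAGS = [["NN", "NN", "VB", "AUX"]] * 4

X, Y = build_dataset(TRAIN_SENTS, TRAIN_TAGS)
model = OneVsRestLinearSVM().fit(X, Y)
vocab = {w for s in TRAIN_SENTS for w in s}
print(evaluate(model, TRAIN_SENTS, TRAIN_TAGS, vocab))  # {'known': 1.0, 'unknown': None}
print(tag_sentence(model, ["kuri", "kitaab", "parhdi", "hai"]))  # ['NN', 'NN', 'VB', 'AUX']
```

On this toy data every word always carries the same tag, so each one-vs-rest problem is linearly separable and the training tokens are tagged correctly. Accuracy on unknown words depends entirely on the context and suffix features, which is why it is worth measuring separately on held-out text.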