Published in: The International Arab Journal of Information Technology

A Hybrid Approach for Urdu Sentence Boundary Disambiguation



Abstract

Sentence boundary identification is a preliminary step in preparing a text document for Natural Language Processing tasks such as machine translation, POS tagging, and text summarization. We present a hybrid approach to Urdu sentence boundary disambiguation that combines a unigram statistical model with a rule-based algorithm. With disjoint training and testing data, this approach achieved 99.48% precision, 86.35% recall, and a 92.45% F1-measure; with identical training and testing data, it achieved 99.36% precision, 96.45% recall, and a 97.89% F1-measure.
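The abstract does not say how the two components interact, so the following Python code is only a minimal sketch under stated assumptions, not the authors' method. It assumes pre-tokenized text in which punctuation marks are separate tokens. It treats the Urdu full stop (U+06D4), the Arabic question mark (U+061F), `!`, and `.` as candidate boundary marks. It then applies a hypothetical rule layer (question and exclamation marks and the final token always end a sentence), followed by a unigram layer that estimates P(boundary | preceding token) from labelled training data. The helper names `train_unigram`, `predict`, `evaluate`, and `f1`, the 0.5 threshold, and the romanized toy tokens are all hypothetical. `f1` is the standard harmonic mean of precision and recall, that is, the F1-measure reported above.

```python
from collections import Counter

# Assumed candidate end-of-sentence marks: Urdu full stop (U+06D4),
# Arabic question mark (U+061F), exclamation mark and ASCII period.
CANDIDATE_MARKS = {"\u06d4", "\u061f", "!", "."}
# Hypothetical rule: these marks are always treated as sentence ends.
ALWAYS_BOUNDARY = {"\u061f", "!"}


def train_unigram(tokens, gold):
    """Estimate P(boundary | preceding token) from labelled training tokens.

    `gold` holds the indices of candidate marks that truly end a sentence.
    """
    seen, ended = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok in CANDIDATE_MARKS and i > 0:
            prev = tokens[i - 1]
            seen[prev] += 1
            if i in gold:
                ended[prev] += 1
    return {word: ended[word] / seen[word] for word in seen}


def predict(tokens, model, threshold=0.5):
    """Return the indices of candidate marks classified as sentence boundaries."""
    boundaries = set()
    for i, tok in enumerate(tokens):
        if tok not in CANDIDATE_MARKS:
            continue
        # Rule layer: unambiguous marks and the final token always end a sentence.
        if tok in ALWAYS_BOUNDARY or i == len(tokens) - 1:
            boundaries.add(i)
            continue
        # Statistical layer: look up the preceding token's boundary probability;
        # tokens unseen in training (or a mark at position 0) default to 1.0.
        prev = tokens[i - 1] if i > 0 else None
        if model.get(prev, 1.0) >= threshold:
            boundaries.add(i)
    return boundaries


def f1(precision, recall):
    """Harmonic mean of precision and recall (the F1-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted, gold):
    """Compare predicted and gold boundary indices; return (P, R, F1)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


if __name__ == "__main__":
    # Toy romanized training text; the periods after "Dr" are abbreviation marks.
    train = "Dr . Ali aye . ye kitab hai . Dr . Khan gaye .".split()
    model = train_unigram(train, gold={4, 8, 13})

    test = "Dr . Khan aye . kitab hai \u061f".split()
    predicted = predict(test, model)
    print(sorted(predicted))                 # [4, 7]
    print(evaluate(predicted, gold={4, 7}))  # (1.0, 1.0, 1.0)
```

In this toy data the token `Dr` is never followed by a true boundary, so the period after it stays inside the sentence. As a check on the reported figures, `f1(0.9948, 0.8635)` is about 0.9245 (the 92.45% above). `f1(0.9936, 0.9645)` is about 0.9788, which differs from the reported 97.89% by 0.01 percentage points, presumably because the published precision and recall are rounded.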


