Enhanced Urdu Word Segmentation using Conditional Random Fields and Morphological Context Features

Abstract

Word segmentation is a fundamental task for most NLP applications. Urdu is written in the Nastalique style, which has no concept of a space between words. Furthermore, certain Urdu characters are inherently non-joining, which introduces spaces within a word when text is typed in digital form. Thus, Urdu suffers from both space omission and space insertion, which makes word segmentation challenging. In this paper, we improve upon the results of Zia, Raza and Athar (2018) by using a manually annotated corpus of 19,651 sentences along with morphological context features. Using a Conditional Random Field sequence model, our model achieves an F_1 score of 0.98 for word boundary identification and 0.92 for sub-word boundary identification. These results outperform the state-of-the-art methods.
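
The abstract frames segmentation as sequence labeling scored by boundary F_1. The following is a minimal sketch of that framing, not the authors' implementation: it assumes a character-level B/I tag scheme and a plain character context window, neither of which is stated in the abstract, and it leaves out the morphological context features and the sub-word boundary task. The function names (`to_char_labels`, `char_features`, `boundary_f1`) and the Latin placeholder strings are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def to_char_labels(words: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten gold-segmented words into characters with B/I labels.

    B marks the first character of a word, I marks every other character.
    (Assumed tag scheme; the abstract does not specify one.)
    """
    chars: List[str] = []
    labels: List[str] = []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def char_features(chars: List[str], i: int, window: int = 2) -> Dict[str, object]:
    """Build a feature dict for position i from neighbouring characters.

    This is the kind of per-token dict that CRF toolkits typically consume.
    Morphological features would be added here; they are omitted because
    the abstract does not describe them.
    """
    feats: Dict[str, object] = {"bias": 1.0, "char": chars[i]}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"char[{off:+d}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats


def boundary_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over positions labeled B (word boundaries)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == "B" and p == "B")
    if tp == 0:
        return 0.0
    precision = tp / pred.count("B")
    recall = tp / gold.count("B")
    return 2 * precision * recall / (precision + recall)


# Placeholder input; real input would be Urdu characters with spaces removed.
chars, gold = to_char_labels(["ab", "cde"])
features = [char_features(chars, i) for i in range(len(chars))]
```

In a full pipeline, `features` and `gold` for each sentence would be passed to a CRF trainer, and `boundary_f1` would be computed on held-out predictions.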

Bibliographic details

Source: Widening Natural Language Processing Workshop, 2020, pp. 156-159 (4 pages)
Authors: Aamir Farhan; Mashrukh Islam; Dipti Misra Sharma
Original format: PDF
Indexed: 2022-08-26 13:53:10

Similar literature

1. Improving dense conditional random field for retinal vessel segmentation by discriminative feature learning and thin-vessel enhancement [J]. Zhou Lei, Yu Qi, Xu Xun. Computer Methods and Programs in Biomedicine: An International Journal Devoted to the Development, Implementation and Exchange of Computing Methodology and Software Systems in Biomedical Research and Medical Practice. 2017.

2. Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features [J]. Xiaoxuan Wang, Lei Xie, Mimi Lu. IEICE Transactions on Information and Systems. 2012, no. 5.

3. Urdu Word Segmentation using Conditional Random Fields (CRFs) [C]. Haris Bin Zia, Agha Ali Raza, Awais Athar. International Conference on Computational Linguistics. 2018.

4. A Semi-Automated Approach to Medical Image Segmentation using Conditional Random Field Inference [D]. Hu, Yu-chi. 2020.

5. Superpixel-Based Conditional Random Fields (SuperCRF): Incorporating Global and Local Context for Enhanced Deep Learning in Melanoma Histopathology [O]. Konstantinos Zormpas-Petridis, Henrik Failmezger, Shan E Ahmed Raza.
