Procedia - Social and Behavioral Sciences

A Structure for Annotation and Ground-truthing of Urdu Handwritten Text Image Corpus

Abstract

Over the last few decades, handwriting recognition has advanced considerably. With current trends in digital electronics, handwritten documents have become less common. However, research on a particular language requires a large database of handwritten documents. In this paper we describe our approach to developing a large corpus of Urdu handwritten text images. To make the database usable across the broad field of Natural Language Processing, we annotate each image in the database and associate it with XML-based ground-truth meta-information, so that the corpus is machine-readable as a linguistic resource. This paper focuses on several issues related to corpus design and annotation, such as data collection, writer selection, and annotation methodology.
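The abstract does not give the XML schema used for the ground truth. The following is a minimal sketch, assuming a hypothetical layout (the element names `document`, `writer`, `lines`, `line` and the writer fields `age` and `gender` are illustrative, not the authors' schema), of how a per-image ground-truth record could be serialized to XML and read back with Python's standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class TextLine:
    """One handwritten line: its bounding box in the image and its transcription."""
    line_id: int
    bbox: tuple[int, int, int, int]  # (x, y, width, height) in pixels
    transcription: str               # Unicode Urdu text


@dataclass
class GroundTruthRecord:
    """Hypothetical per-image annotation: image file, writer info, line transcriptions."""
    image_file: str
    writer_id: str
    writer_age: int
    writer_gender: str
    lines: list[TextLine] = field(default_factory=list)


def to_xml(record: GroundTruthRecord) -> str:
    """Serialize a record to an XML string (illustrative schema, not the paper's)."""
    root = ET.Element("document", image=record.image_file)
    writer = ET.SubElement(root, "writer", id=record.writer_id)
    ET.SubElement(writer, "age").text = str(record.writer_age)
    ET.SubElement(writer, "gender").text = record.writer_gender
    lines_el = ET.SubElement(root, "lines")
    for line in record.lines:
        x, y, w, h = line.bbox
        line_el = ET.SubElement(
            lines_el, "line", id=str(line.line_id),
            x=str(x), y=str(y), width=str(w), height=str(h),
        )
        line_el.text = line.transcription
    # encoding="unicode" returns a str and keeps Urdu characters unescaped
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text: str) -> GroundTruthRecord:
    """Parse an XML string produced by to_xml back into a record."""
    root = ET.fromstring(xml_text)
    writer = root.find("writer")
    lines = [
        TextLine(
            int(el.get("id")),
            (int(el.get("x")), int(el.get("y")),
             int(el.get("width")), int(el.get("height"))),
            el.text or "",
        )
        for el in root.find("lines")  # iterate over the <line> children
    ]
    return GroundTruthRecord(
        image_file=root.get("image"),
        writer_id=writer.get("id"),
        writer_age=int(writer.findtext("age")),
        writer_gender=writer.findtext("gender"),
        lines=lines,
    )


if __name__ == "__main__":
    # Example record; file name, writer ID and coordinates are made up.
    record = GroundTruthRecord(
        image_file="page_0001.png",
        writer_id="W042",
        writer_age=23,
        writer_gender="F",
        lines=[TextLine(1, (40, 60, 900, 80), "اردو ایک خوبصورت زبان ہے")],
    )
    xml_text = to_xml(record)
    print(xml_text)
    assert from_xml(xml_text) == record  # lossless round trip
```

Keeping writer metadata and line-level transcriptions in the same record lets a single file per image serve both recognition experiments (line images plus text) and studies that filter by writer attributes. A real corpus would likely add further fields, which this sketch does not attempt to guess.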