Conference: International Conference on Signal Image Technology Internet Based Systems

A Framework for Compilation of Multi-lingual Handwritten Database: Four Levels XML Ground-Truth


Abstract

In this paper, we present a semi-automatic framework for annotating document images of multi-lingual handwritten text. There is a significant need for a structure that can record the coordinate segmentation information of the text in a handwritten document image and thereby provide a platform for evaluating OCR algorithms. We describe an XML-based, four-level annotation of handwritten text images that contains the ground-truth information of the script text in Unicode format. To support the collection of large amounts of data for linguistic researchers, the structure provides a way to store and annotate at four levels: Image, Lines, Words and Characters, which aids the benchmarking of various OCR systems. The structure would be well suited as a source for compiling annotated handwritten corpora in a systematic and scientific way: it stores the labelling (markup) information of the image script text in Unicode, in an XML file format that encapsulates the bounding-box pixel information of each level in a collaborative manner. Based on the annotation, the structure provides useful results for various quantitative and statistical corpus approaches to linguistic analysis.
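The four-level structure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual schema, so the element names (`Image`, `Line`, `Word`, `Character`), the attribute names (`x`, `y`, `width`, `height`, `unicode`, `codepoint`), the file name and the sample coordinates below are all assumptions chosen for illustration, not the authors' format.

```python
import xml.etree.ElementTree as ET


def bbox_attrs(x, y, w, h):
    """Return bounding-box pixel coordinates as XML attribute strings."""
    return {"x": str(x), "y": str(y), "width": str(w), "height": str(h)}


def build_ground_truth(image_name, image_size, lines):
    """Build a hypothetical four-level tree: Image > Line > Word > Character.

    `lines` is a list of (line_bbox, words); `words` is a list of
    (word_bbox, text, char_bboxes) with one bounding box per code point.
    """
    width, height = image_size
    root = ET.Element("Image", {"name": image_name, **bbox_attrs(0, 0, width, height)})
    for line_bbox, words in lines:
        line_el = ET.SubElement(root, "Line", bbox_attrs(*line_bbox))
        for word_bbox, text, char_bboxes in words:
            if len(text) != len(char_bboxes):
                raise ValueError("one bounding box is required per character")
            word_el = ET.SubElement(
                line_el, "Word", {"unicode": text, **bbox_attrs(*word_bbox)}
            )
            for ch, ch_bbox in zip(text, char_bboxes):
                ET.SubElement(
                    word_el,
                    "Character",
                    {"unicode": ch, "codepoint": f"U+{ord(ch):04X}", **bbox_attrs(*ch_bbox)},
                )
    return root


# Made-up sample: one line holding one two-character Devanagari word.
sample_lines = [
    (
        (10, 20, 300, 40),
        [((10, 20, 60, 40), "कम", [(10, 20, 30, 40), (40, 20, 30, 40)])],
    ),
]
tree = build_ground_truth("page_001.png", (640, 480), sample_lines)
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
```

This sketch treats each Unicode code point as one character. In scripts with combining marks or conjuncts, a visual character may span several code points, so a real annotation tool would probably need a script-aware segmentation at the Character level.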
