Creation of data resources and design of an evaluation test bed for Devanagari script recognition

Abstract

The Indian subcontinent has a large number of languages, dialects, and scripts, with the Devanagari script being the primary and most widely used of all the scripts. To date, much of the Devanagari optical character recognition (OCR) research has been restricted to a handful of groups. As a result, techniques have not yet been widely disseminated or independently evaluated, and automated evaluation tools are not available because there is no standard representation of ground-truth and result data. A key reason for the absence of sustained research efforts in off-line Devanagari OCR appears to be the paucity of data resources. Ground-truthed data for words and characters, on-line dictionaries, corpora of text documents, and reliable, standardized statistical analyses and evaluation tools are currently lacking. The creation of such data resources will therefore provide a much-needed fillip to researchers working on Devanagari OCR. This paper describes a National Science Foundation sponsored project under the International Digital Libraries program to create data resources that will facilitate development of Devanagari OCR technology and provide a standardized test bed and evaluation tools for Devanagari script recognition.
