Script-based classification of hand-written text documents in a multilingual environment



Abstract

Script-based text document classification is an important area of research in multilingual document processing. However, none of the script identification techniques available in the literature so far consider handwritten documents. Variations in writing style, character size, and inter-line and inter-word spacing make recognition difficult and unreliable when these algorithms, in particular visual-appearance-based approaches, are applied directly to handwritten documents. In this paper, we therefore propose preprocessing the input document images to compensate for variations due to writing style, making them suitable for analysis on the basis of their visual appearance. To this end, we apply denoising, thinning, pruning, m-connectivity, and text size normalization in sequence. Multi-channel Gabor filtering is then used to extract texture features that characterize the visual appearance of the document images. Experimental results demonstrate the potential of the proposed script identification method for handwritten text document classification.
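
To illustrate the feature extraction step named in the abstract, the following is a minimal sketch of multi-channel Gabor texture features in plain Python. The abstract does not state the filter parameters or how the filter responses are summarized, so the wavelengths, the number of orientations, the kernel size, the choice of sigma, and the mean and standard deviation per channel are all assumptions made for illustration. The preprocessing steps (denoising, thinning, pruning, m-connectivity, text size normalization) are not shown; the input is assumed to be an already preprocessed grayscale image given as a list of rows.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real (even-symmetric) Gabor kernel as a size x size list of rows.

    `size` must be odd. All parameter values used here are illustrative
    assumptions, not values taken from the paper.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_valid(image, kernel):
    """Correlate image with kernel wherever the kernel fits entirely.

    The kernel above is symmetric under point reflection, so correlation
    and convolution give the same result here.
    """
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows - k + 1):
        out_row = []
        for j in range(cols - k + 1):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    acc += image[i + u][j + v] * kernel[u][v]
            out_row.append(acc)
        out.append(out_row)
    return out


def texture_features(image, wavelengths=(4.0, 8.0), n_orientations=4, size=9):
    """Return [mean, std] of the response of each Gabor channel.

    One channel per (wavelength, orientation) pair; the summary statistics
    are an assumed choice, since the abstract does not specify them.
    """
    features = []
    for wavelength in wavelengths:
        for n in range(n_orientations):
            theta = n * math.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma=0.5 * wavelength)
            response = [v for row in filter_valid(image, kernel) for v in row]
            mean = sum(response) / len(response)
            var = sum((v - mean) ** 2 for v in response) / len(response)
            features.extend([mean, math.sqrt(var)])
    return features
```

With the default settings this yields a 16-element vector (2 wavelengths × 4 orientations × 2 statistics) that could be passed to any classifier. On an image of vertical stripes with a period of 4 pixels, the channel whose sinusoid runs horizontally at wavelength 4 responds far more strongly than the channel rotated by 90 degrees, which is the kind of orientation and frequency sensitivity that texture-based script identification relies on.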
