...
IEEE Transactions on Pattern Analysis and Machine Intelligence

Script and Language Identification in Noisy and Degraded Document Images

Abstract

This paper reports an identification technique that detects the scripts and languages of noisy and degraded document images. In the proposed technique, scripts and languages are identified through document vectorization, which converts each document image into a document vector that characterizes the shape and frequency of the contained character or word images. Document images are vectorized using vertical component cuts and character extremum points, both of which are tolerant to variations in text fonts and styles, noise, and various types of document degradation. For each script or language under study, a script or language template is first constructed through a training process. The scripts and languages of document images are then determined according to the distances between the converted document vectors and the pre-constructed script and language templates. Experimental results show that the proposed technique is accurate, easy to extend, and tolerant to noise and various types of document degradation.
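The abstract describes a three-stage pipeline: vectorize each document image, build one template per script or language from training data, and assign a test document to the nearest template. The Python sketch below illustrates that pipeline under several assumptions that are ours, not the paper's. A "vertical component cut" is approximated by counting the ink runs crossed by a vertical line through the middle column of each character image. The document vector is a normalized histogram of those counts (character extremum points are not modeled). Each template is the element-wise mean of its training vectors, and "distance" is taken to be Euclidean. All function names, labels, and toy images are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Image = List[List[int]]   # binary character image: 1 = ink, 0 = background
Vector = List[float]


def vertical_cut_count(component: Image) -> int:
    """Count ink runs crossed by a vertical line through the middle column.

    Hypothetical simplification of a "vertical component cut"; the paper's
    exact definition may differ.
    """
    if not component or not component[0]:
        return 0
    mid = len(component[0]) // 2
    runs, prev = 0, 0
    for row in component:
        pixel = row[mid]
        if pixel == 1 and prev == 0:  # entering a new ink run
            runs += 1
        prev = pixel
    return runs


def document_vector(components: Sequence[Image], max_cuts: int = 4) -> Vector:
    """Normalized histogram of cut counts over all character components.

    Counts >= max_cuts share the last bin, so the vector length is fixed.
    """
    hist = [0] * (max_cuts + 1)
    for comp in components:
        hist[min(vertical_cut_count(comp), max_cuts)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)


def build_template(training_vectors: Sequence[Vector]) -> Vector:
    """Assumed template: element-wise mean of one script's training vectors."""
    if not training_vectors:
        raise ValueError("need at least one training vector")
    n = len(training_vectors)
    return [sum(col) / n for col in zip(*training_vectors)]


def identify(doc_vec: Vector, templates: Dict[str, Vector]) -> str:
    """Return the label whose template is nearest in Euclidean distance."""
    return min(templates, key=lambda label: math.dist(doc_vec, templates[label]))


# Toy character images (hypothetical data, for illustration only).
ring = [[0, 1, 1, 1, 0],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [0, 1, 1, 1, 0]]
bar = [[0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]]
e_like = [[1, 1, 1],
          [1, 0, 0],
          [1, 1, 1],
          [1, 0, 0],
          [1, 1, 1]]

# One template per hypothetical script, each built from two training "documents".
templates = {
    "script_A": build_template([document_vector([bar, bar, ring]),
                                document_vector([bar, ring])]),
    "script_B": build_template([document_vector([e_like, e_like, ring]),
                                document_vector([e_like, ring])]),
}
print(identify(document_vector([bar, bar, bar, ring]), templates))  # script_A
```

Normalizing the histogram makes the vector independent of how many characters a page contains, so short and long documents can be compared against the same templates. Supporting a new script or language in this sketch only requires adding another entry to `templates`.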
