International Workshop on Document Analysis Systems

Language Identification in Degraded and Distorted Document Images


Abstract

This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Unlike previously reported methods, which transform word images through a character shape coding process, our method captures word shapes directly using local extremum points and horizontal intersection numbers, both of which tolerate noise, character segmentation errors, and slight skew distortions. For each language studied, a word shape template and a word frequency template are first constructed using the proposed word shape coding scheme. Identification is then performed using the Bray-Curtis or Hamming distance between the word shape codes of query images and the constructed word shape and frequency templates. Experiments show that the average identification rate across eight Latin-based languages exceeds 99%. ...
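
The distance-based matching step can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how word shape codes or templates are represented, so the sketch assumes each word has already been reduced to an opaque code string, and that a frequency template is a mapping from code to relative frequency. The function names (`bray_curtis`, `hamming`, `identify`) and the toy templates are hypothetical.

```python
from collections import Counter


def bray_curtis(u, v):
    """Bray-Curtis distance between two equal-length, non-negative vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0


def hamming(u, v):
    """Hamming distance: number of positions at which two equal-length sequences differ."""
    if len(u) != len(v):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(u, v))


def identify(query_codes, freq_templates):
    """Return (language, distance) for the template closest to the query.

    query_codes:    word shape codes extracted from a query image
                    (assumed representation: one string per word).
    freq_templates: {language: {code: relative_frequency}}
                    (assumed representation of a word frequency template).
    """
    if not query_codes:
        raise ValueError("query_codes must not be empty")
    counts = Counter(query_codes)
    total = sum(counts.values())
    best_lang, best_dist = None, float("inf")
    for lang, template in freq_templates.items():
        # Align query and template on a shared, ordered vocabulary of codes.
        vocab = sorted(set(template) | set(counts))
        q = [counts[c] / total for c in vocab]
        t = [template.get(c, 0.0) for c in vocab]
        d = bray_curtis(q, t)
        if d < best_dist:
            best_lang, best_dist = lang, d
    return best_lang, best_dist


# Toy templates with made-up codes and frequencies, for illustration only.
templates = {
    "english": {"a1": 0.7, "b2": 0.3},
    "german": {"c3": 1.0},
}
language, distance = identify(["a1", "a1", "b2"], templates)
```

In this sketch the Bray-Curtis distance compares frequency vectors, while `hamming` would suit comparing two fixed-length code sequences position by position. How the paper combines the two measures with its shape and frequency templates is not stated in the abstract.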
