
An edge-based block segmentation and classification for document analysis with automatic character string extraction


Abstract

This paper presents an edge-based block segmentation and classification method with automatic character string extraction for document analysis. Using only four edge features derived from the gradient and orientation of the edge pixels, the block segmentation, classification, and character string extraction are all made insensitive to background noise and brightness variation in the image. A document image is classified into seven categories: small-sized letters, large-sized letters, tables, equations, flow charts, graphs, and photographs. The first five are character blocks, which contain text, and the last two are non-character blocks. Introducing the column and text line intervals of the document into the CRLA (constrained run length algorithm) yields an efficient block segmentation with reduced memory size. The simulation results show that document image segmentation, block classification, and character string extraction can be performed concurrently and efficiently.
