Conference: Visualization, Imaging, and Image Processing

Line Extraction for Multi-level Language Document Image

Abstract

This paper describes a modified minimum spanning tree (MST) technique for extracting text lines from documents with a multilevel sentence structure. The technique performs a rough classification at the character level. Its key idea is to find the characters on the main level and to reduce the influence of small characters on the other levels. The technique is divided into six steps. First, the boundaries of the objects are detected and small objects are filtered out. Second, a tree is created and the angle of the document is estimated. Third, a cost value is calculated for every branch and the tree is reduced with the MST technique. Fourth, unexpected branches are removed. Fifth, the level boundaries of each sentence are found and the level of each object is classified. Finally, the small-font text lines are recovered. Our experiments use 150 document images drawn from various document types, including some special cases for testing robustness. Furthermore, the proposed technique can also be applied to handwritten documents.
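The abstract names the steps but not the branch cost, the angle estimator, or any thresholds. The Python sketch below shows one way steps 1 to 5 could fit together, assuming the objects' bounding boxes have already been detected. Every specific here is a hypothetical stand-in, not the authors' method: reading "the angle of the document" as a skew angle taken from the median nearest-neighbour direction, a branch cost of centre distance scaled by angular deviation from that skew, the parameters `min_area`, `angle_weight` and `max_dev`, and all function names (`extract_lines`, `estimate_skew`, `classify_level`). Step 6 (recovering small-font lines) is left out.

```python
# Sketch only: the branch cost, skew estimator, thresholds and all names below
# are hypothetical stand-ins, not the method published in the paper.
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (image y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def centre(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    @property
    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def _angle(a: Box, b: Box) -> float:
    """Direction of the segment a -> b, folded into (-pi/2, pi/2]."""
    (ax, ay), (bx, by) = a.centre, b.centre
    t = math.atan2(by - ay, bx - ax)
    if t > math.pi / 2:
        t -= math.pi
    elif t <= -math.pi / 2:
        t += math.pi
    return t


def _deviation(t: float, skew: float) -> float:
    """Unsigned angle between two undirected directions, in [0, pi/2]."""
    d = abs(t - skew) % math.pi
    return min(d, math.pi - d)


class _DSU:
    """Union-find used both for Kruskal's MST and for grouping kept branches."""
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def estimate_skew(boxes: list[Box]) -> float:
    """Step 2 (assumed): median direction from each object to its nearest neighbour."""
    angles = []
    for i, a in enumerate(boxes):
        nearest = min((b for j, b in enumerate(boxes) if j != i),
                      key=lambda b: math.dist(a.centre, b.centre))
        angles.append(_angle(a, nearest))
    return statistics.median(angles)


def extract_lines(boxes: list[Box], min_area: float = 4.0, angle_weight: float = 4.0,
                  max_dev: float = math.radians(20)) -> list[list[Box]]:
    """Steps 1-4: filter, estimate skew, build the MST, cut off-angle branches."""
    main = [b for b in boxes if b.area >= min_area]  # step 1: drop small objects
    if len(main) < 2:
        return [main] if main else []
    skew = estimate_skew(main)  # step 2
    edges = []  # step 3: hypothetical cost = distance * (1 + w * angular deviation)
    for i in range(len(main)):
        for j in range(i + 1, len(main)):
            dev = _deviation(_angle(main[i], main[j]), skew)
            dist = math.dist(main[i].centre, main[j].centre)
            edges.append((dist * (1 + angle_weight * dev), i, j, dev))
    mst, lines = _DSU(len(main)), _DSU(len(main))
    for _, i, j, dev in sorted(edges):  # Kruskal: cheapest branches first
        if mst.union(i, j) and dev <= max_dev:  # step 4: keep near-skew MST branches
            lines.union(i, j)
    groups: dict[int, list[Box]] = {}
    for idx, b in enumerate(main):
        groups.setdefault(lines.find(idx), []).append(b)
    result = [sorted(g, key=lambda b: b.x0) for g in groups.values()]
    return sorted(result, key=lambda g: statistics.mean(b.centre[1] for b in g))


def classify_level(obj: Box, line: list[Box]) -> str:
    """Step 5 (simplified): compare an object's centre with the line's main band."""
    top = statistics.median(b.y0 for b in line)
    bottom = statistics.median(b.y1 for b in line)
    cy = obj.centre[1]
    return "upper" if cy < top else "lower" if cy > bottom else "main"
```

For example, with two rows of five 10 x 10 boxes placed 30 px apart, every MST branch joining the rows deviates by at least about 32 degrees from the horizontal skew. Step 4 therefore cuts it, and `extract_lines` returns two lines. Folding directions into (-pi/2, pi/2] keeps left and right neighbours equivalent. Because the estimator takes a plain median, it is only reasonable for roughly horizontal text.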