Line Extraction for Multi-level Language Document Image

Abstract

This paper describes a modified minimum spanning tree (MST) technique for extracting text lines from documents with a multi-level sentence structure. The technique performs a rough classification at the character level. Its key idea is to find the characters on the main level while reducing the influence of the small characters on the other levels. The technique consists of six steps. First, the boundaries of the objects are detected and small objects are filtered out. Second, a tree is created and the skew angle of the document is estimated. Third, the cost of every branch is calculated and the tree is reduced with the MST technique. Fourth, unexpected branches are removed. Fifth, the level boundaries of each sentence are found and the level of each object is classified. Finally, small-font text lines are recovered. Our experiments use 150 document images drawn from various document types, together with some special cases for testing robustness. Furthermore, the proposed technique can also be applied to handwritten documents.
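
The abstract names the six steps but not the branch cost, the thresholds, or how level boundaries are drawn. The following Python sketch is a hypothetical illustration of that outline, not the author's implementation. It assumes objects have already been detected as bounding boxes `(x0, y0, x1, y1)` with y growing downward; the size rule (`small_ratio`), the branch cost `|dx'| + vert_weight * |dy'|` on deskewed centroids, the pruning tolerance (`vert_tol`), the attachment limit (`max_gap`), and the `"upper"`/`"main"`/`"lower"` level labels are all assumptions. The skew angle is taken as the median branch angle of an initial Euclidean MST. Step 6 is only partly modelled: small objects far from every main line are left unassigned instead of being grouped into new small-font lines.

```python
import math
from statistics import median


def prim_mst(pts, cost):
    """Prim's algorithm on the complete graph over pts; returns (parent, child) index pairs."""
    n = len(pts)
    if n < 2:
        return []
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                c = cost(pts[u], pts[v])
                if c < best[v]:
                    best[v], parent[v] = c, u
    return edges


def extract_lines(boxes, small_ratio=0.5, vert_weight=4.0, vert_tol=0.5, max_gap=2.0):
    """Label each (x0, y0, x1, y1) box with (line_id, level); line_id -1 = unassigned."""
    heights = [b[3] - b[1] for b in boxes]
    h_all = median(heights)
    # Step 1: keep main-level objects; set aside small ones (hypothetical size rule).
    main = [i for i, h in enumerate(heights) if h >= small_ratio * h_all]
    small = [i for i, h in enumerate(heights) if h < small_ratio * h_all]
    h_main = median(heights[i] for i in main)
    cen = [((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for b in boxes]

    # Step 2: Euclidean MST over main centroids; skew = median branch angle.
    pts = [cen[i] for i in main]
    angles = []
    for i, j in prim_mst(pts, math.dist):
        dx, dy = pts[j][0] - pts[i][0], pts[j][1] - pts[i][1]
        if dx < 0 or (dx == 0 and dy < 0):
            dx, dy = -dx, -dy  # fold direction into (-90, 90] degrees
        angles.append(math.atan2(dy, dx))
    skew = median(angles) if angles else 0.0
    c, s = math.cos(skew), math.sin(skew)
    rot = [(x * c + y * s, -x * s + y * c) for x, y in cen]  # deskewed centroids

    # Step 3: rebuild the MST with a cost that penalises cross-line (vertical) jumps.
    rpts = [rot[i] for i in main]
    tree = prim_mst(rpts, lambda p, q: abs(p[0] - q[0]) + vert_weight * abs(p[1] - q[1]))
    # Step 4: remove "unexpected" branches whose vertical offset exceeds the tolerance.
    kept = [(i, j) for i, j in tree if abs(rpts[i][1] - rpts[j][1]) <= vert_tol * h_main]

    root = list(range(len(main)))  # union-find: pruned-forest components = lines

    def find(k):
        while root[k] != k:
            root[k] = root[root[k]]
            k = root[k]
        return k

    for i, j in kept:
        root[find(i)] = find(j)
    labels = [(-1, None)] * len(boxes)
    line_id, members = {}, {}
    for k, i in enumerate(main):
        lid = line_id.setdefault(find(k), len(line_id))
        labels[i] = (lid, "main")
        members.setdefault(lid, []).append(rot[i])

    # Step 5: each line's band = median deskewed centre y, plus its x-extent.
    bands = {lid: (median(p[1] for p in ps), min(p[0] for p in ps), max(p[0] for p in ps))
             for lid, ps in members.items()}
    half = h_main / 2
    # Step 6 (partial): attach small objects to the nearest line and classify their level.
    for i in small:
        x, y = rot[i]

        def gap(lid):
            yc, x0, x1 = bands[lid]
            return abs(y - yc) + max(0.0, x0 - x, x - x1)

        lid = min(bands, key=gap)
        if gap(lid) > max_gap * h_main:
            continue  # too far from every line; left unassigned (small-font line candidate)
        yc = bands[lid][0]
        level = "upper" if y < yc - half else "lower" if y > yc + half else "main"
        labels[i] = (lid, level)
    return labels
```

For two rows of five 10-pixel-high boxes, with a 3-pixel mark above the first row and another below the second, `extract_lines` labels the rows as lines 0 and 1 and the marks as `(0, "upper")` and `(1, "lower")`. Prim's algorithm on the complete graph costs O(n²) time, which keeps the sketch short; a page with many components would more likely build the tree over a sparse neighbour graph.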

Bibliographic Record

Source: Visualization, Imaging, and Image Processing, 2003, pp. 564-568 (5 pages)
Author: Ithipan Methasate
Original format: PDF
Chinese Library Classification: image signal processing
Keywords: line extraction; minimum spanning tree; multi-level language; document image

Similar Documents

1. Script Segmentation of Printed Devnagari and Bangla Languages Document Images OCR [J]. International Journal of Computer Science and Technology, 2011, no. 2.

2. Image-Based Keyword Recognition in Oriental Language Document Images [J]. Zhu JS, Hull JJ, Hong T. Pattern Recognition: The Journal of the Pattern Recognition Society, 1997, no. 8.

3. Signature based Document Image Retrieval Using Multi-level DWT Features [J]. Umesh D. Dixit, M. S. Shirdhonkar. International Journal of Image, Graphics and Signal Processing, 2017, no. 8.

4. Deep Reader: Information Extraction from Document Images via Relation Extraction and Natural Language [C]. D. Vishwanath, Rohit Rahul, Gunjan Sehgal. Asian Conference on Computer Vision, 2019.

5. Extraction of Text Objects in Image and Video Documents [D]. Zhang, Jing. 2012.

6. Multi-Level Features Extraction for Discontinuous Target Tracking in Remote Sensing Image Monitoring [O]. Bin Zhou, Xuemei Duan, Dongjun Ye. 2019.

7. A New Framework for Automatic Airports Extraction from SAR Images Using Multi-Level Dual Attention Mechanism [O]. Lifu Chen, Siyu Tan, Zhouhao Pan. 2020.

