Published in: Document Recognition III (conference proceedings)

Document zone classification using sizes of connected components



Abstract

In this paper, we describe a feature-based supervised zone classifier that uses only the widths and heights of the connected components within a given zone. The distribution of these widths and heights is encoded into an n × m dimensional feature vector for decision making. Thus, the computational complexity is on the order of the number of connected components within the zone. A binary decision tree assigns a zone class on the basis of its feature vector. The training and testing data sets are drawn from the scientific document pages in the UW-I database. The classifier assigns each scientific or technical document zone, in real time, to one of eight labels: text of font size 8-12, text of font size 13-18, text of font size 19-36, display math, table, halftone, line drawing, and ruling. It discriminates text from non-text with an accuracy greater than 97%.
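To make the feature encoding concrete, here is a minimal Python sketch of one way a zone's component sizes could be turned into an n × m vector. The abstract does not give the bin layout, so the function name `size_histogram`, the uniform binning, the defaults `n = 4`, `m = 4`, and the cap `max_size = 64` are all illustrative assumptions, not the authors' actual encoding.

```python
from typing import List, Sequence, Tuple


def size_histogram(
    sizes: Sequence[Tuple[int, int]],  # (width, height) of each connected component
    n: int = 4,          # number of width bins (assumed value)
    m: int = 4,          # number of height bins (assumed value)
    max_size: int = 64,  # sizes at or above this fall in the last bin (assumed)
) -> List[float]:
    """Return an n*m vector: the fraction of components in each (width, height) bin."""
    counts = [0] * (n * m)
    for w, h in sizes:  # single pass, so cost is linear in the number of components
        wi = min(w * n // max_size, n - 1)  # uniform width bin, clamped to the last bin
        hi = min(h * m // max_size, m - 1)  # uniform height bin, clamped to the last bin
        counts[wi * m + hi] += 1
    total = len(sizes)
    if total == 0:
        return [0.0] * (n * m)
    return [c / total for c in counts]


# Four hypothetical components: two small glyphs, one wide thin stroke, one large blob.
vec = size_histogram([(8, 10), (8, 10), (40, 3), (100, 100)])
```

A vector like `vec` is the kind of input a binary decision tree could then map to one of the eight zone labels. The single loop over the components reflects the abstract's point that the cost grows with the number of connected components in the zone.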
