Application of statistical pattern recognition to document segmentation and labelling.

Abstract

In the field of computer analysis of document images, the problems of physical and logical layout analysis have been approached through a variety of heuristic, rule-based, and grammar-based techniques. In this paper we investigate the effectiveness of statistical pattern recognition algorithms for solving these two problems. Using a new software environment for manual page image segmentation and labelling, a dataset of 932 page images from academic journals was created. Several physical layout analysis algorithms were implemented, including a new algorithm based on a logistic regression classifier. Three statistical classifiers were applied to the logical layout analysis problem, with encouraging results. A new model of how ink is laid out on a page was used to develop a prototype system that performs segmentation and labelling jointly. Finally, several applications were investigated and rudimentary implementations demonstrated. The results indicate that statistical pattern recognition is a promising approach to both problems.
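The abstract names a logistic regression classifier for physical layout analysis but does not describe its features or training procedure. As a minimal sketch under stated assumptions (not the thesis's algorithm), the Python below fits a logistic regression by batch gradient descent to decide whether the gap between two consecutive text lines is a region boundary. The two features (the gap divided by the page's median line spacing, and the normalized x-height difference between the lines), the function names, and the toy training rows are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy examples: one row per pair of consecutive text lines.
# Feature 0: vertical gap between the lines / median line spacing on the page.
# Feature 1: |x-height difference| between the lines / median x-height.
# Label 1 means the gap is a region boundary; 0 means both lines share a block.
X_TRAIN: List[List[float]] = [
    [1.0, 0.0], [1.1, 0.05], [0.9, 0.0], [1.2, 0.1],
    [2.5, 0.3], [3.0, 0.0], [2.2, 0.5], [1.4, 0.8],
]
Y_TRAIN: List[int] = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def boundary_probability(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that the gap described by x is a region boundary."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = boundary_probability(w, b, x) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the gradient over the batch and take one descent step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    w, b = train_logistic_regression(X_TRAIN, Y_TRAIN)
    for features in ([0.95, 0.0], [3.2, 0.0]):
        print(features, round(boundary_probability(w, b, features), 3))
```

Logistic regression is used here only because the abstract names it; in a real segmenter the features would be measured from the page image, and each predicted boundary would split the line sequence into candidate regions.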

Bibliographic record

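  • Author

    Laven, Kevin

  • Author affiliation

    University of Toronto (Canada)

  • Degree-granting institution: University of Toronto (Canada)
  • Discipline: Computer Science
  • Degree: M.Sc.
  • Year: 2005
  • Pages: 145
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology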
