Preattentive reading and selective attention for document image analysis


Abstract

PixED (from Pixel to Electronic Document) aims to convert document images into structured electronic documents that a machine can read for information retrieval. The approach combines perception and symbol reading, the two processes involved when humans detect the organisation of a document. "Preattentive reading" denotes the physical segmentation related to perceptual organisation. "Selective attention" means that symbol reading is limited to specific sequences of symbols or to preattentively selected locations. An OCR system provides the primary structured description of the document. PixED improves the quality of this description, completes the physical segmentation, and adds a logical description. A distributed software architecture and an incremental strategy are defined to enable the integration of perception and symbol reading. The approach is tested on a set of multi-page documents gathered from the proceedings of scientific conferences.
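The two stages named in the abstract can be illustrated with a toy sketch. This is not the PixED implementation, which the abstract does not detail. The `Word` type, the function names, the `gap` threshold, and the cue words are all hypothetical, and the input is assumed to be OCR word boxes already in reading order. The first function segments using geometry only (a stand-in for preattentive reading); the second reads a single word per block and matches it against a short cue list (a stand-in for selective attention).

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    top: float     # y coordinate of the upper edge of the word box
    bottom: float  # y coordinate of the lower edge of the word box


def preattentive_blocks(words, gap=10.0):
    """Group words into blocks from geometry alone (no text is read).

    A new block starts when the vertical white space between a word and
    the lowest edge of the current block exceeds `gap` (assumed threshold).
    """
    blocks = []
    for w in sorted(words, key=lambda w: w.top):  # stable sort keeps line order
        if blocks and w.top - max(x.bottom for x in blocks[-1]) <= gap:
            blocks[-1].append(w)
        else:
            blocks.append([w])
    return blocks


def selective_labels(blocks, cues=("abstract", "references")):
    """Read only the first word of each block and label the block by it.

    Blocks whose first word is not a cue get the generic label "body".
    """
    labels = []
    for block in blocks:
        first = block[0].text.lower().rstrip(".:")
        labels.append(first if first in cues else "body")
    return labels


# Hypothetical OCR output: two groups of lines separated by white space.
words = [
    Word("Abstract", 0, 10),
    Word("PixED", 14, 24),
    Word("converts", 14, 24),
    Word("References", 60, 70),
    Word("[1]", 74, 84),
]
blocks = preattentive_blocks(words)
labels = selective_labels(blocks)
```

Here the segmentation step never inspects `text`, and the labelling step inspects only one word per block, which mirrors the division of labour the abstract describes: perception proposes locations, and symbol reading is restricted to them.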
