Document understanding using probabilistic relaxation: application on tables of contents of periodicals

Abstract

This paper describes a statistical model for a document understanding system that uses both text attributes and document layout. Probabilistic relaxation serves as the recognition scheme for finding the hierarchical structure of the logical layout. This approach, commonly used for pixel classification in image analysis, can be applied to classify text blocks into logical classes according to their local compatibility with neighboring blocks at different hierarchical levels. It yields a logical layout that is globally compatible with the training model. We have tested this approach on reading the tables of contents of periodicals for document indexing. Probabilistic relaxation has useful properties, such as fast training and an 'a priori' recognition rate, which indicates the consistency of the model with respect to the features used and the samples chosen from the training set.
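As an illustration of how probabilistic relaxation can reclassify text blocks from neighbor compatibility, here is a minimal sketch. It assumes the classic relaxation-labeling update (in the style of Rosenfeld, Hummel and Zucker) and is not necessarily the formulation used in the paper. The function name `relax`, the labels `title` and `author`, and every compatibility value are hypothetical; in a trained system the compatibilities would be estimated from the training set, and the hierarchical levels mentioned in the abstract are not modeled here.

```python
def relax(probs, neighbors, compat, iterations=10):
    """Iteratively update per-block label probabilities.

    probs:     list of dicts, one per text block, mapping label -> probability
    neighbors: list of lists; neighbors[i] holds the indices of block i's neighbors
    compat:    dict mapping (label_a, label_b) -> compatibility in [-1, 1],
               read as "support for label_a when a neighbor has label_b"
    """
    labels = list(probs[0])
    for _ in range(iterations):
        updated = []
        for i, p_i in enumerate(probs):
            # Average support each label receives from the neighboring blocks.
            support = {}
            for a in labels:
                total = 0.0
                for j in neighbors[i]:
                    for b in labels:
                        total += compat.get((a, b), 0.0) * probs[j][b]
                support[a] = total / max(len(neighbors[i]), 1)
            # Classic update: scale by (1 + support), then renormalize.
            raw = {a: p_i[a] * (1.0 + support[a]) for a in labels}
            norm = sum(raw.values())
            if norm > 0:
                updated.append({a: raw[a] / norm for a in labels})
            else:
                updated.append(dict(p_i))
        probs = updated
    return probs


# Hypothetical example: three stacked blocks; the middle one is ambiguous.
initial = [
    {"title": 0.9, "author": 0.1},
    {"title": 0.5, "author": 0.5},
    {"title": 0.8, "author": 0.2},
]
neighbors = [[1], [0, 2], [1]]
# Assumed compatibilities: titles and authors tend to alternate.
compat = {
    ("title", "author"): 0.8,
    ("author", "title"): 0.8,
    ("title", "title"): -0.5,
    ("author", "author"): -0.5,
}

result = relax(initial, neighbors, compat)
best = [max(p, key=p.get) for p in result]
print(best)
```

In this toy setup the ambiguous middle block is pulled toward `author` because both of its neighbors are likely titles, which is the kind of local-compatibility reasoning the abstract describes.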