Venue: 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition

LABA: Logical Layout Analysis of Book Page Images in Arabic Using Multiple Support Vector Machines


Abstract

Logical layout analysis, which determines the function of a document region (for example, whether it is a title, paragraph, or caption), is an indispensable part of a document understanding system. Rule-based algorithms have long been used for such systems. The available datasets have been small, so it is difficult to assess how well the performance of these systems generalizes. In this paper, we present LABA, a supervised machine learning system based on multiple support vector machines for conducting a logical Layout Analysis of scanned pages of Books in Arabic. Our system labels the function (class) of a region of a document (a scanned book page), based on its position on the page and other features. We evaluated LABA on the benchmark "BCE-Arabic-v1" dataset, which contains scanned pages of illustrated Arabic books. We obtained high recall and precision values, and found that, compared to a neural network method based on prior work, the F-measure of LABA is higher for all classes except the "noise" class.
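
The abstract reports recall, precision, and F-measure per region class. As a minimal sketch of how such per-class scores are commonly computed, the snippet below uses plain Python. The function name `per_class_scores` and the toy label lists are hypothetical and exist only for illustration. They are not data or code from the paper, and the paper's exact evaluation protocol may differ.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f_measure)} for every label seen.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # region labeled correctly
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        # F-measure: harmonic mean of precision and recall
        f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


# Toy labels (assumed, not taken from BCE-Arabic-v1)
y_true = ["title", "paragraph", "paragraph", "caption", "noise", "paragraph"]
y_pred = ["title", "paragraph", "caption", "caption", "paragraph", "paragraph"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f) in scores.items():
    print(f"{label:10s} precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Reporting the scores per class, rather than as a single average, is what makes a statement such as "higher for all classes except noise" checkable.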
