AAAI Conference on Artificial Intelligence

TableSense: Spreadsheet Table Detection with Convolutional Neural Networks


Abstract

Spreadsheet table detection is the task of detecting all tables on a given sheet and locating their respective ranges. Automatic table detection is a key enabling technique and an initial step in spreadsheet data intelligence. However, the task is made difficult by the diversity of table structures and table layouts found in spreadsheets. Drawing on the analogy between a spreadsheet as a matrix of cells and an image as a matrix of pixels, and encouraged by the successful application of Convolutional Neural Networks (CNNs) in computer vision, we have developed TableSense, a novel end-to-end framework for spreadsheet table detection. First, we devise an effective cell featurization scheme to better leverage the rich information in each cell; second, we develop an enhanced convolutional neural network model for table detection to meet the domain-specific requirement of precise table boundary detection; third, we propose an effective uncertainty metric to guide an active-learning-based smart sampling algorithm, which enables the efficient build-up of a training dataset of 22,176 tables on 10,220 sheets with broad coverage of diverse table structures and layouts. Our evaluation shows that TableSense is highly effective, with 91.3% recall and 86.5% precision under the EoB-2 metric, a significant improvement over both the detection algorithms currently used in commodity spreadsheet tools and state-of-the-art convolutional neural networks in computer vision.
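To make the cell-matrix and pixel-matrix analogy concrete, the following is a minimal sketch of how a sheet might be turned into an image-like tensor that a CNN could consume. It is an illustration only: the four feature channels, the names `featurize_cell` and `featurize_sheet`, and the length cap of 50 are assumptions made here, not the featurization scheme used by TableSense, which the abstract does not specify.

```python
from typing import List, Optional, Union

Cell = Optional[Union[str, int, float]]

# Hypothetical feature channels; the abstract does not list the real ones.
CHANNELS = ("is_non_empty", "is_numeric", "is_text", "text_length")
MAX_LEN = 50  # assumed cap used to normalize text length into [0, 1]


def featurize_cell(value: Cell) -> List[float]:
    """Map one cell to a fixed-length feature vector (one value per channel)."""
    # Treat missing values and whitespace-only strings as empty cells.
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return [0.0] * len(CHANNELS)
    is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
    text = str(value)
    return [
        1.0,
        1.0 if is_numeric else 0.0,
        0.0 if is_numeric else 1.0,
        min(len(text), MAX_LEN) / MAX_LEN,
    ]


def featurize_sheet(sheet: List[List[Cell]]) -> List[List[List[float]]]:
    """Return a rows x columns x channels tensor, padding ragged rows with empty cells."""
    width = max((len(row) for row in sheet), default=0)
    return [
        [featurize_cell(row[c] if c < len(row) else None) for c in range(width)]
        for row in sheet
    ]


# A tiny sheet: a two-column table, a blank row, then a stray note.
sheet = [["Region", "Sales"], ["North", 120.5], [None, None], ["Notes", ""]]
tensor = featurize_sheet(sheet)
```

Here `tensor[r][c]` plays the role of a pixel at row `r` and column `c`, with the feature channels standing in for color channels. A real scheme would presumably add formatting and formula information as further channels.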
