IAPR International Workshop on Document Analysis Systems

Table Recognition in Spreadsheets via a Graph Representation



Abstract

Spreadsheet applications are very popular data management tools. Their ease of use and abundant functionality equip novices and professionals alike with the means to generate, transform, analyze, and visualize data. As a result, spreadsheets are a rich source of factual and structured information, which accentuates the need to understand and extract their contents automatically. In this paper, we present a novel approach for recognizing tables in spreadsheets. Having inferred the layout role of the individual cells, we build layout regions and encode the spatial interrelations between these regions using a graph representation. Based on this representation, we propose Remove and Conquer (RAC), an algorithm for table recognition that implements a list of carefully curated rules. An extensive experimental evaluation shows that our approach is viable. We achieve significant accuracy on a dataset of real spreadsheets from various domains.
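The graph representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the rules of RAC or the exact edge definition, so everything below is an assumption made for illustration: the `Region` type, the role labels, the `max_gap` parameter, and the adjacency test are hypothetical, and the connected-components grouping at the end is a simple stand-in, not the RAC algorithm itself.

```python
from dataclasses import dataclass
from itertools import combinations

# Inverse of each relation, used to store both directions of an edge.
OPPOSITE = {"above": "below", "below": "above", "left": "right", "right": "left"}


@dataclass(frozen=True)
class Region:
    """A rectangular layout region; row and column bounds are inclusive."""
    name: str
    role: str  # hypothetical labels such as "header" or "data"
    top: int
    left: int
    bottom: int
    right: int


def spatial_relation(a, b, max_gap=0):
    """Return where b lies relative to a, or None if they are not neighbors.

    Two regions are neighbors (an assumption of this sketch) when they share
    columns and are at most max_gap empty rows apart, or share rows and are
    at most max_gap empty columns apart.
    """
    cols_overlap = a.left <= b.right and b.left <= a.right
    rows_overlap = a.top <= b.bottom and b.top <= a.bottom
    if cols_overlap and not rows_overlap:
        if 0 <= b.top - a.bottom - 1 <= max_gap:
            return "below"
        if 0 <= a.top - b.bottom - 1 <= max_gap:
            return "above"
    if rows_overlap and not cols_overlap:
        if 0 <= b.left - a.right - 1 <= max_gap:
            return "right"
        if 0 <= a.left - b.right - 1 <= max_gap:
            return "left"
    return None


def build_graph(regions, max_gap=0):
    """Map each region name to {neighbor name: position of the neighbor}."""
    graph = {r.name: {} for r in regions}
    for a, b in combinations(regions, 2):
        relation = spatial_relation(a, b, max_gap)
        if relation is not None:
            graph[a.name][b.name] = relation
            graph[b.name][a.name] = OPPOSITE[relation]
    return graph


def connected_components(graph):
    """Group regions into candidate tables (a stand-in for rule-based RAC)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Two stacked tables separated by two empty rows (made-up example data).
regions = [
    Region("H1", "header", top=0, left=0, bottom=0, right=2),
    Region("D1", "data", top=1, left=0, bottom=4, right=2),
    Region("H2", "header", top=7, left=0, bottom=7, right=1),
    Region("D2", "data", top=8, left=0, bottom=9, right=1),
]
candidate_tables = connected_components(build_graph(regions))
```

With the default `max_gap=0`, each header region is linked only to the data region directly beneath it, so the sheet splits into two candidate tables. Raising `max_gap` to 2 also links `D1` to `H2`, which shows why adjacency alone is not enough and why a rule-based step such as the one the abstract describes would be needed to decide which edges to remove.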
