Parameter-Free Table Detection Method

Abstract

In this paper, we propose two parameter-free table detection methods: one for closed tables and the other for open tables. The unifying idea is multigaussian analysis. Multigaussian analysis of text-height histograms classifies the document content into text and non-text blocks. Closed tables are classified as non-text, and they are identified among the non-text blocks in a way similar to many earlier methods that remove the separators. Because of the multigaussian analysis, we need no parameters to identify rows and columns and to discriminate them from text blocks. Open tables are initially classified as text blocks and are detected by extending the multigaussian analysis to the heights and widths of the text blocks. The analysis groups the text blocks into three categories, which are used to classify table cells and distinguish them from ordinary text blocks. Table blocks are then merged to obtain the table region. Evaluation on various Indic-script newspapers and the ICDAR 2013 table competition dataset shows that our methods achieve more than 90% in table recognition. The strength of our algorithm is that it is parameter-free and requires no training dataset.
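
The abstract does not say how the Gaussian mixture over block heights is fitted, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a two-component mixture fitted by plain expectation-maximization, and it assumes that the component with the smaller mean height corresponds to text. The function names (`fit_gmm_1d`, `split_text_nontext`), the fixed component count, and the iteration limit are all hypothetical choices made for illustration; the paper's method is described as parameter-free and may determine these differently.

```python
import math


def fit_gmm_1d(values, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, one shared variance.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n or 1.0
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each height.
        resp = []
        for x in xs:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps the density finite
    return weights, means, variances


def split_text_nontext(heights):
    """Label each block height as 'text' or 'non-text' (assumption: the
    lower-mean component is text)."""
    weights, means, variances = fit_gmm_1d(heights, k=2)
    text_idx = 0 if means[0] <= means[1] else 1
    labels = []
    for h in heights:
        # Compare log-densities so that tiny probabilities do not underflow.
        scores = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (h - m) ** 2 / (2 * v)
                  for w, m, v in zip(weights, means, variances)]
        best = 0 if scores[0] >= scores[1] else 1
        labels.append("text" if best == text_idx else "non-text")
    return labels


# Made-up block heights in pixels: ten text lines followed by four large blocks.
heights = [10, 11, 12, 11, 10, 12, 11, 10, 11, 12, 95, 110, 100, 105]
labels = split_text_nontext(heights)
print(list(zip(heights, labels)))
```

The same fitting routine could in principle be applied with three components to the heights and widths of text blocks, which is how the abstract describes the open-table case, although the exact features and grouping rule used there are not given in the source.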
