Table Header Detection Using Global Machine Learning Features from Orthogonal Rows and Columns

Abstract

A method, system, and computer-usable medium for detecting headers in various documents, such as PDF and HTML files. The files are converted to a two-dimensional array or table with orthogonal rows and columns. Either the rows or the columns are determined to include headers. To determine whether rows include headers, a pairwise comparison is performed, for each row in the array or table, on each cell of each column orthogonal to that row. The pairwise comparison scores are summed for each orthogonal column, and the sum across all columns orthogonal to the row gives a score for that row. Row scores are evaluated relative to one another to determine the likelihood that a row contains headers. To determine whether columns have headers, a similar calculation is performed between columns and their orthogonal rows.
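
The row-scoring procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how two cells are compared, so `cell_kind` and `pair_score` below are hypothetical stand-ins: they assume a comparison that scores 1 when two cells in the same column differ in coarse type (number, text, empty) and 0 otherwise. The function names, the sample table, and the assumption of a rectangular table are illustrative only and are not taken from the patent.

```python
def cell_kind(cell: str) -> str:
    """Coarse cell type; a hypothetical stand-in for the learned features."""
    text = cell.strip()
    if not text:
        return "empty"
    try:
        float(text.replace(",", ""))  # treat "1,200" as a number
        return "number"
    except ValueError:
        return "text"


def pair_score(a: str, b: str) -> int:
    """Assumed pairwise comparison: 1 if the two cells differ in kind, else 0."""
    return int(cell_kind(a) != cell_kind(b))


def row_scores(table: list[list[str]]) -> list[int]:
    """Score each row by comparing its cells with the other cells in each column."""
    scores = []
    for r, row in enumerate(table):
        total = 0
        for c in range(len(row)):
            # Sum of pairwise scores down column c, relative to row r.
            column_sum = sum(
                pair_score(row[c], other[c])
                for k, other in enumerate(table)
                if k != r
            )
            total += column_sum  # sum across all orthogonal columns
        scores.append(total)
    return scores


def column_scores(table: list[list[str]]) -> list[int]:
    """Same calculation with rows and columns swapped (rectangular table assumed)."""
    transposed = [list(col) for col in zip(*table)]
    return row_scores(transposed)


table = [
    ["Name", "Year", "Sales"],
    ["Alice", "2019", "1,200"],
    ["Bob", "2020", "950"],
    ["Carol", "2021", "1,430"],
]

print(row_scores(table))     # [6, 2, 2, 2]
print(column_scores(table))  # [6, 3, 3]
```

In this toy table the first row and the first column score highest relative to the others, which under the assumed comparison marks them as the most likely header candidates. A real system would presumably replace `pair_score` with a learned comparison and apply a relative threshold rather than reading off the maximum.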
