METHOD FOR EXTRACTING FORM INFORMATION IN A STRUCTURED MANNER, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM

Abstract

A method for extracting form information in a structured manner, comprising the following steps: acquiring position information and label information for each row of characters in a specified document, such as a PDF document (S31); detecting, from the position information and label information of each row of characters, whether a form in the specified document contains a line wrap or a page crossing (S32); when a line wrap is detected in the form, storing the information in the form by row and column according to a first reconstruction rule (S33); and when a page crossing is detected in the form, storing the information in the form by row and column according to a second reconstruction rule (S34). The method allows form data to be extracted and stored in a structured manner.
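
Steps S31 to S34 can be illustrated with a minimal Python sketch. The abstract does not disclose the two reconstruction rules, so everything concrete here is an assumption made for illustration: the `Line` record, the labels `"header"`, `"row"` and `"wrap"`, and the rules themselves (merge a wrapped line into the previous row column by column; on a page crossing, skip a repeated header and merge any continuation line into the last row of the previous page).

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One row of characters with its position and label (S31). Hypothetical layout."""
    page: int         # page on which the line appears
    top: float        # vertical position on that page
    label: str        # assumed labels: "header", "row", "wrap"
    cells: list[str]  # text per column, "" where a column is empty


def is_line_wrap(prev, cur):
    # Assumed test: a continuation line on the same page as the previous line.
    return prev is not None and cur.label == "wrap" and cur.page == prev.page


def is_page_crossing(prev, cur):
    # Assumed test: the page number changes between consecutive lines.
    return prev is not None and cur.page != prev.page


def merge_into(row, cells):
    # Append each non-empty continuation cell to the same column of the row.
    for i, text in enumerate(cells[:len(row)]):
        if text:
            row[i] = (row[i] + " " + text).strip()


def extract_form(lines):
    """Return (header, rows) with wrapped and page-split rows rebuilt (S32 to S34)."""
    header, rows, prev = None, [], None
    for cur in lines:
        if header is None and cur.label == "header":
            header = list(cur.cells)
        elif is_page_crossing(prev, cur):
            # Assumed second rule: drop a repeated header, join a split row.
            if cur.label == "wrap" and rows:
                merge_into(rows[-1], cur.cells)
            elif cur.label == "row":
                rows.append(list(cur.cells))
        elif is_line_wrap(prev, cur) and rows:
            # Assumed first rule: fold the wrapped text into the previous row.
            merge_into(rows[-1], cur.cells)
        elif cur.label == "row":
            rows.append(list(cur.cells))
        prev = cur
    return header, rows


sample = [
    Line(1, 10, "header", ["Item", "Description"]),
    Line(1, 20, "row", ["A1", "Long text that"]),
    Line(1, 30, "wrap", ["", "wraps here"]),
    Line(1, 40, "row", ["B2", "Split across"]),
    Line(2, 10, "header", ["Item", "Description"]),
    Line(2, 20, "wrap", ["", "two pages"]),
    Line(2, 30, "row", ["C3", "Short"]),
]
header, rows = extract_form(sample)
```

In this sketch the seven input lines reduce to one header and three rows, each stored by row and column. A real implementation would first have to derive the labels and positions from the PDF itself, which the abstract leaves unspecified.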