EXTRACTING CONTENT FROM A DOCUMENT USING VISUAL INFORMATION

Abstract

An aspect of the present invention discloses a method for extracting content from a document. The method includes one or more processors identifying a visual anchor corresponding to a text element depicted in a first document utilizing an edge detection analysis. The method further includes determining edge coordinates of the text element depicted in the first document. The method further includes determining text at a leading edge of the text element depicted in the first document and text at a trailing edge of the text element depicted in the first document, based on the determined edge coordinates. The method further includes extracting a complete version of the text element depicted in the first document, from a plain text version of the first document, utilizing the determined text at the leading edge of the text element and the determined text at the trailing edge of the text element.
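
The final step of the abstract (recovering the complete text element from the plain text version using the text found at its leading and trailing edges) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `extract_text_element`, its parameters, and the sample document are hypothetical, and the sketch assumes the leading-edge and trailing-edge text have already been obtained (for example by OCR near the detected edge coordinates), that both occur verbatim in the plain text, and that they do not overlap.

```python
from typing import Optional


def extract_text_element(plain_text: str, leading: str, trailing: str) -> Optional[str]:
    """Return the span of plain_text that starts at the first occurrence of
    `leading` and ends at the first occurrence of `trailing` after it.

    Hypothetical helper for illustration only. Returns None when either
    edge text cannot be located.
    """
    if not leading or not trailing:
        return None

    # Locate the text seen at the leading edge of the visual element.
    start = plain_text.find(leading)
    if start == -1:
        return None

    # Locate the trailing-edge text, searching only after the leading text
    # (assumption: the two edge snippets do not overlap).
    end = plain_text.find(trailing, start + len(leading))
    if end == -1:
        return None

    # Everything between the two anchors, inclusive, is the full element.
    return plain_text[start:end + len(trailing)]


if __name__ == "__main__":
    # Hypothetical plain text version of a document.
    document = (
        "Invoice 42\n"
        "Shipping address: 12 Long Street, Apartment 7, Springfield\n"
        "Total: 10"
    )
    print(extract_text_element(document, "Shipping address:", "Springfield"))
```

Anchoring on both edges rather than on the visible text alone matters when the rendered element is truncated or clipped: the plain text version still holds the full content between the two anchors.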

Bibliographic details

Publication number: US2022012421A1
Publication date: 2022-01-13
Original format: PDF
Applicant/Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Application number: US202016927512
Inventors: ZHONG FANG YUAN; ZHUO CAI; TONG LIU; YU PAN; XIANG YU YANG; DONG QIN
Filing date: 2020-07-13
Classification: G06F40/205; G06F40/151
Country: US
Database entry time: 2022-08-24 23:20:35
