DOCUMENT INFORMATION EXTRACTION METHOD AND SYSTEM BASED ON BODY TEXT IDENTIFICATION
Abstract
A method and system for extracting document information based on body recognition are provided. The document is divided into sections and the section containing the body information is recognized, which locates the body. A section in which to search for the document's title is then set based on the position of the recognized body section or body, so that title information can be extracted correctly. A document interpreter (410) parses the document, and a document sectioning part (420) divides the document into sections by referring to the parsing information. A body section recognizer (430) recognizes the body section among these sections. A detection section setting part (450) sets a title detection section based on the location of the recognized body section. A candidate title selector (460) selects more than one candidate title phrase from the set title detection section. The body section recognizer recognizes the body section according to a rule that includes at least one of: the ratio of text without a link property in each section, or the volume, size, or position information of each section.
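The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how the rule's criteria are weighted or combined, so the `Section` type, the function names, the 0.6 ratio threshold, the two-section lookback, and the 80-character limit for title candidates are all hypothetical choices made for illustration. The sketch picks as the body the section with the most non-link text among sections whose non-link ratio clears the threshold, then searches that section and the few sections before it for short lines as candidate titles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    text: str       # all visible text in the section
    link_text: str  # the part of that text that sits inside hyperlinks


def non_link_ratio(section: Section) -> float:
    """Share of the section's characters that are not inside links."""
    total = len(section.text)
    if total == 0:
        return 0.0
    return (total - len(section.link_text)) / total


def recognize_body(sections: List[Section], min_ratio: float = 0.6) -> Optional[int]:
    """Return the index of the assumed body section, or None if none qualifies.

    Assumed rule: the non-link ratio must reach min_ratio, and among the
    qualifying sections the one with the most non-link text wins.
    """
    best, best_len = None, 0
    for i, s in enumerate(sections):
        plain_len = len(s.text) - len(s.link_text)
        if non_link_ratio(s) >= min_ratio and plain_len > best_len:
            best, best_len = i, plain_len
    return best


def title_detection_sections(sections: List[Section], body_index: int,
                             lookback: int = 2) -> List[Section]:
    """Assumed detection range: the body section plus `lookback` sections before it."""
    start = max(0, body_index - lookback)
    return sections[start:body_index + 1]


def candidate_titles(detection: List[Section], max_len: int = 80) -> List[str]:
    """Collect short non-empty lines from the detection range as title candidates."""
    candidates = []
    for s in detection:
        for line in s.text.splitlines():
            line = line.strip()
            if line and len(line) <= max_len:
                candidates.append(line)
    return candidates


# Hypothetical page: navigation bar, headline, article text, footer links.
sections = [
    Section("Home News Sports", "Home News Sports"),
    Section("Quarterly results announced", ""),
    Section("The company reported higher revenue this quarter, driven by strong "
            "demand in its cloud division and lower costs. read more", "read more"),
    Section("Contact Privacy", "Contact Privacy"),
]

body_index = recognize_body(sections)
if body_index is not None:
    detection = title_detection_sections(sections, body_index)
    print(body_index, candidate_titles(detection))
```

A real system would still have to rank the candidates. Here the navigation text survives as a candidate because the sketch filters only on line length, which is one reason the position and size criteria named in the abstract would matter in practice.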