DOCUMENT STRUCTURE EXTRACTING DEVICE AND DOCUMENT STRUCTURE INFORMATION EXTRACTING METHOD
Abstract
PROBLEM TO BE SOLVED: To provide a document structure extracting device capable of extracting a document structure from an electronic document without using a dictionary. SOLUTION: The document structure extracting device, which extracts document structure information from an electronic document, comprises: a character information generating part 103 that analyzes the document to generate character information containing the position, character size and character type of each character; a line information generating part 105 that analyzes the character information to generate line information containing the character string, main character size, main character type and score of each line; and a document structure information generating part 107 that analyzes the line information to generate the document structure information. The document structure information generating part 107 generates the document structure information by grouping the lines on the basis of the scores in the line information and the continuity of the lines. Thus, the document structure information can be extracted from the electronic document without using a dictionary.
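The three-stage flow described in the abstract (character information, then line information, then grouping) can be illustrated with a minimal Python sketch. The abstract does not say how a line's score is computed or how continuity is tested, so everything specific here is an assumption made for illustration: the names `CharInfo`, `LineInfo`, `build_line_info` and `group_lines` are hypothetical, the score is taken to be the ratio of the line's main character size to an assumed body-text size, and "continuity" is read as adjacent lines sharing the same score and main character type.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class CharInfo:
    """Per-character record (stands in for the output of part 103)."""
    char: str
    x: float      # horizontal position on the page
    y: float      # vertical position on the page
    size: float   # character size
    font: str     # character type


@dataclass
class LineInfo:
    """Per-line record (stands in for the output of part 105)."""
    text: str
    main_size: float
    main_font: str
    score: float


def build_line_info(chars: List[CharInfo], body_size: float) -> LineInfo:
    """Summarize the characters of one line.

    The main size and main type are the most frequent values on the line.
    The score formula is an assumption, not taken from the patent.
    """
    main_size = Counter(c.size for c in chars).most_common(1)[0][0]
    main_font = Counter(c.font for c in chars).most_common(1)[0][0]
    text = "".join(c.char for c in sorted(chars, key=lambda c: c.x))
    score = main_size / body_size  # assumed scoring rule
    return LineInfo(text, main_size, main_font, score)


def group_lines(lines: List[LineInfo]) -> List[List[LineInfo]]:
    """Group consecutive lines with the same score and main character type.

    This is one possible reading of "score and continuity of lines".
    """
    groups: List[List[LineInfo]] = []
    for line in lines:
        last = groups[-1][-1] if groups else None
        if last and last.score == line.score and last.main_font == line.main_font:
            groups[-1].append(line)   # continues the current block
        else:
            groups.append([line])     # starts a new block
    return groups
```

Under these assumptions, a large bold line followed by two lines of body text would fall into two groups, which a later step could label as a heading and a paragraph. No dictionary lookup is involved at any stage, which is the point the abstract emphasizes.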