DETECTING THE BOUNDS OF BORDERLESS TABLES IN FIXED-FORMAT STRUCTURED DOCUMENTS USING MACHINE LEARNING
Abstract
Techniques are disclosed for detecting the bounds of borderless open tables in fixed-format structured documents, such as PDF documents, and for grouping text lines into predicted borderless tables. The target document comprises a set of text lines, each having a respective vertical and horizontal position in the target document. A sorted list of the text lines is generated based on the vertical and horizontal position of each text line. For each text line in the sorted list, a respective probability that the text line belongs to a borderless table is then determined. According to one embodiment, the probability may be determined using a classifier that employs a logistic regression algorithm.
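The pipeline in the abstract (sort lines by position, score each line with a logistic-regression classifier, group lines into predicted tables) can be illustrated with a minimal Python sketch. Several parts are assumptions and do not come from the patent. The per-line feature vectors and the `weights`/`bias` values are hypothetical placeholders, not trained parameters. The rule that groups consecutive above-threshold lines into one table is an illustrative choice, because the abstract does not say how grouping is done. The `y` coordinate is assumed to increase down the page; raw PDF coordinates have their origin at the bottom-left, so those would need to be sorted on `-y`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TextLine:
    text: str
    x: float  # left edge of the line, in page units
    y: float  # top edge; ASSUMED to increase downward (flip sign for raw PDF coords)
    # HYPOTHETICAL per-line features, e.g. [count of wide column gaps, fraction of numeric tokens]
    features: list = field(default_factory=list)


def sort_lines(lines):
    """Reading order: top-to-bottom by vertical position, then left-to-right."""
    return sorted(lines, key=lambda ln: (ln.y, ln.x))


def table_probability(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of the line's features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def group_into_tables(lines, weights, bias, threshold=0.5):
    """ASSUMED grouping rule: each maximal run of consecutive sorted lines whose
    probability meets the threshold becomes one predicted borderless table."""
    tables, current = [], []
    for ln in sort_lines(lines):
        if table_probability(ln.features, weights, bias) >= threshold:
            current.append(ln)
        elif current:
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


# Usage with made-up lines (deliberately out of order) and made-up parameters.
lines = [
    TextLine("South  8  9", 72, 110, [2.0, 0.7]),
    TextLine("Quarterly report", 72, 50, [0.0, 0.0]),
    TextLine("Totals rose this quarter.", 72, 140, [0.0, 0.0]),
    TextLine("Region  Q1  Q2", 72, 80, [2.0, 0.6]),
    TextLine("North  10  12", 72, 95, [2.0, 0.7]),
]
weights, bias = [1.5, 2.0], -2.0  # HYPOTHETICAL, not learned from data

tables = group_into_tables(lines, weights, bias)
for i, table in enumerate(tables):
    print(i, [ln.text for ln in table])
```

With these placeholder parameters, the heading and the closing sentence score about 0.12 and the three tabular rows score about 0.90 or higher, so the rows form a single predicted table. In a real system the weights would be fitted on labeled documents, for example with a standard logistic-regression implementation, and the features would be derived from the document's layout.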