MACHINE LEARNING BASED END-TO-END EXTRACTION OF TABLES FROM ELECTRONIC DOCUMENTS
Abstract
In some embodiments, a method includes identifying a set of word bounding boxes in a first electronic document, and identifying locations of horizontal white space between two adjacent rows from a set of rows in a table. The method includes determining, using a Natural Language Processing algorithm, an entity name from a set of entity names for each table cell from a set of table cells in the table. The method includes determining, using a machine learning algorithm, a class from a set of classes for each row from the set of rows. The method includes extracting a set of table cell values associated with the set of table cells, and generating a second electronic document including the set of table cell values arranged in the set of rows and the set of columns such that the set of words in the table are computer-readable in the second electronic document.
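The row-separation step can be illustrated with a minimal sketch. It is not the patented method: it assumes word bounding boxes are already available as (x0, y0, x1, y1) tuples with y increasing downward, and it treats any vertical band that no box covers as horizontal white space between rows. The names `find_row_gaps`, `group_rows`, and the `min_gap` threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumed box layout: (x0, y0, x1, y1), with y growing downward.
Box = Tuple[float, float, float, float]


def find_row_gaps(boxes: List[Box], min_gap: float = 1.0) -> List[Tuple[float, float]]:
    """Return (top, bottom) y-ranges that no word box covers."""
    # Sort the vertical extent of every box by its top edge.
    spans = sorted((y0, y1) for _, y0, _, y1 in boxes)
    gaps: List[Tuple[float, float]] = []
    if not spans:
        return gaps
    current_bottom = spans[0][1]
    for top, bottom in spans[1:]:
        # A new span starting below everything seen so far leaves a gap.
        if top - current_bottom >= min_gap:
            gaps.append((current_bottom, top))
        current_bottom = max(current_bottom, bottom)
    return gaps


def group_rows(boxes: List[Box], gaps: List[Tuple[float, float]]) -> List[List[Box]]:
    """Assign each box to a row using the midpoint of each gap as a cut line."""
    cuts = [(top + bottom) / 2 for top, bottom in gaps]
    rows: List[List[Box]] = [[] for _ in range(len(cuts) + 1)]
    for box in boxes:
        center = (box[1] + box[3]) / 2
        # The row index is the number of cut lines above the box center.
        rows[sum(center > cut for cut in cuts)].append(box)
    # Order the words in each row from left to right.
    return [sorted(row) for row in rows]


if __name__ == "__main__":
    words = [(0, 0, 10, 10), (20, 1, 30, 9), (0, 15, 10, 25), (20, 16, 30, 24)]
    gaps = find_row_gaps(words)
    print(gaps)
    print(group_rows(words, gaps))
```

The same projection idea, applied to the x-axis, would give candidate column boundaries. The entity-name and row-class steps described in the abstract rely on trained models and are not sketched here.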