IAPR International Conference on Document Analysis and Recognition

Table Recognition in Heterogeneous Documents using Machine Learning

Abstract

Tables are a convenient way to represent information in structured form, and table recognition is important for extracting that information from document images. Modern OCR systems usually return the text found in tables without recognizing the actual table structure, yet that structure is needed to recover the contextual meaning of the contents. Table structure recognition in heterogeneous documents is challenging because table layouts vary widely, and it becomes harder still when a table has no physical ruling lines. This work proposes a novel learning-based methodology for recognizing table contents in heterogeneous document images. Textual contents of documents are classified as table or non-table elements using a pre-trained neural network model. The network's output is then refined by contextual post-processing applied to each element, which corrects any classification errors. The system is trained on a subset of UNLV and UW3 document images and achieves more than 97% accuracy on a test set in detecting table and non-table elements.
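
The abstract does not say which rule the contextual post-processing uses, so the following is only an illustrative sketch of one plausible form: an element whose label disagrees with all of its immediate neighbors in reading order is flipped to match them. The function name `smooth_labels`, the `window` parameter, and the 1 = table / 0 = non-table encoding are assumptions made for this example and are not taken from the paper.

```python
from typing import List


def smooth_labels(labels: List[int], window: int = 1) -> List[int]:
    """Flip isolated labels that disagree with every neighbor.

    Hypothetical post-processing rule (not the paper's): `labels` holds
    one classifier decision per text element in reading order, with
    1 = table and 0 = non-table. An element is corrected only when it
    has `window` neighbors on each side and all of them carry the
    opposite label.
    """
    smoothed = list(labels)
    for i in range(window, len(labels) - window):
        neighbors = labels[i - window:i] + labels[i + 1:i + 1 + window]
        # Decisions are read from the original list, so one correction
        # never triggers another in the same pass.
        if all(n != labels[i] for n in neighbors):
            smoothed[i] = 1 - labels[i]
    return smoothed


if __name__ == "__main__":
    raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
    print(smooth_labels(raw))
```

In this sketch, elements at the start and end of the sequence are left unchanged because they lack context on one side. A real system would more likely use spatial neighbors on the page than a flat reading-order list.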