Conference: IAPR International Conference on Document Analysis and Recognition

CloudScan - A Configuration-Free Invoice Analysis System Using Recurrent Neural Networks

Abstract

We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user-provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long-range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve average F1 scores of 0.891 and 0.887, respectively, on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline, with an average F1 of 0.840 compared to 0.788.
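
To make the comparison in the abstract easier to read, the short sketch below recomputes the gaps between the two models from the four reported average F1 scores. It uses only the numbers quoted above. The names `REPORTED_F1` and `summarize` are hypothetical helpers introduced for illustration and are not part of CloudScan.

```python
# Average F1 scores as reported in the abstract (8 fields, 326,471 invoices).
# The dictionary layout and function name are illustrative assumptions.
REPORTED_F1 = {
    "seen": {"rnn": 0.891, "baseline": 0.887},
    "unseen": {"rnn": 0.840, "baseline": 0.788},
}


def summarize(scores):
    """Return (gaps, drops).

    gaps:  RNN minus baseline average F1, per layout setting.
    drops: seen minus unseen average F1, per model.
    """
    gaps = {
        setting: round(values["rnn"] - values["baseline"], 3)
        for setting, values in scores.items()
    }
    drops = {
        model: round(scores["seen"][model] - scores["unseen"][model], 3)
        for model in ("rnn", "baseline")
    }
    return gaps, drops


gaps, drops = summarize(REPORTED_F1)
print(gaps)   # {'seen': 0.004, 'unseen': 0.052}
print(drops)  # {'rnn': 0.051, 'baseline': 0.099}
```

Read this way, the two models are nearly tied on seen layouts (a gap of 0.004), while on unseen layouts the recurrent model leads by 0.052 and loses roughly half as much average F1 as the baseline when moving from seen to unseen layouts (0.051 versus 0.099).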
