Conference: IAPR International Conference on Document Analysis and Recognition

CloudScan - A Configuration-Free Invoice Analysis System Using Recurrent Neural Networks

Abstract

We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layouts; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user-provided feedback. This automatic training-data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long-range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline models achieve average F1 scores of 0.891 and 0.887, respectively, on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline, with an average F1 of 0.840 compared to 0.788.
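
The idea of deriving training labels from end-user feedback, rather than from precise manual annotation, can be illustrated with a minimal sketch. It assumes each invoice is available as a list of OCR tokens and that the user has confirmed a final value for each field; tokens (or runs of adjacent tokens) whose normalized text matches a confirmed value are labeled with that field, and everything else gets a background label. The field names, the `normalize` rule, the `max_n` limit, and the `"O"` background label are hypothetical choices for illustration, not the authors' implementation.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    # Hypothetical normalization: lowercase and drop whitespace and commas,
    # so "1,234.50" and "1234.50" compare equal.
    return re.sub(r"[\s,]", "", text.lower())


def label_tokens(tokens: List[str], confirmed: Dict[str, str], max_n: int = 4) -> List[str]:
    """Label OCR tokens by matching n-grams against user-confirmed field values.

    Tokens that match no confirmed value keep the (hypothetical) background label "O".
    """
    labels = ["O"] * len(tokens)
    targets = {field: normalize(value) for field, value in confirmed.items()}
    # Try longer n-grams first so a multi-token value claims all of its tokens.
    for n in range(max_n, 0, -1):
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            # Skip spans that overlap tokens already labeled by a longer match.
            if any(labels[i] != "O" for i in span):
                continue
            joined = normalize("".join(tokens[start:start + n]))
            for field, target in targets.items():
                if joined == target:
                    for i in span:
                        labels[i] = field
                    break
    return labels


if __name__ == "__main__":
    tokens = ["Invoice", "no.", "INV-1001", "Total", "1,234.50", "EUR"]
    confirmed = {"InvoiceNumber": "INV-1001", "TotalAmount": "1234.50", "Currency": "EUR"}
    print(label_tokens(tokens, confirmed))
    # ['O', 'O', 'InvoiceNumber', 'O', 'TotalAmount', 'Currency']
```

Matching longest n-grams first means a value spread over several tokens, such as a date like "12 March 2017", is labeled as a whole before any of its pieces could be matched on their own. A real system would need richer normalization (date formats, decimal commas, currency symbols) than this sketch provides.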