Published in: 2012 International Conference on Information Society

A domain knowledge-based approach for automatic correction of printed invoices

Abstract

Although OCR technology is now commonplace, character recognition errors remain a problem, particularly in automated systems that extract information from printed documents. This paper proposes a method for automatically detecting and correcting OCR errors in an information extraction system. Our algorithm uses domain knowledge about likely character misrecognitions to propose corrections; it then exploits knowledge about the type of the extracted information to perform syntactic and semantic checks that validate the proposed corrections. We assess our proposal on a real-world, highly challenging dataset of nearly 800 values extracted from approximately 100 commercial invoices and obtain very good results.
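
The two-stage idea in the abstract (propose corrections from known character confusions, then validate them with type-specific checks) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the confusion table `CONFUSIONS`, the assumed `dd/mm/yyyy` date format, and the function names are all hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime
from itertools import product

# Hypothetical confusion table (an assumption, not taken from the paper):
# OCR output character -> characters it may have been misread from.
CONFUSIONS = {
    "O": ["0"],
    "o": ["0"],
    "l": ["1"],
    "I": ["1"],
    "S": ["5"],
    "B": ["8"],
}

# Assumed field type: a date written as dd/mm/yyyy.
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def candidates(value):
    """Yield every string obtained by keeping or substituting each confusable character."""
    options = [[ch] + CONFUSIONS.get(ch, []) for ch in value]
    for combo in product(*options):
        yield "".join(combo)


def is_valid_date(text):
    """Syntactic check (dd/mm/yyyy shape), then semantic check (real calendar date)."""
    if not DATE_PATTERN.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True


def correct_date(value):
    """Return all candidate corrections of an extracted date that pass both checks."""
    return [c for c in candidates(value) if is_valid_date(c)]


if __name__ == "__main__":
    print(correct_date("l2/O3/2O11"))  # OCR read 'l' for '1' and 'O' for '0'
    print(correct_date("31/02/2011"))  # well-formed, but not a real date
```

The candidate set grows exponentially with the number of confusable characters, so a sketch like this only suits short field values such as dates or amounts. Other field types would need their own checks in place of `is_valid_date`.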