Redigitization System and Service

Abstract

A system and method for error-correcting extant electronic documents are disclosed. An electronic document may be rasterized to obtain a pixel representation of the document (e.g., a raster image). One or more optical character recognition (OCR) tasks may be performed on the raster image. Errors discovered by the OCR tasks may be corrected, and a customized, error-corrected version of the electronic document may be created and stored. If the author of the electronic document is known, the raster image may be compared to a personalized tf*idf error dictionary associated with the author to determine known OCR errors specific to the author. The raster image may also be compared to a personalized electronic error dictionary associated with the author to determine known typographical errors specific to the author.
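
The abstract does not specify how a personalized tf*idf error dictionary is built or applied. As a rough illustration only, the following Python sketch assumes that error tokens have already been collected per author, scores them with a smoothed tf*idf so that errors peculiar to one author rank highest, and then applies a correction map to OCR output. The function names (`tfidf_error_scores`, `correct_tokens`), the smoothing formula, and the sample data are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def tfidf_error_scores(author_errors, all_authors_errors):
    """Score one author's error tokens by a smoothed tf*idf (illustrative only).

    author_errors: list of error tokens observed for the author of interest.
    all_authors_errors: dict mapping author name -> list of error tokens.
    """
    tf = Counter(author_errors)
    total = sum(tf.values())
    n_authors = len(all_authors_errors)
    scores = {}
    for token, count in tf.items():
        # Document frequency: number of authors who exhibit this error.
        df = sum(1 for errs in all_authors_errors.values() if token in errs)
        # Add-one smoothing (an assumption) avoids division by zero.
        idf = math.log((1 + n_authors) / (1 + df))
        scores[token] = (count / total) * idf
    return scores


def correct_tokens(tokens, corrections):
    """Replace tokens that appear in the author's correction map."""
    return [corrections.get(tok, tok) for tok in tokens]


# Hypothetical sample data: "rn" is an error seen only for alice.
errors_by_author = {
    "alice": ["rn", "rn", "teh"],
    "bob": ["teh"],
    "carol": ["teh", "l1"],
}
alice_scores = tfidf_error_scores(errors_by_author["alice"], errors_by_author)
corrected = correct_tokens(["teh", "cat"], {"teh": "the"})
```

In this sketch an error shared by every author receives a score of zero, so only author-specific errors would be promoted into the personalized dictionary; a real system would need its own thresholds and a way to derive the correction map.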

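Bibliographic data

  • Publication number: US2015049949A1

    Patent type

  • Publication date: 2015-02-19

    Original format: PDF

  • Applicant/Assignee: STEVEN J SIMSKE; SAMSON J. LIU

    Application number: US201214364743

  • Inventors: STEVEN J SIMSKE; SAMSON J. LIU

    Filing date: 2012-04-29

  • Classification: G06K9/18; G06K9/00

  • Country: US

  • Database entry time: 2022-08-21 15:23:58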
