A system and method for error-correcting existing electronic documents are disclosed. An electronic document may be rasterized to obtain a pixel representation of the document (e.g., a raster image). One or more optical character recognition (OCR) tasks may be performed on the raster image. Errors discovered by the OCR tasks may be corrected, and a customized, error-corrected version of the electronic document may be created and stored. If the author of the electronic document is known, the raster image may be compared to a personalized tf*idf (term frequency-inverse document frequency) error dictionary associated with the author to identify known OCR errors specific to that author. The raster image may also be compared to a personalized electronic error dictionary associated with the author to identify known typographical errors specific to that author.
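
The dictionary-based correction step described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the abstract does not say how the tf*idf error dictionary is built or applied, so the weighting scheme, the function names (`tfidf_error_weights`, `correct_text`), and the sample dictionaries are all hypothetical. Rasterization and OCR are assumed to have already produced the text being corrected.

```python
import math
import re
from collections import Counter


def tfidf_error_weights(author_errors, corpus_errors):
    """Weight each error token seen for one author by tf*idf.

    author_errors: list of error tokens observed in this author's documents.
    corpus_errors: list of error-token lists, one per author (assumed layout).
    Errors that are frequent for this author but rare elsewhere score highest.
    """
    tf = Counter(author_errors)
    n_authors = len(corpus_errors)
    weights = {}
    for term, count in tf.items():
        # Number of authors whose error list contains this token (at least 1).
        df = max(1, sum(1 for errors in corpus_errors if term in errors))
        weights[term] = (count / len(author_errors)) * math.log(n_authors / df)
    return weights


def correct_text(ocr_text, ocr_error_dict, typo_dict):
    """Apply author-specific OCR fixes, then author-specific typo fixes."""
    def fix(match):
        word = match.group(0)
        word = ocr_error_dict.get(word, word)  # known OCR misreads
        return typo_dict.get(word, word)       # known typographical errors
    return re.sub(r"[A-Za-z0-9]+", fix, ocr_text)


# Hypothetical per-author dictionaries, for illustration only.
ocr_error_dict = {"Tlie": "The", "docurnent": "document"}
typo_dict = {"recieved": "received"}

corrected = correct_text("Tlie docurnent was recieved", ocr_error_dict, typo_dict)
weights = tfidf_error_weights(
    ["rn", "rn", "tlie"],
    [["rn", "rn", "tlie"], ["tlie"], ["cl"]],
)
```

Here the OCR dictionary is applied before the typographical dictionary so that a misread word is restored to what the author actually typed before typos are checked. The tf*idf weights could then be used to decide which entries are distinctive enough to keep in a given author's dictionary.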