OCR Error Correction

Jie Mei
Aminul Islam
Abidalrahman Moh’d
Dr. Evangelos Milios

The accuracy of Optical Character Recognition (OCR) is crucial to the success of downstream applications in a text analysis pipeline. Recent OCR post-processing models significantly improve the quality of OCR-generated text, but they still tend to suggest correction candidates from limited observations and do not sufficiently account for the characteristics of OCR errors. We investigate how to enlarge the candidate suggestion space by using an external corpus and integrating OCR-specific features in a regression approach to correct OCR-generated errors.