OCR-Text Error Detection and Correction

Md Rashadul Hasan Rakib
Dr. Aminul Islam
Dr. Evangelos Milios

Errors occur naturally in text. We work with errors in texts produced by Optical Character Recognition (OCR) from the Biodiversity Heritage Library. Our work has two stages. In the first stage, we detect errors in the texts using the Google n-gram and Google Books Ngram Viewer corpora, together with a domain-specific lexicon. In the second stage, we use the same resources to correct the detected errors.
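
The two stages can be illustrated with a minimal sketch. It is not the implementation used in this work: the tiny word-count table stands in for the n-gram corpora, the lexicon entries are made up, and the names `NGRAM_COUNTS`, `LEXICON`, `detect_errors`, and `correct` are hypothetical. As a simplifying assumption, it looks only at single-word counts and edit-distance-1 candidates, ignoring the surrounding context that real n-gram data would provide.

```python
from typing import Dict, List, Optional, Set

# Hypothetical stand-ins for the real resources (assumed values, not real data).
NGRAM_COUNTS: Dict[str, int] = {
    "the": 1000, "of": 900, "are": 800, "this": 500,
    "species": 300, "green": 150, "leaves": 120,
}
LEXICON: Set[str] = {"quercus", "species"}  # toy domain-specific lexicon
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def is_known(word: str) -> bool:
    """A word is accepted if the lexicon or the count table contains it."""
    return word in LEXICON or NGRAM_COUNTS.get(word, 0) > 0


def detect_errors(tokens: List[str]) -> List[int]:
    """Stage 1: return the positions of tokens that no resource recognises."""
    return [i for i, tok in enumerate(tokens) if not is_known(tok.lower())]


def edits1(word: str) -> Set[str]:
    """All strings one deletion, substitution, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | replaces | inserts


def correct(word: str) -> Optional[str]:
    """Stage 2: pick the known candidate with the highest count, if any."""
    candidates = [c for c in edits1(word) if is_known(c)]
    if not candidates:
        return None
    # Ties on count are broken alphabetically to keep the result deterministic.
    return max(candidates, key=lambda c: (NGRAM_COUNTS.get(c, 0), c))


tokens = "the leaves of this specics are grecn".split()
flagged = detect_errors(tokens)
fixed = [correct(tokens[i]) for i in flagged]
```

With the toy data above, `flagged` holds the positions of "specics" and "grecn", and `fixed` holds the suggested replacements "species" and "green". A fuller version would score candidates by the frequency of the n-grams they form with neighbouring words.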