Tesseract OCR - Image Optimization and Stabilization

Tue Jul 19 17:05:06 EDT 2016

Hello,

I am currently working with Tesseract OCR, an open-source engine whose development is sponsored by Google and which is released under the Apache 2.0 license.

The issue I am running into is text accuracy. 

The current process: set the text color to black and the background to white, maximize the contrast, then pass the image to the OCR engine.
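
A minimal sketch of those steps, assuming an 8-bit grayscale image held as a list of pixel rows with dark text on a light background. The function name `binarize` and the fixed threshold of 128 are hypothetical choices for illustration, and the final OCR call is left out:

```python
def binarize(gray_rows, threshold=128):
    """Force every pixel to pure black (0) or pure white (255).

    Assumes dark text on a light background: anything darker than
    `threshold` is treated as text, everything else as background.
    The result has maximum contrast because only two values remain.
    """
    return [
        [0 if pixel < threshold else 255 for pixel in row]
        for row in gray_rows
    ]


# Tiny illustrative image: values near 255 are paper, values near 0 are ink.
sample = [
    [250, 240, 30, 245],
    [235, 20, 25, 250],
]
cleaned = binarize(sample)
```

A fixed threshold is the simplest choice; unevenly lit scans would likely need a threshold chosen per image or per region instead.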

With documents from modern word processors, this approach is accurate 98% of the time. When I try to read commercial serial numbers or IDs, which can be very compact, the result has the correct number of characters, but some of the characters themselves are wrong.

Has anyone worked with this system before who knows of a possible solution? I am currently looking into ImageMagick.