Tesseract OCR - Image Optimization and Stabilization

Wed Jul 20 15:08:57 EDT 2016

Have you seen the Paperless <https://github.com/danielquinn/paperless>
project?
Its author is using Tesseract OCR as well.  There's another Paperless on
GitHub for Mac OS X too.
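
For the preprocessing step described in the quoted message below (text to
black, background to white, maximum contrast), here is a minimal Python
sketch of one common way to do it, Otsu thresholding. This is an assumption
about the approach, not necessarily what the original poster runs. The
names otsu_threshold and binarize are made up for illustration, and the
image is just a nested list of 0-255 gray values rather than a real file.

```python
# Sketch only: pixels are plain 0-255 grayscale values in nested lists,
# standing in for whatever image library actually loads the scan.

def otsu_threshold(pixels):
    """Pick the gray level that best separates dark text from light background."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level in range(256):
        dark_count += hist[level]
        if dark_count == 0:
            continue  # no pixels at or below this level yet
        light_count = total - dark_count
        if light_count == 0:
            break  # every pixel is already in the dark class
        dark_sum += level * hist[level]
        dark_mean = dark_sum / dark_count
        light_mean = (weighted_total - dark_sum) / light_count
        # Between-class variance (unnormalized); larger means cleaner split.
        variance = dark_count * light_count * (dark_mean - light_mean) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(pixels, threshold=None):
    """Map pixels at or below the threshold to black (0), the rest to white (255)."""
    if threshold is None:
        threshold = otsu_threshold(pixels)
    return [[0 if value <= threshold else 255 for value in row] for row in pixels]
```

The binarized grid would then be written back out as an image and handed
to Tesseract. A fixed threshold can be passed in instead when the automatic
choice does not suit a particular scan.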

On Tue, Jul 19, 2016 at 5:05 PM, 4kbytes <4kbytes at zoho.com> wrote:

> Hello,
>
> I am currently working with Tesseract OCR. Tesseract's development is
> sponsored by Google, and it is released under the Apache 2.0 license.
>
> The issue I am running into is text accuracy.
>
> My current process: convert the target text to black and the background
> to white, maximize contrast, then pass the image to the OCR engine.
>
> With documents from modern word processors this approach is accurate 98%
> of the time. When trying to read commercial serial numbers or IDs, which
> can be very compact, the result has the right number of characters but
> the individual characters are often wrong.
>
> Has anyone worked with this system before who knows of a possible
> solution? I am currently looking into ImageMagick.