<div dir="ltr">Have you seen the <a href="https://github.com/danielquinn/paperless">Paperless</a> project? Its author is using Tesseract OCR as well. There's another Paperless on GitHub for Mac OS X too.</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 19, 2016 at 5:05 PM, 4kbytes <span dir="ltr"><<a href="mailto:4kbytes@zoho.com" target="_blank">4kbytes@zoho.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>
<br>
I am currently working with Tesseract OCR. Tesseract is developed under Google's sponsorship and released under the Apache 2.0 license.<br>
<br>
The issue I am running into is text accuracy.<br>
<br>
The current process: convert the target text to black and the background to white, maximize the contrast, then pass the image to the OCR engine.<br>
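<br>
A minimal sketch of the process above, assuming the image is already an 8-bit grayscale grid held as a list of rows. The function names, the threshold of 128, and the sample data are illustrative assumptions, not part of Tesseract or ImageMagick; the cleaned image would still need to be written out and handed to the OCR engine separately.<br>
<br>
```python
def binarize(pixels, threshold=128):
    """Map each 8-bit grayscale value to pure black (0) or pure white (255)."""
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


def invert_if_dark_background(pixels):
    """Flip a binarized image when most pixels are black, so text ends up black on white."""
    flat = [value for row in pixels for value in row]
    dark = sum(1 for value in flat if value == 0)
    if dark * 2 > len(flat):
        return [[255 - value for value in row] for row in pixels]
    return pixels


# Hypothetical 3x4 grayscale sample: light text on a dark background.
sample = [
    [10, 20, 15, 12],
    [18, 220, 230, 14],
    [11, 16, 13, 19],
]

# Maximum contrast first, then make sure the text is the black part.
cleaned = invert_if_dark_background(binarize(sample))
```
<br>
A fixed threshold is the simplest choice here; whether it suits compact serials depends on the source images.<br>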
<br>
With documents from modern word processors, this approach is accurate 98% of the time. When reading commercial serial numbers or IDs, which can be very compact, the result has the correct number of characters, but the characters themselves are wrong.<br>
<br>
Has anyone worked with this system before who knows of a possible solution? I am currently looking into ImageMagick.<br>
<br>
_______________________________________________<br>
gnhlug-discuss mailing list<br>
<a href="mailto:gnhlug-discuss@mail.gnhlug.org">gnhlug-discuss@mail.gnhlug.org</a><br>
<a href="http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/" rel="noreferrer" target="_blank">http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/</a><br>
</blockquote></div><br></div>