Michael's Blog: How to make tesseract (OCR) recognize only "Digits"?

Monday, May 3, 2010

How to make tesseract (OCR) recognize only "Digits"?

Recently, I have been playing around with OCR (Optical Character Recognition):
http://en.wikipedia.org/wiki/Optical_character_recognition

tesseract is a good open-source OCR engine and the first one I tried, but it lacks documentation, which is a little annoying. Some people have asked how to make tesseract recognize only digits. You may find some hints in the FAQ on tesseract's wiki or in the README, but here is what I found.

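Environment: Ubuntu 8.04
1. Add a config file named "digits" to /usr/local/share/tessdata/configs/ (the file name must match the config name you pass on the command line in step 3).
2. Put this single line in the file:
tessedit_char_whitelist 0123456789
3. Append the config name to your tesseract command, for example:
./tesseract ~/image.tif ~/output nobatch digits
tesseract writes the recognized text to ~/output.txt.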
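The three steps above can be wrapped in a small shell script. This is a minimal sketch, not part of my original setup: the function names write_digits_config and ocr_digits are hypothetical, and it assumes a tesseract build that reads config files from the tessdata configs directory and accepts config names after the output base, as in the command above.

```shell
# Hypothetical helpers wrapping steps 1-3; source this file, then call them.
# Assumption: tesseract reads config files from <tessdata>/configs/, e.g.
# /usr/local/share/tessdata/configs/ for a source install like the one above.

# write_digits_config DIR
# Creates DIR/digits containing the digits-only whitelist (steps 1 and 2).
write_digits_config() {
    configs_dir="$1"
    mkdir -p "$configs_dir" || return 1
    printf 'tessedit_char_whitelist 0123456789\n' > "$configs_dir/digits"
}

# ocr_digits IMAGE OUTBASE
# Runs tesseract with the "nobatch" and "digits" configs (step 3).
# tesseract writes the recognized text to OUTBASE.txt.
ocr_digits() {
    if [ "$#" -ne 2 ]; then
        echo "usage: ocr_digits IMAGE OUTBASE" >&2
        return 2
    fi
    tesseract "$1" "$2" nobatch digits
}
```

Writing into /usr/local/share/tessdata/configs usually needs root, so you would call write_digits_config once with sudo, then call ocr_digits ~/image.tif ~/output and read ~/output.txt. Keeping the whitelist in a named config file (rather than retyping it) is what lets the single word "digits" on the command line switch the behavior on.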

Have fun! Honestly, though, it was not as good at recognizing digits as I expected.

5 comments:

Anonymous, May 25, 2010 at 2:23 AM
Hey, I'm the webmaster from Yantram BPO Pvt Ltd. I like your blog; the information is truly good and informative as well. We also provide data entry services. If you want to discuss anything about data entry, you can contact us on this website:

http://data-entry.outsourcing-services-india.com

BasiaBernstein, December 17, 2010 at 7:22 AM
What is the "best" font for text recognition by tesseract? I can choose how I print these out, but I'm finding that some fonts work better than others.

Gadelat, June 12, 2011 at 5:31 PM
I like it! Thanks for publishing.

laolang, August 3, 2012 at 12:18 PM
What about training tesseract on the digits of a certain font to improve accuracy for that font?

Unknown, October 20, 2012 at 12:27 AM
It's so helpful. Thanks.
