Digitising books one word at a time

You know those annoying CAPTCHA codes? The ones we have on the forms on this site, and which you’ve no doubt seen on many other sites? Well, I’ve only just found out the bigger reason behind them.

No, I’m not talking about the obvious reason, stopping spam (which they seem to do a good job of), but about their greater purpose. I’ll quote from Google’s reCAPTCHA page:

To archive human knowledge and to make information more accessible to the world, multiple projects are currently digitizing physical books that were written before the computer age. The book pages are being photographically scanned, and then transformed into text using “Optical Character Recognition” (OCR). The transformation into text is useful because scanning a book produces images, which are difficult to store on small devices, expensive to download, and cannot be searched. The problem is that OCR is not perfect.

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
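To make the pairing trick in that quote a little more concrete, here is a toy sketch in Python. It is only an illustration of the idea as Google describes it, not how reCAPTCHA is actually built: the class name WordVerifier, the image ids, and the two thresholds (MIN_VOTES and MIN_AGREEMENT) are all made up for the example. A user’s answer for the unreadable word only counts if they got the known “control” word right, and the unreadable word is only treated as solved once enough of those trusted answers agree.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds: illustrative assumptions, not reCAPTCHA's real values.
MIN_VOTES = 3         # how many trusted answers we want before deciding
MIN_AGREEMENT = 0.75  # share of those answers that must match


class WordVerifier:
    """Toy model: pair a word with a known answer with a word OCR couldn't read."""

    def __init__(self, known_words):
        # Maps a control image id to its already-verified text.
        self.known = dict(known_words)
        # Maps an unreadable image id to a tally of human answers for it.
        self.votes = defaultdict(Counter)

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Return True if the user passes, i.e. read the control word correctly."""
        expected = self.known[control_id]
        if control_answer.strip().lower() != expected.lower():
            return False  # Failed the known word, so their other answer is ignored.
        # Passed the control word, so count this as one vote for the unknown word.
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed text once enough people agree, otherwise None."""
        tally = self.votes[unknown_id]
        total = sum(tally.values())
        if total < MIN_VOTES:
            return None
        text, count = tally.most_common(1)[0]
        return text if count / total >= MIN_AGREEMENT else None


v = WordVerifier({"img-1": "morning"})
v.submit("img-1", "morning", "scan-42", "upon")
v.submit("img-1", "Morning", "scan-42", "upon")
v.submit("img-1", "mourning", "scan-42", "spoon")  # control word wrong: not counted
v.submit("img-1", "morning", "scan-42", "upon")
print(v.resolved("scan-42"))  # upon
```

The nice part of the design is that nobody has to know the answer to the scanned word in advance: checking the control word filters out careless answers (and bots), and asking several people filters out honest mistakes.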

That, I have to say, is brilliant. I find CAPTCHAs frustrating, as one of the words usually seems so hard to decipher, and as a website owner I want users to be able to sign up as easily as possible. I don’t want unnecessary barriers in their way. But now that I understand more about what reCAPTCHA is doing, I feel happier using this barrier. It stops those pesky robots from spamming my site, and it helps digitise books one word at a time.

Over to you

Does this explanation make you less annoyed when you have to decipher CAPTCHAs? Or are you just as frustrated as ever? Let me know what you think.

Richard Lalchan

Richard Lalchan is the founder of Creatives Hub, whose mission is to help as many creatives as possible shake off the shackles of procrastination, break out of fear, grow in confidence and get stuff done. He also works with individuals and businesses to build their web presence, runs a podcast network and is currently writing his first sci-fi novella.
