Tuesday, January 31, 2012

CAPTCHA - A Revolution

CAPTCHA = Completely Automated Public Turing test to tell Computers and Humans Apart

What is a CAPTCHA?
A system built by Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford at CMU [Carnegie Mellon University] to make sure that the user at the other end is a human and not a bot. It was first used to keep bots out of Yahoo chat rooms, where they were redirecting users to other sites.

CAPTCHA - Reverse Turing Test:
Yup, CAPTCHA is called a reverse Turing test because it swaps the roles of computer and human. In the classic Turing test a human judges whether they are talking to a machine; in a CAPTCHA the computer is the judge. Normally a computer does what a human tells it to, but here it is the other way round: the test is completely automated, so the computer challenges you to perform some action that proves you are a human.

Initially [and still today], a CAPTCHA was a distorted image containing a few characters, which makes the characters hard for bots to read but doesn't stop humans.
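The "completely automated" part can be sketched in a few lines. This is a minimal sketch assuming a text CAPTCHA: the server picks random characters, keeps the answer to itself, and later compares it with what the user typed. Rendering the distorted image is left out, and `new_challenge`, `check_answer` and the character set are hypothetical choices made for illustration.

```python
import secrets

# Hypothetical character set: look-alikes such as 0/O and 1/I are left out,
# since they become ambiguous once the image is distorted.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def new_challenge(length: int = 6) -> str:
    """Pick a random answer. A real server would draw it as a distorted
    image and keep the answer only on the server side."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected: str, typed: str) -> bool:
    """Accept the answer if it matches, ignoring case and surrounding spaces."""
    return typed.strip().upper() == expected.upper()


answer = new_challenge()
print(check_answer(answer, answer.lower()))  # True
print(check_answer(answer, "WRONG!"))        # False
```

Using `secrets` rather than `random` keeps the answer unpredictable, so a bot cannot simply recompute it.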

The next generation of CAPTCHA added an audio link beside the distorted image to help visually impaired people.

Automatically generated CAPTCHAs are not unbreakable, though: many of them can be defeated with techniques such as OCR (Optical Character Recognition), or by working out the logic the generator follows.
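The "underlying logic" route can be surprisingly simple. Here is a sketch assuming a hypothetical, badly designed generator that seeds Python's `random` module with a guessable value such as the Unix time of the request. Anyone who knows that logic can regenerate the answer without ever reading the image. The function name and the timestamp are made up for illustration.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits


def weak_challenge(seed: int, length: int = 6) -> str:
    """Hypothetical flawed generator: the PRNG is seeded with a guessable
    value (here, a Unix timestamp in seconds)."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


# The server builds its challenge from the time the page was requested...
issued_at = 1328000000
server_answer = weak_challenge(issued_at)

# ...and a bot that knows the logic tries the last few seconds,
# never looking at the distorted image at all.
guesses = [weak_challenge(t) for t in range(issued_at - 5, issued_at + 1)]
print(server_answer in guesses)  # True
```

A generator should draw from an unpredictable source (for example Python's `secrets` module) so that knowing the code does not reveal the answer.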

And now, people have started building their own variants, including:

Mathematical CAPTCHA => What is 1 + 1?
Image/Visual CAPTCHA => Who is Alice in this photo tagged with your friends? [Facebook uses this to verify the legitimate owner of an account]
and so on
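A mathematical CAPTCHA like "What is 1 + 1?" might be sketched as follows. This is an illustration only; the helper names and the 1 to 10 range for the operands are assumptions.

```python
import secrets


def math_challenge() -> tuple[str, int]:
    """Hypothetical helper: build a question like 'What is 3 + 4?' and its answer."""
    a = secrets.randbelow(10) + 1  # 1..10
    b = secrets.randbelow(10) + 1  # 1..10
    return f"What is {a} + {b}?", a + b


def check_math(expected: int, typed: str) -> bool:
    """Accept only a whole-number answer equal to the expected sum."""
    try:
        return int(typed.strip()) == expected
    except ValueError:
        return False


question, expected = math_challenge()
print(question)
print(check_math(expected, str(expected)))  # True
print(check_math(expected, "two"))          # False
```

Because the question is plain text, a bot can simply parse it and add the numbers, which is exactly the "underlying logic" weakness described earlier.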

But the real masterpiece is reCAPTCHA [powered by Google]

What is so great about it?
It is great because it values human time. It is a test that unites human effort :)
If you have noticed, every reCAPTCHA shows two space-separated words.
Take, for example, a reCAPTCHA showing the words [said allecstst].
Where do these words come from?
They come from the process of digitizing old texts with OCR.
To produce a digital version [e.g. a PDF] of a book written long before digital books or word processors existed, people scan the book, taking a photo [image] of each page. An image processing technique called OCR then tries to recognize the characters in those images and turns the old text into digital text.
What does this have to do with reCAPTCHA?
OCR is an automated tool that recognizes characters in an image, but it is not guaranteed to recognize every character correctly. For example, a T can be read as an I, depending on the font or the clarity of the image.

So here is what the reCAPTCHA folks do:
Pick two words: one that OCR recognized successfully, said, so its answer is already known, and one that OCR could not read, allecstst.
Show both to the user as a CAPTCHA challenge.
If the user types the word OCR recognized [said] correctly, that confirms the user is a human. The other word is a kind of poll: the same unrecognized word is shown to a group of people [say 10].
If 7 of those 10 [i.e. the majority] read it as allecstst and the rest read it as alleestst, the unrecognized word is taken to be allecstst, because that is what the majority chose. And so one more word of the book is digitized :)
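The two-word scheme described above can be sketched as follows. This is only an illustration of the idea as explained in this post, not reCAPTCHA's actual code: the `WordPoll` class is hypothetical, and the 10-vote sample and simple-majority rule are taken from the example in the text.

```python
from collections import Counter
from typing import Optional


class WordPoll:
    """Hypothetical sketch: a known control word plus a vote on one unknown word."""

    def __init__(self, control_word: str, votes_needed: int = 10):
        self.control_word = control_word.lower()  # answer already known (OCR read it)
        self.votes_needed = votes_needed          # "say 10" people see the unknown word
        self.votes = Counter()                    # reading of unknown word -> count

    def submit(self, control_answer: str, unknown_answer: str) -> bool:
        """Return True if the user passes; only then count their reading as a vote."""
        if control_answer.strip().lower() != self.control_word:
            return False  # wrong known word: possibly a bot, so the vote is discarded
        self.votes[unknown_answer.strip().lower()] += 1
        return True

    def result(self) -> Optional[str]:
        """Return the majority reading once enough votes are in, else None."""
        total = sum(self.votes.values())
        if total < self.votes_needed:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count * 2 > total else None  # strict majority


poll = WordPoll(control_word="said")
for reading in ["allecstst"] * 7 + ["alleestst"] * 3:
    poll.submit("said", reading)
print(poll.result())  # allecstst
```

Discarding the vote of anyone who gets the known word wrong is what keeps bots from polluting the poll on the unknown word.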

So, without even knowing it, you help digitize a book whenever you fill in a reCAPTCHA :) Be happy whenever you answer one, and proud to be united :)
A book is being digitized whenever a user fills in a reCAPTCHA to sign in to Facebook, Gmail, LinkedIn, etc.

From the reCAPTCHA site:
About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books.
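The figures in the quote are easy to check. This quick calculation uses only the quote's own numbers (200 million CAPTCHAs a day, roughly ten seconds each):

```python
captchas_per_day = 200_000_000  # figure from the quote
seconds_each = 10               # "roughly ten seconds" per CAPTCHA

hours_per_day = captchas_per_day * seconds_each / 3600
print(f"{hours_per_day:,.0f} hours per day")  # 555,556 hours per day
```

That is comfortably above the "more than 150,000 hours" the quote claims.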

Visit the reCAPTCHA site to learn more and feel great :)