5.11.09

Chapping in the CAPTCHAs

Q: Dear 100HB:

How come on some websites I only have to type one set of silly characters to leave a comment, while on others I have to type in TWO sets of characters, the second of which is OH so very difficult to decipher?

Yours Truly,

Spam the Man

A: Dear Slave to the Anti-Spam:

Could the real Slim Shady please stand up? No? Well, okay, how about you just fill out this verification then?

The answer to your question is that you work for a man named Luis. That is right: you are part of the great plan to mobilize the largest workforce in the history of mankind. Not getting paid? Well, you have Luis von Ahn to blame for that. If you are one of the hundreds of millions of people who use the Internet (and if you are reading this, you are), you have been working for Luis von Ahn for years...for free! So I called him to get some background and answers.

According to Luis, years ago Yahoo had a problem: spam. In an effort to gather a vast number of email accounts, spammers were using automated computer programs to sign up for them. They were writing programs to gather millions of Yahoo accounts every day. Spam was clogging up Yahoo email accounts.

So Luis and his advisor Manuel Blum were approached to find a way to tell a person from a malicious automated computer program. They finally came up with a test. That was the birth of the verification process called CAPTCHA.

The idea is that a human can discern the characters whereas a computer cannot. Suddenly, the spam programs could not gather accounts. Millions of companies, including YouTube, Gmail, Yahoo, MSN, Facebook, TicketMaster, Flickr, and NBC, use CAPTCHA.

You would think that, with spam being reduced in everyone's inbox, Luis would be rejoicing at his success. Believe it or not, Luis felt bad about this invention of his. When I spoke with him, he told me he estimates that for every CAPTCHA entry the average person is wasting 10 seconds of their time.

"If you were to multiply that by 200 million," Luis said, "you get that humanity as a whole is wasting 500,000 hours every day typing these annoying CAPTCHAs." This began to eat at him and he asked himself, "How could I better use this time? Is there a way to use this human effort in a way that benefits humanity?"
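For the curious, here is a quick back-of-the-envelope check of that figure. It is only a sketch, and it assumes the "200 million" in the quote means CAPTCHAs typed per day; the function and constant names are made up for illustration.

```python
# Rough check of the "wasted hours" estimate quoted above.
# Assumption: 200 million is the number of CAPTCHAs typed per day.
SECONDS_PER_CAPTCHA = 10
CAPTCHAS_PER_DAY = 200_000_000


def hours_wasted_per_day(seconds_each, entries_per_day):
    """Convert total seconds spent per day into hours."""
    return seconds_each * entries_per_day / 3600


hours = hours_wasted_per_day(SECONDS_PER_CAPTCHA, CAPTCHAS_PER_DAY)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

So the quoted 500,000 hours is a rounded-down version of roughly 555,000 hours a day.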

"I was on a mission to make good use of that 10 seconds of time," he says.

While hiding from the mass of Internet users upset at his invention, Luis got involved with another project: the initiative to scan books, decipher the text, and provide the books over the Internet. Google has one and the Internet Archive has one, but the problem is that many books' texts are old and faded. The type is not aligned, and there are smudges as well. So when computers scan these old texts, they fail to recognize the text and convert 30-40% of the words incorrectly.

Solution? Take those words, use them as CAPTCHAs, and have people decipher them. But therein lay the problem. The computer could not decipher the word, so how is the CAPTCHA test going to verify whether the person typing in the word got it right? Luis's solution was to combine the word from a book with a computer-generated CAPTCHA.

"We will give two tests," says Luis, "one we know the answer to and one that we don't. If the person can solve the one we know the answer to, then we will assume they can solve the one that we don't know the answer to."
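The two-test idea Luis describes can be sketched in a few lines of Python. This is only an illustration, not how reCAPTCHA is actually built: the class name, the method names, and especially the rule that a transcription is accepted once a few people agree on it (`votes_needed`) are assumptions added here, not something Luis said.

```python
from collections import Counter


def normalize(text):
    """Ignore case and stray spaces when comparing typed words."""
    return text.strip().lower()


class TwoWordChallenge:
    """Hypothetical model: one word with a known answer, one unknown book word."""

    def __init__(self, control_answer, votes_needed=3):
        self.control_answer = normalize(control_answer)
        self.votes_needed = votes_needed  # assumed agreement threshold
        self.votes = Counter()            # guesses for the unknown word

    def submit(self, typed_control, typed_unknown):
        """Pass the user only if the known word is right; then trust their other word."""
        if normalize(typed_control) != self.control_answer:
            return False
        self.votes[normalize(typed_unknown)] += 1
        return True

    def accepted_transcription(self):
        """Return the most common guess once enough people agree, else None."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


challenge = TwoWordChallenge(control_answer="morning", votes_needed=2)
challenge.submit("morning", "overlooks")   # passes, guess recorded
challenge.submit("mourning", "overlocks")  # fails the known word, guess ignored
challenge.submit("Morning ", "overlooks")  # passes, guess recorded
print(challenge.accepted_transcription())  # prints "overlooks"
```

The point is that the person never learns which word is the real test, so the only way to pass reliably is to read both honestly.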

He called it reCAPTCHA. Now every time you type a reCAPTCHA you are also transcribing an old book.

Today 125-150 books are being digitized a day because of reCAPTCHA. Even the New York Times archive, 130 years of newspapers, is being transcribed this way. Luis estimates that the NYT Internet archive will be complete next year with the help of reCAPTCHA.

"Now, we are taking that effort of 10 seconds and applying it to assist in the dissemination of literature, scientific text and social news," Luis states triumphantly.

Luis has done a lot with himself working as faculty at Carnegie Mellon University, but he tries to ensure his work is interesting to others. That is why he founded GWAP (Games With A Purpose). In fact, I am sure you have all played one of his games, Google Image Labeler: pair up with another Internet user and try to come up with matching labels for a picture. This data helps search engines refine search criteria and list more relevant and contextual results. GWAP has made games for the following:
  • Fighting Spam
  • Digitizing Books
  • Labelling Images on the web

So don't get too angry with yourself about those reCAPTCHAs. Just remember you are doing good for humanity. Yet it does raise the question: now that we all decipher two words, is that not doubling our effort to 20 seconds, Luis?

Also, Luis, could you establish a single sign on for the worldwide Internet?

Your comments and questions, Luis, are always invited on this Board (thanks for being a good sport).

2 comments:

Anonymous said...

This is great! I don't know whether to sue you or send you a check for thanks :)

There is something wonderful in thinking that the minions of the world are working for me.

I love the site; great idea. One thing I have always wanted to know is how microphones are made. Have you ever seen the metal meshing of the microphone head? How do you heat it and weave it without melting it all together?

How does the sound pick up in the microphone and translate it into a recording? So many interesting things about the magic of the mic.

-Luis

Anonymous said...

I got an iphone but have no idea about how to download these files. It seems to be impossible to do it by using the itunes. I got an application that does it from WiFi but I don't even know how to set it up!! My laptop has it but I never used it to connect with something different from hot spot. Do I need to do any set up to make my iphone catch the signal?

 

100 Hour Board Copyright © 2009