PWNtcha stands for "Pretend We’re Not a Turing Computer but a Human Antagonist". The site has code that will defeat many common Captcha systems, but it will not be released, for obvious reasons. This is academic research, but spammers would be in heaven if they got hold of it. Here's a quote from a link on the site:
Quote:
|
this article is about writing a comment spam bot. it ended up posting 94 comment messages to CAPTCHA protected blog pages in 10 minutes. all it does is visit a blog post and download the associated CAPTCHA image. then it uses some image processing techniques to parse out the characters in the image. each character is then run through some AI processing to figure out what letter the character image represents. finally, with the result, it posts the comment spam to the blog engine. i wrote it for a couple of reasons ... mainly to show that rel= 'nofollow' and CAPTCHA are false protection from comment spam.
|
That's not the only way to do it, and it sounds like this is the hard way. I think that AI spam bot has to be programmed for every type of Captcha it can break. But there's an easy way that even I could do.
CREATE TABLE CaptchaBuster ( ImageUrl VARCHAR(400), ImageHash VARCHAR(400), FormData TEXT, CaptchaAnswer VARCHAR(100) );
Now if you have a naughty robot, when it gets challenged, it has an easy process. Download the image, hash the file, and go look up the answer in the database. If that hash isn't in the database yet, add it and alert a human to answer it. That human can go to the comment page (or a wiki, to add a "relevant" external link, or whatever) and hit refresh a few times in case any more new ones show up. You just cache the answers in your database.
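To make that lookup concrete, here's a minimal sketch in Python. It's only an illustration under a few assumptions: it uses an in-memory SQLite database shaped like the CaptchaBuster table, it works on raw bytes instead of real downloads, and the function names (`open_cache`, `lookup`, `remember`) and the choice of SHA-256 are made up for this example, not taken from PWNtcha or the quoted article.

```python
import hashlib
import sqlite3


def open_cache():
    """Create an in-memory table shaped like the CaptchaBuster table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE CaptchaBuster ("
        " ImageUrl VARCHAR(400), ImageHash VARCHAR(400),"
        " FormData TEXT, CaptchaAnswer VARCHAR(100))"
    )
    return db


def lookup(db, image_bytes):
    """Return the cached answer for these exact bytes, or None on a miss."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    row = db.execute(
        "SELECT CaptchaAnswer FROM CaptchaBuster WHERE ImageHash = ?",
        (digest,),
    ).fetchone()
    return row[0] if row else None


def remember(db, image_url, image_bytes, answer):
    """Store the answer a human typed in, keyed by the hash of the file."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    db.execute(
        "INSERT INTO CaptchaBuster (ImageUrl, ImageHash, CaptchaAnswer)"
        " VALUES (?, ?, ?)",
        (image_url, digest, answer),
    )
```

Note that a lookup only hits when the server sends back the exact same file, byte for byte. Change a single byte of the image and the hash no longer matches, so the whole trick depends on sites reusing the same image files.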
Learning Newbie Is A Spammer??!?
There's a process.
- Spam.
- Good guys make captchas to stop spam.
- Spammers laugh at captchas. I'm playing the role of a bad guy to point out the weakness.
- We address that flaw and come up with something more secure.
So what makes this stuff weak and easy to break, then? What needs to be fixed?
- They're easy to read. OCR software already knows how to pull text out of a gif file or a fax or whatever. Problem is, the harder you make it for a computer to break, the worse it looks and the harder it is for a real person to read. You don't want to drive humans away along with the robots, so this is a last resort. But for anything that's too easy, someone has already made software to break it.
- Not enough variation. Lots of sites have like 3 different captchas, and a spammer can fill his database in minutes. It takes time to make new captcha images, so nobody really wants to, but this makes a spammer's job easy.
- One system. We've seen lots of different kinds of captchas, each one has its own style with fonts and colors and stuff in the background - but most sites pick one and go with it. Since people might do AI or they might do a database or even some other approach I didn't think of, you're safer if you change the locks sometimes.
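The "not enough variation" point is easy to put numbers on. Here's a rough simulation in Python, under assumptions made up for the example: the images are stand-in byte strings, `human_answers_needed` is a hypothetical helper, and random bytes appended per request stand in for whatever per-request noise a server might render into the picture. It counts how many times a hash-keyed cache would miss and need a person.

```python
import hashlib
import os


def human_answers_needed(images):
    """Count how many images miss a hash-keyed cache and need a person."""
    seen = set()
    misses = 0
    for data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            misses += 1
    return misses


# Site A: three fixed image files served over and over (stand-in bytes).
fixed_pool = [b"image-one", b"image-two", b"image-three"]
site_a = [fixed_pool[i % 3] for i in range(300)]

# Site B: the same three base images, but every response carries fresh
# random bytes, standing in for per-request variation in the rendering.
site_b = [fixed_pool[i % 3] + os.urandom(16) for i in range(300)]

# human_answers_needed(site_a) is 3: three answers cover all 300 requests.
# For site_b, practically every request is a miss and needs a human.
```

This only speaks to the database approach. Fresh images on every request do nothing against the OCR or AI approach from the quote, which reads the picture instead of recognizing the file.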
But even making these kinds of changes, Captchas are still broken. Maybe instead of throwing rocks at the end user, what web systems programmers really need to do is make things more secure on the server.