Tycoon Talk

The Google Forum


Madlib spam questions.
Old 12-12-2008, 06:22 PM Madlib spam questions.
Learning Newbie's Avatar
Defies a Status

Latest Blog Post:
Astounding Republican Paranoia
Posts: 5,662
Name: John Alexander
Trades: 0
First, how often is this caught? If I use hidden links (I don't mean stuff you can turn on and off for convenience, but hidden stuff that's never turned on), I can expect to be caught, and probably sooner rather than later. Is that basically true for sites that use madlib spam? This is what I'm talking about:

Quote:
De la mortgage in visual frames the show dashboard option viagra middle assertion of select the freight Harrison drive. On test smiley cat server fry dork season's greetings text clock.
Hopefully this is quickly spotted, and the web site is banned.

Are there any thoughts on how this might be caught? Human review? Markov chain style? Google certainly has the data to pull that off.

If they're able to detect madlib spam, I'm guessing it's not through Markov. I'm getting more than 70 thousand results for "colorless green ideas", and the probability of seeing 'green' immediately after 'colorless' is nil. Or could it be that enough web sites have published this particular phrase that its probability is far enough from zero not to raise flags?

Are there other algorithms I don't know of that might trap this? Don't say bigrams and trigrams: they don't work for madlib spam. There are an infinite number of possibilities for randomly generated text, with both grammatical, sensible text and spamtacular nonsense being born every day. It's entirely possible (and overwhelmingly likely) to find good or bad text that has never occurred before, which makes probability-based models, well, not work.
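To make the unseen-phrase problem concrete, here is a minimal sketch of a bigram model in Python. The three-sentence corpus, the function names, and the `alpha` parameter are all made up for illustration; nothing here is how Google or anyone else actually does it. It only shows that a raw bigram model gives probability zero to any pair it has never seen, and that add-alpha smoothing (a standard workaround) gives every unseen pair with the same first word the same small floor value, so the counts alone can't tell a novel but sensible phrase from word salad.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count bigrams, left-hand contexts, and the vocabulary of a tiny corpus."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = sentence.lower().split()
        vocab.update(tokens)
        contexts.update(tokens[:-1])          # words that have a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab

def bigram_prob(prev, word, contexts, bigrams, vocab, alpha=0.0):
    """Estimate P(word | prev); alpha > 0 applies add-alpha smoothing."""
    denom = contexts[prev] + alpha * len(vocab)
    if denom == 0:
        return 0.0
    return (bigrams[(prev, word)] + alpha) / denom

# Hypothetical toy corpus, far too small for real use.
corpus = [
    "the green grass grows",
    "colorless gas filled the room",
    "green ideas are popular",
]
contexts, bigrams, vocab = train_bigrams(corpus)

# "colorless green" never appears, so the raw estimate is exactly zero.
p_raw = bigram_prob("colorless", "green", contexts, bigrams, vocab)

# With add-one smoothing the unseen pair gets a small nonzero floor.
p_smooth = bigram_prob("colorless", "green", contexts, bigrams, vocab, alpha=1.0)

# A pair the corpus did contain still scores much higher.
p_seen = bigram_prob("colorless", "gas", contexts, bigrams, vocab)
```

With this corpus, "colorless popular" gets exactly the same smoothed value as "colorless green", which is the weakness described above.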

I'm not planning to go out and proliferate spam, but I'd like to have a better understanding of how this all works.
Old 12-12-2008, 06:58 PM Re: Madlib spam questions.
VirtuosiMedia's Avatar
Web Design Made Simple

Posts: 1,228
Trades: 0
RDF might help combat something like this eventually...maybe. I don't know, if I were Google, I'd be using AdSense and/or Analytics to help get rid of this type of thing. If they know how long users remain on a page, lower times might be a good signal that the page isn't worth being at the top of the rankings. I'm not saying that's the only factor, but it would be smart, IMO.
Old 12-15-2008, 02:09 PM Re: Madlib spam questions.
Learning Newbie's Avatar
Yeah, considering the cost to build out those tools and collect data through them, I'm sure Google is leveraging them to some extent.

I guess a better question is how a programmer might go about building software that could find this type of spam, where lots of random words are being puked out. I don't have terabytes to download Google's n-grams, but on top of that, given that language makes infinite use of finite resources, and that new phrases are born every moment, it doesn't seem like a plausible route. And from what I gather, we're a very long way off from making computers understand language well enough to see that these words don't go together in any sort of meaningful way. Finally, it seems like Bayesian-style math doesn't apply here, since this arose to beat Bayesian spam filters by using words (at random) that aren't associated with spam.
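Here is a rough sketch, in Python, of why padding can beat a naive Bayes style filter. The per-word probabilities below are invented numbers, not taken from any real filter, and the sketch assumes the padding words happen to be more common in legitimate text than in spam. Under that assumption each padding word pulls the log-odds toward "not spam", and enough of them flip the verdict.

```python
import math

# Hypothetical P(word | spam) and P(word | ham); invented for illustration only.
SPAM_P = {"viagra": 0.20, "mortgage": 0.10, "meeting": 0.01, "report": 0.01, "lunch": 0.01}
HAM_P = {"viagra": 0.001, "mortgage": 0.01, "meeting": 0.10, "report": 0.10, "lunch": 0.05}

def spam_log_odds(tokens, prior_spam=0.5):
    """Sum per-word log likelihood ratios; a positive total leans toward spam."""
    score = math.log(prior_spam / (1.0 - prior_spam))
    for token in tokens:
        if token in SPAM_P and token in HAM_P:   # unknown words are skipped
            score += math.log(SPAM_P[token] / HAM_P[token])
    return score

plain = ["viagra", "mortgage"]
padded = plain + ["meeting", "report", "lunch", "meeting"]

plain_score = spam_log_odds(plain)     # strongly positive
padded_score = spam_log_odds(padded)   # dragged below zero by the padding
```

Words the filter has never seen contribute nothing in this sketch, so purely random rare words would not lower the score here; it is the ham-leaning words that do the damage.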
Old 12-15-2008, 03:03 PM Re: Madlib spam questions.
VirtuosiMedia's Avatar
You might be able to construct some sort of grammar-based filter in addition to a Bayesian filter. Well-formed English must have prepositions and articles, and the list of them is fairly short. You could compose a filter that says a message must have a certain percentage of prepositions and articles relative to the total number of words. You might even be able to use something similar to the Bayesian formula, though probably inverted, because you're looking for the words' presence as a positive signal rather than for the negative signal of spam stop words.
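A minimal sketch of that ratio idea in Python follows. The word list, the function names, and the 0.15 threshold are all placeholders picked for illustration; a real filter would need a fuller list and a threshold tuned on actual data.

```python
import re

# Hypothetical short list of articles and prepositions; not exhaustive.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by",
    "from", "as", "into", "about", "over", "after", "before", "between",
    "through", "under",
})

def function_word_ratio(text):
    """Fraction of tokens that are articles or prepositions from the list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FUNCTION_WORDS)
    return hits / len(tokens)

def looks_like_word_salad(text, min_ratio=0.15):
    """Flag text whose function-word ratio falls below an assumed threshold."""
    return function_word_ratio(text) < min_ratio

normal = "The cat sat on the mat in the corner of the room."
salad = "mortgage dashboard viagra freight smiley server clock"

normal_flagged = looks_like_word_salad(normal)   # False
salad_flagged = looks_like_word_salad(salad)     # True
```

One caveat: the sample quoted in the first post contains "in", "the", "of", "the" and "on", so with this list it scores 5 out of 30 tokens, which is just above the placeholder threshold. A generator that sprinkles in function words would slip past a ratio check alone, so this is probably best treated as one signal among several.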