Tycoon Talk

The Google Forum


How does Google uniquify duplicate content?
Old 06-18-2008, 06:06 PM How does Google uniquify duplicate content?
Learning Newbie's Avatar
Defies a Status

Latest Blog Post:
Astounding Republican Paranoia
Posts: 5,662
Name: John Alexander
Trades: 0
We all know the drill. You post the same thing on 2 pages of your site, only 1 of them will show up in the SERPs. Post something from a different site on yours, and you won't (or shouldn't) turn up at all. I've never done any testing, but I assume this is true. That means Google is able to filter duplicate content, but it also means they're doing something more than just hashing the HTML code, or different templates would seem to be different content.
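To illustrate the hashing point, here is a minimal sketch, not a description of what Google actually does. The helper names (`raw_hash`, `text_hash`) and the two sample pages are made up for the example, and the tag stripping is deliberately crude. It shows that hashing the raw HTML treats the same text in two templates as different, while hashing only the normalized text does not.

```python
import hashlib
import re


def raw_hash(html: str) -> str:
    """Hash the page exactly as served, markup included."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def text_hash(html: str) -> str:
    """Hash only the visible text: drop tags, lowercase, collapse whitespace.

    The regex tag stripping is an assumption made for brevity; a real
    system would use a proper HTML parser.
    """
    text = re.sub(r"<[^>]+>", " ", html)
    text = " ".join(text.lower().split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical pages: same sentence, two different templates.
page_a = "<html><body><div class='blue'><p>Same article text.</p></div></body></html>"
page_b = "<html><body><table><tr><td>Same article text.</td></tr></table></body></html>"

print(raw_hash(page_a) == raw_hash(page_b))    # templates differ, so the raw hashes differ
print(text_hash(page_a) == text_hash(page_b))  # the text is identical, so these match
```

This is still an exact comparison: change a single word and the text hashes no longer match, which is why a fuzzy measure is needed on top of it.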

A good friend of mine wants to create a personal database system. It should capture all types of documents, from voice recordings to email. And especially because of that last one, he's asking me for ideas on how he can filter out the duplicate content from his own system. But in a fuzzy way, instead of binary comparisons that would give false negatives. I haven't been able to come up with any, but I do know that Google is pretty good at "organizing the world's information", so it seems like borrowing their ideas would be a good start.

I don't suppose anyone has any ideas, or thoughts where I should send him to look?
Old 06-19-2008, 02:19 PM Re: How does Google uniquify duplicate content?
Skilled Talker

Posts: 68
Trades: 0
I have an experimental blog that ranks in the major SERPs even though it has duplicate content. I built more quality links to the site, and all of a sudden it came up at number 5 on the first page of Google's SERP.
mokmok69 is offline
Old 06-19-2008, 10:32 PM Re: How does Google uniquify duplicate content?
Average Talker

Posts: 17
Trades: 0
Duplicate content turns up on Google all the time, like when I submit an article to multiple directories.
Kaabi is offline
Old 06-20-2008, 12:45 PM Re: How does Google uniquify duplicate content?
francis84's Avatar
Super Spam Talker

Posts: 931
Trades: 0
So... how does Google know which site has the original content? By when it was indexed, I think!
Old 06-20-2008, 02:48 PM Re: How does Google uniquify duplicate content?
Learning Newbie's Avatar
Defies a Status

Latest Blog Post:
Astounding Republican Paranoia
Posts: 5,662
Name: John Alexander
Trades: 0
I don't care so much how they know which is the original source - to worry about that, first I'd need to know how to identify whether X ~= Y?
Old 06-26-2008, 08:39 AM Re: How does Google uniquify duplicate content?
willcode4beer's Avatar
Super Moderator

Posts: 1,533
Name: Paul Davis
Location: San Francisco
Trades: 1
Quote:
Originally Posted by Learning Newbie View Post
I don't care so much how they know which is the original source - to worry about that, first I'd need to know how to identify whether X ~= Y?
There are a few ways. If you run the documents into a tree, you can simply decide what level of overlap is enough to count as a duplicate.

Another option could be a Bayesian classifier.

You could even do word matching. First strip all the prepositions, short words, punctuation, etc. out of a doc, then generate an index of counts for the remaining words. If your indexes match, it's probably a duplicate.

None of these are 100% foolproof, but they would probably work for your purpose.
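The word-matching idea could be sketched roughly like this in Python. Everything named here is an assumption for illustration: the small `STOP_WORDS` set, the `min_len` cutoff for "short words", and the `threshold` value. Since an exact match of the two indexes would be a binary test, the sketch swaps in cosine similarity between the count indexes to get a fuzzy score, which goes beyond what is described above.

```python
import math
import re
from collections import Counter

# Tiny illustrative list of prepositions; a real list would be much longer.
STOP_WORDS = {"about", "above", "after", "from", "into", "near",
              "over", "under", "with", "without"}


def word_index(text: str, min_len: int = 4) -> Counter:
    """Count the words left after dropping punctuation, short words
    and the prepositions listed in STOP_WORDS."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words
                   if len(w) >= min_len and w not in STOP_WORDS)


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two count indexes: 1.0 means the indexes are
    proportional, 0.0 means they share no words."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def probably_duplicate(x: str, y: str, threshold: float = 0.8) -> bool:
    """Flag two documents as likely duplicates; the threshold is a guess
    and would need tuning on real data."""
    return similarity(word_index(x), word_index(y)) >= threshold


doc1 = "The quick brown fox jumps over the lazy dog near the river bank."
doc2 = "A quick brown fox jumped over a lazy dog, near the river bank!"
doc3 = "Stock prices fell sharply on Monday."

print(probably_duplicate(doc1, doc2))
print(probably_duplicate(doc1, doc3))
```

Word counts ignore word order, so two documents that shuffle the same vocabulary would score as duplicates; that is one of the ways this approach is not foolproof.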
Old 06-26-2008, 12:42 PM Re: How does Google uniquify duplicate content?
ssandecki's Avatar
SEO Addict

Latest Blog Post:
Wordpress SEO Plugins
Posts: 295
Name: Stephen
Location: Chicago, IL
Trades: 0
Using the registered version of www.copyscape.com would be my suggestion; however, I'm not clear on the question. Do you want to find duplicate content within your own site, or to find out whether the content on your site also appears elsewhere?