Why don't search engines share data? Yeah, I know, it's 'cause they compete with each other. So each and every one of them duplicates a massive amount of work. At first blush most of that doesn't really seem to affect us, except for the bandwidth and server load.
When Slurp and MSNBot and MediaBot and GoogleBot and the rest of them all crawl over your site, some of them rampantly, it's because that's the only way they have of getting this data. Your public-facing web site is their data share. They take the entire HTML you send down as a starting point, and then run it through all kinds of cleaning formulas to get it into whatever special format their database schema uses.
And past a certain point in that processing, they're all specialized, and couldn't share the data even if they wanted to. But still, in the name of overall efficiency, and also to keep our servers more idle and available, wouldn't it make sense for there to be some kind of open repository for at least the first phase of what the search engines do, the collection phase?
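To make the idea a bit more concrete, here's a rough Python sketch of what that shared collection phase might look like. Everything in it is made up for illustration: `SharedCrawlStore`, `RawPage`, the two `engine_*_clean` functions, and `fake_fetch` (a stand-in for a real HTTP request) are hypothetical names, not anything the search engines actually run. The point is just that the page gets fetched once, and each engine then applies its own cleaning to the same raw copy.

```python
import hashlib
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class RawPage:
    """One raw, uncleaned copy of a page, as the site served it."""
    url: str
    html: str
    sha256: str


class SharedCrawlStore:
    """Hypothetical open repository holding one raw copy of each page."""

    def __init__(self):
        self._pages = {}
        self.fetch_count = 0  # how many times the site was actually hit

    def get(self, url, fetch):
        # Only hit the site if nobody has collected this URL yet.
        if url not in self._pages:
            html = fetch(url)
            self.fetch_count += 1
            digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
            self._pages[url] = RawPage(url, html, digest)
        return self._pages[url]


class _TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def engine_a_clean(page):
    # Assumption: one engine wants only the text nodes.
    parser = _TextExtractor()
    parser.feed(page.html)
    parser.close()
    return " ".join(parser.chunks)


def engine_b_clean(page):
    # Assumption: another engine only wants size and a fingerprint.
    return {
        "url": page.url,
        "bytes": len(page.html.encode("utf-8")),
        "sha256": page.sha256,
    }


def fake_fetch(url):
    # Stand-in for a real HTTP request, so the sketch runs offline.
    return "<html><body><h1>Hello</h1><p>shared crawl</p></body></html>"


store = SharedCrawlStore()
url = "http://example.com/"
text = engine_a_clean(store.get(url, fake_fetch))
stats = engine_b_clean(store.get(url, fake_fetch))
print(text)               # text as engine A sees it
print(store.fetch_count)  # the site was only hit once
```

Two engines get two different views of the page, but the server only answers one request. A real repository would also have to deal with freshness, robots.txt, and who pays for the storage, none of which this sketch touches.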
From there you could get more pie in the sky, and Google could potentially share their spam list of forbidden sites with Yahoo and MSN. They probably wouldn't want to, since lately spam fighting is what differentiates Google. And MSN doesn't seem to care about spam: it's all good, and MFA (made-for-AdSense) sites are plenty welcome. But wouldn't the infrastructure to share data be useful, even if they chose not to share much of it?