So, we are Sergey and Larry . . .
We launch a bot that indexes everything linked from Stanford U's website, and we log all the data into a database.
OK, that was cool. Now we should create a way to search this data.
That was easy; we should let the bot dig deeper.
Now we have a lot of data, and the search does not work as well as it did.
hmmmm . . . .
Let's try making any site that is only a few link steps away from Stanford's site rank better than the rest. We could consider these authority sites because of their proximity to the mother site.
That worked better; we should let the bot dig deeper.
Now we have tons of data. Perhaps we need more mother sites; perhaps more .edu sites would work.
We are getting a lot of cross-references. We should put some kind of cumulative link data into the search algo to improve the search results; we could call it PageRank.
That worked. We should let the bot dig deeper and add a few mother sites.
Search results could be better; it looks like certain people are gaming PageRank.
We really need to quantify links better. How about putting a new factor into the algo: the link text? This should tag a page and match it to its content.
That worked somewhat; however, now people are spamming guestbooks and forums with keyword links.
OK, we will ignore any guestbook links created by popular software, and any forum signature links, when calculating link popularity.
We are seeing an increase in mutual-admiration link exchanges. The first step to combat this is to ignore any page named link.html.
We have seen an increase in people hitting sites that publish their log files just to get backlinks. We need to nix this ASAP.
Jeez, now there are people selling links!
We need to combat this on two fronts.
Bust the obvious link brokers and put fear into the market via the internet press.
OK, that worked somewhat, but we really need to tune the algo better to catch the rest. Any outgoing links at the bottom of a page that are not surrounded by text should be suspect, along with standalone text links in the sidebar.
That nabbed a bunch, but we need to do better.
How about letting people report paid links? We can let people narc on each other and do our job for us rather than spend more on developing algos!
Well, that kind of worked, but we got some bad press for that move.
Let's step back a moment . . . . .
A great portion of our search algo is based upon links.
We should reanalyze how we weigh links.
Proximity and redundancy would be a good start.
Last edited by brokenhtml; 12-04-2008 at 12:42 AM.