Name: Matt Ball
Location: Calera, Alabama
Just a comment on the spider: in Webmaster Tools you might see that Google last visited your site 3 days ago. But when you check the stats from your own web server, you see that Googlebot came 3 hours ago, not 3 days ago. Furthermore, if you go back through earlier visits from Googlebot, you'll see that it was, in fact, there several times that day.
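If you want to check this against your own server instead of trusting the Webmaster Tools date, something like the sketch below pulls the most recent Googlebot hit out of an access log. It assumes Apache/nginx "combined" log format, and it only matches on the user-agent string, which anyone can fake. Properly verifying that a hit is really Google needs a reverse DNS lookup, which this sketch doesn't do.

```python
import re
from datetime import datetime

# Assumed "combined" log format:
# host ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_visits(lines):
    """Yield (timestamp, request) for each log line whose user agent mentions Googlebot."""
    for line in lines:
        m = LOG_LINE.match(line)
        # User-agent match only; spoofed bots will be counted too.
        if m and "Googlebot" in m.group("agent"):
            ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            yield ts, m.group("request")


def last_googlebot_visit(lines):
    """Return the most recent Googlebot timestamp, or None if Googlebot never showed up."""
    return max((ts for ts, _ in googlebot_visits(lines)), default=None)
```

Usage would be along the lines of `last_googlebot_visit(open("access.log"))`. The log file name here is just an example.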
What I'm seeing is that there are two completely separate parts to their caching/indexing process that somehow work together in the end to show the right pages.
In Webmaster Tools the updates are slow, sometimes inaccurate, and seem to lag behind by at least a few days, sometimes a week. Submitting a sitemap through Webmaster Tools does the same thing as creating a sitemap and having it ping Google with its location when it's done.
The thing is, if you submit your sitemap and keep the same conventional name (don't change it), Google will NEVER stop accessing that file for your list of URLs. It's their best way to make sure they're seeing the URLs you consider most important. In fact, you can remove a sitemap from Webmaster Tools and Google will still crawl that sitemap until you physically remove it from your server or rename it.
So you don't have to submit it every day, or three times a day, or ever again once it's been accepted by Webmaster Tools or your automatic Python ping has gotten the correct response. Just keep updating that sitemap with your current URL list and Google will do the rest.
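Here's a rough sketch of the "just keep updating the same file" approach. It rebuilds the sitemap from whatever URL list you hand it and overwrites it at the same path. The path `sitemap.xml` is only an assumption, and so is the atomic temp-file swap. I added the swap so a crawler never fetches a half-written file. It's not something Google requires.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML bytes with one <url><loc> entry per URL (ElementTree escapes & etc.)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def write_sitemap(urls, path="sitemap.xml"):
    """Overwrite the sitemap at a fixed path, keeping the name Google already knows."""
    target = Path(path)
    # Write to a temp file in the same directory, then rename over the old sitemap.
    fd, tmp = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    with os.fdopen(fd, "wb") as fh:
        fh.write(build_sitemap(urls))
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; the web server needs to read the file
    os.replace(tmp, target)
    return target
```

You'd call `write_sitemap(current_urls)` from whatever job already knows your live page list. Since the file name never changes, there's nothing to resubmit.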
I can tell you this from first-hand experience with a real estate site that had 15,000 pages in the sitemap. We discovered the pages were performing badly, so we had to remove all of the old pages completely and submit new ones. Over 2 or 3 weeks, not a single page from the new sitemap got indexed, while the bad pages kept showing up and throwing errors all over the place because I had removed them from the site.
So basically Google was using up all of my crawl rate trying to access the bad pages, because the old sitemap was still live, large, and fully operational, sitting there listing the bad pages just like normal. I submitted the new sitemap every day and couldn't understand for the life of me why those URLs wouldn't get cached. And they were GOOD pages, unlike the old ones, which had 3,800 lines of crap in front of the usable text.
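One way to keep a sitemap from wasting crawl rate like that is to check each URL before it goes into the file and drop anything that doesn't answer 200. The sketch below is a hypothetical safeguard, not what I actually did on that site. It uses plain HEAD requests from the standard library. Note that `urlopen` follows redirects, so a URL that redirects to a live page still counts as live here.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the HTTP status of a HEAD request to url, or None if the host can't be reached."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status  # status after any redirects were followed
    except urllib.error.HTTPError as err:  # 404, 410, 500, ...
        return err.code
    except urllib.error.URLError:  # DNS failure, refused connection, etc.
        return None


def live_urls(urls, status_of=head_status):
    """Keep only URLs that answer 200, so the sitemap never points crawlers at dead pages."""
    return [u for u in urls if status_of(u) == 200]
```

The `status_of` parameter lets you swap in a lookup from your own database or logs instead of hitting the server 15,000 times.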
PM me if you like, I've got lots of useless sitemap stories, but I finally know what they want, how fast I can shove it up their NICs and how much attention I have to pay to it. Glad to share.