Hi all, this is an older article I wrote a few months ago, but I think it's still relevant.
What is Google Sitemaps?
Google Sitemaps is a free beta program (experimental, that is) by Google in which you can submit a sitemap to inform and direct Google's crawlers. This lets Google find which pages on your site should be indexed and which have been changed or updated recently. Google Sitemaps can be used by anyone who has a hosting account or access to the folders they want indexed. Google also provides some valuable statistics, which can only be viewed by the owner of the site, or at least the person who has root access (which is needed to verify ownership). For both (Sitemaps and stats access), you will need a Google account, which you can obtain here.
Ok, so what is so special about these sitemaps?
As stated above, Google's crawlers use these sitemaps to quickly index the content of your site by following the sitemap file on your web space. Much like an RSS file, it tells them what has been updated and what the intended priority of each page is. What has impressed me is how quickly a crawler comes to your site after you submit an updated sitemap. On some of my sites, I have seen the crawler visit within an hour or two. Even better, it indexed every page listed in my sitemap file! Keep in mind that this is a beta, though, and speeds and times will change. To keep on top of things, you can "ping" Google to announce that you have updated your sitemap (which you should do every time you update your site). Oh, one other tidbit: if your site outputs mobile pages for phones or PDAs, you can use "Mobile Sitemaps" to get those pages indexed!
Getting Started
In the simplest sense, you first need to create a Sitemap, then add it to your Sitemaps account, and finally ping Google when your site changes. Several formats are accepted for the Sitemap: the XML Sitemap protocol, OAI-PMH, syndication feeds (RSS 2.0 and Atom 0.3), and a plain text file of URLs. A quick note: as you may have noticed, it does take RSS and Atom. There is one catch, though: your RSS feed usually doesn't list every page on your site, just the recent ones. Keep this in mind if you plan on using only syndication feeds. I wouldn't worry about the OAI-PMH format unless you already have that framework on your site, which I am willing to bet you don't. The XML Sitemap file (the format shown in the next section) is the preferred format for Google Sitemaps, and it can easily be generated by a Sitemap Generator (either the one Google provides or a third-party one). Google provides a free Sitemap Generator written in Python, and if your host supports Python 2.2 or greater, I would use it. I personally use a free third-party one. I plan on programming one for Community Server in the future so the whole process is automated and runs in the background.
The Sitemap
The Sitemap protocol is pretty simple. Each entry consists of the URL, when it was last modified, how often it changes, and its priority (suggested priority, that is). The file looks similar to this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://blog.theladderproject.com/</loc>
<lastmod>2006-06-03</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
</urlset>
The sitemap must:
* Begin with an opening <urlset> and end with a closing </urlset>.
* Include a <url> entry for each URL as a parent XML tag.
* Include a <loc> child for each <url> parent tag.
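If you want to check those three rules mechanically, here is a minimal Python sketch. It is only an illustration, not Google's own validation: the function name check_sitemap is made up for this example, and the namespace is simply copied from the sample file above.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sample file above (schema version 0.84).
NS = "http://www.google.com/schemas/sitemap/0.84"


def check_sitemap(xml_text):
    """Return a list of problems; an empty list means the three rules hold.

    check_sitemap is a hypothetical helper for this article, not part of
    any Google tool.
    """
    root = ET.fromstring(xml_text)

    # Rule 1: the document must be wrapped in <urlset> ... </urlset>.
    if root.tag != "{%s}urlset" % NS:
        return ["root element is not <urlset>"]

    problems = []

    # Rule 2: each URL lives in its own <url> parent tag.
    urls = root.findall("{%s}url" % NS)
    if not urls:
        problems.append("no <url> entries found")

    # Rule 3: every <url> needs a non-empty <loc> child.
    for number, url in enumerate(urls, start=1):
        loc = url.find("{%s}loc" % NS)
        if loc is None or not (loc.text or "").strip():
            problems.append("<url> entry %d has no <loc>" % number)

    return problems
```

Calling check_sitemap() on the text of your sitemap file and getting back an empty list means the required tags are all in place. It says nothing about whether the optional fields are sensible.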
Easy enough? The only required fields are urlset, url, and loc. Everything else is optional. I highly recommend you use the optional ones as much as possible, especially changefreq and priority. I also recommend being as honest as possible in these optional fields. For example, don't put "always" in the changefreq field for a regular page UNLESS it's a page that generates something new on every visit. I think the closer you get to the true change frequency, the better the chances that the crawlers will follow your directions. If your optional tags are constantly off and say things are changing all the time when they are not, I have a feeling the crawlers will get wise to your file and start indexing your site without considering the sitemap.
Priority is a similar deal. The priority field is the "intended importance" of your page. This DOES NOT affect your page rank or your position in the search engine. It is merely there to help you define which pages you feel are more valuable to get indexed. Putting all high numbers will not help you, since it's a relative field. I personally set my front page high and my oldest archives the lowest.
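To make the optional fields a bit more concrete, here is a rough Python sketch of a tiny generator. It is not Google's Sitemap Generator or the third-party tool I use; build_sitemap and the page dictionaries are names invented for this example. It assumes the protocol's changefreq values (always, hourly, daily, weekly, monthly, yearly, never) and a priority between 0.0 and 1.0.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sample file earlier in this article.
NS = "http://www.google.com/schemas/sitemap/0.84"

# changefreq values defined by the Sitemap protocol (all lowercase).
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_sitemap(pages):
    """Build sitemap XML from dicts with 'loc' and optional
    'lastmod', 'changefreq', and 'priority' keys (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child; ElementTree escapes & and < for us.
        ET.SubElement(url, "loc").text = page["loc"]
        if "lastmod" in page:
            ET.SubElement(url, "lastmod").text = page["lastmod"]
        if "changefreq" in page:
            freq = page["changefreq"].lower()
            if freq not in CHANGEFREQ:
                raise ValueError("unknown changefreq: %r" % page["changefreq"])
            ET.SubElement(url, "changefreq").text = freq
        if "priority" in page:
            priority = float(page["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            ET.SubElement(url, "priority").text = "%.1f" % priority
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example: front page high, an old archive page low.
sitemap_xml = build_sitemap([
    {"loc": "http://example.com/", "changefreq": "daily", "priority": 0.8},
    {"loc": "http://example.com/archive/2005/", "changefreq": "yearly", "priority": 0.1},
])
```

The example URLs are placeholders. The point of the two entries is the relative spread: one page marked high and one marked low says more than marking everything 1.0.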
Submitting and updating your sitemap
Submitting is easy: just log into your Sitemaps account and follow the wizard for adding a sitemap. If you want to update a current sitemap, select it in your account and click resubmit. I personally like to use the ping address:
http://www.google.com/webmasters/sit...ap=sitemap_url. Be sure to URL encode everything after the /ping?sitemap=.
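For the URL-encoding step, here is a small Python sketch using the standard library's urllib.parse.quote. The function name build_ping_url and the ping prefix in the example are placeholders of my own. Substitute the real ping address from your Sitemaps account, up to and including ?sitemap=.

```python
from urllib.parse import quote


def build_ping_url(ping_prefix, sitemap_url):
    """Append the URL-encoded sitemap address to the ping prefix.

    ping_prefix is everything up to and including '?sitemap='.
    build_ping_url is a hypothetical helper written for this article.
    """
    # safe="" makes quote() encode ':' and '/' as well, so the whole
    # sitemap address travels as a single query-string value.
    return ping_prefix + quote(sitemap_url, safe="")


# Placeholder prefix for illustration only; it is NOT Google's ping address.
ping = build_ping_url(
    "http://example.invalid/ping?sitemap=",
    "http://www.example.com/sitemap.xml",
)
print(ping)
# http://example.invalid/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.xml
```

Requesting the resulting address (in a browser or from a script) is what actually sends the ping. The sketch only builds the string.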
That's all there is to it! Next, I will cover the statistics included with your Google Sitemaps account and how they can benefit you greatly.