Google, Microsoft and Yahoo! team up on web crawling, agree to a common sitemaps protocol

A sitemap is a simple XML file that webmasters place in the root of their site to submit their pages for spidering and indexing by search engines. A sitemap is not restricted to listing the URLs and frequently changing pages of a site; it can also include metadata about those URLs, such as when each was last updated, how often it usually changes, and how important it is relative to other URLs on the site, as well as geo-location data, so that the search engines can crawl and index the site more intelligently. Web crawlers usually discover pages from links within the site and from other sites; sitemaps supplement that data, allowing crawlers that support them to pick up every URL listed in the sitemap and learn about each one from the associated metadata.
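
To give a sense of the format, here is a minimal sketch of such a file. The element names and namespace are the ones published at sitemaps.org; the example.com URLs, dates, and priority values are placeholders, not part of the spec.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page the webmaster wants the engines to know about -->
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-11-15</lastmod>      <!-- when the page was last updated -->
    <changefreq>daily</changefreq>     <!-- how often it usually changes -->
    <priority>0.8</priority>           <!-- importance relative to other URLs on the site -->
  </url>
  <url>
    <loc>http://www.example.com/about</loc>
    <lastmod>2006-10-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

Everything beyond <loc> is optional; the metadata simply gives the crawler hints about where to spend its time.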

Google was the first of the search engines to release Sitemaps, more than a year ago. Now Google, Yahoo and Microsoft, in an encouraging act of partnership, announced today that they will begin using the same sitemap protocol to index websites around the world. The new partnership is based out of sitemaps.org, which offers detailed instructions for webmasters on how to generate a standards-compliant XML file on their servers that all three engines can then use to track updates to web pages.

The protocol is offered under a Creative Commons Attribution-ShareAlike license, so any search engine can use it, derivative variations can be created under the same license, and it can be used for commercial purposes.

Any time competitors agree on open standards, that’s an enabler of further innovation and something to celebrate. It’s also great to see Creative Commons receive yet more validation.

Search engine guru Danny Sullivan wrote the following tonight about the move:

Overall, I’m thrilled. It took nearly a decade for the search engines to go from unifying around standards for blocking spidering and making page description to agreeing on the nofollow attribute for links in January 2005. A wait of nearly two years for the next unified move is a long time, but far less than 10 and progress that’s very welcomed. I applaud the three search engines for all coming together and look forward to more to come.

Several people have made early public statements indicating that the next move will be to develop meaningful standards support for robots.txt files. Imagine a future in which these players agree on standards for user control of data, microformats, or truly neutral third-party click-fraud tracking and prevention.