Google Sitemaps
by Uche Ogbuji
A Little Help from the Tools
Google's tool for generating Sitemaps is sitemap_gen.py. It's an open source program (BSD 2.0 license) written in Python 2.2.
The Google Sitemap Generator is a Python script that creates a Sitemap for your site using the Sitemap Protocol. This script can create Sitemap XML protocol files from URL lists, web server directories, or from access logs.
I had a look at version 1.3 of sitemap_gen.py, which is hosted on SourceForge. It is designed to run from your site's web server, but it provides a lot of options, and you always have the option of simply specifying the URLs in a text file. In this case, all the tool does is take a text file of delimited records and convert it to XML without any more special processing. You can also use it to trawl through your HTTP server access logs for URLs to add to the Sitemap, to consolidate multiple existing Sitemap XML files into one large one, and more. The documentation for typical use of the tool is very clear, so I'll move on to how you would use it from other Python tools.
Once you've set up a Sitemap configuration file, it's very easy to import Google's tool and trigger a regeneration of the output Sitemap file (the output file location is also set in your configuration). Listing 3 is a simple usage example from Python code.
Listing 3. Using Google Sitemap Generator from Python
import sitemap_gen

CONFIG_FNAME = '/path/to/sitemap-config.xml'

sitemap = sitemap_gen.CreateSitemapFromFile(CONFIG_FNAME, False)
if sitemap:
    sitemap.Generate()
else:
    # Indicates an error in configuration
    pass
You could use such code to work Sitemap updates into your Python web application. As an example, if you run a weblog using PyBlosxom, as I do, you could touch up your Sitemap in your CGI handler. Doing so is a fairly expensive task, so it's not something you'd want to do every time. I have a PyBlosxom plugin, task_control.py, which allows you to run Python scripts upon CGI request, but only if a specified interval has passed since the last run. You can run Listing 3 pretty much as it is from that tool. Of course you can also use cron to run Google's sitemap_gen directly.
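The run-only-after-an-interval idea can be sketched in a few lines. This is not the code of task_control.py; it is a minimal illustration that assumes the time of the last run is kept as the modification time of a stamp file. The names run_if_due, stamp_path, and interval_seconds are hypothetical.

```python
import os
import time


def run_if_due(stamp_path, interval_seconds, task, now=None):
    """Run task() only if interval_seconds have passed since the last run.

    The last run time is stored as the modification time of stamp_path
    (an assumption of this sketch). Returns True if the task ran and
    False if it was skipped.
    """
    now = time.time() if now is None else now
    try:
        last_run = os.path.getmtime(stamp_path)
    except OSError:
        last_run = None  # no stamp file yet: treat as never run

    if last_run is not None and now - last_run < interval_seconds:
        return False  # too soon, skip the expensive work

    task()

    # Record this run by creating the stamp file if needed
    # and setting its modification time to "now".
    with open(stamp_path, 'a'):
        pass
    os.utime(stamp_path, (now, now))
    return True
```

To use it, you might wrap the body of Listing 3 in a function and pass it as task, for example run_if_due('/path/to/sitemap.stamp', 24 * 60 * 60, regenerate_sitemap), where regenerate_sitemap and the stamp path are placeholders of your own choosing.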
And Yahoo! Too
For a long time, Yahoo has offered services for submitting URLs to its web crawler. The free service let you enter only one URL at a time into a web form; for bulk submission, you had to pay for the privilege. The arrival of Google Sitemaps prompted Yahoo to provide a free service with a similar option for specifying collections of URLs for indexing. In the case of Yahoo, you create a simple text file with one URL per line, publish it on the web, and submit the URL for that file to Yahoo's submission form. Such a simple URL-per-line text file is also supported by Google Sitemaps, but Yahoo doesn't additionally support the sorts of URL metadata that Google does. You can generate a plain URL list from a Google Sitemap protocol XML file very easily with XSLT, as in Listing 4.
Listing 4. Using XSLT to Generate a Yahoo URL list from a Google Sitemaps XML File
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:gsm="http://www.google.com/schemas/sitemap/0.84"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <!-- Emit each URL's location followed by a newline, one URL per line -->
  <xsl:template match="gsm:url">
    <xsl:value-of select="gsm:loc"/><xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
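If you would rather stay in Python, the same extraction can be sketched with the standard library's ElementTree. This is only an illustrative equivalent of the XSLT approach, assuming the Sitemap uses the 0.84 namespace shown in Listing 4; the function name sitemap_to_url_list is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Sitemap protocol, as used in Listing 4
GSM_NS = 'http://www.google.com/schemas/sitemap/0.84'


def sitemap_to_url_list(sitemap_xml):
    """Return the loc of every url element, one URL per line.

    sitemap_xml is the text of a Sitemap protocol XML document.
    All other metadata (lastmod, changefreq, priority) is ignored.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for url in root.iter('{%s}url' % GSM_NS):
        loc = url.find('{%s}loc' % GSM_NS)
        if loc is not None and loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    if not urls:
        return ''
    return '\n'.join(urls) + '\n'
```

Writing the returned string to a text file gives the URL-per-line format described above.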
Wrap Up
As with all slick new ideas on the web, you have to be careful. Sometimes the need for a specialized Sitemap says something about the usability of your site, and what it says might not be complimentary. If you are losing or confusing traditional web crawlers, there is some chance that you're losing human visitors as well. Of course, if your website is new to the crawlers, for example because you've moved it to a new domain, it makes a lot of sense to make an initial submission to Google and Yahoo (and other search engines) to speed up discovery of the new site. It may be best, though, not to rely too much on the metadata for tuning URLs in Google's service. It's very unlikely that Google will ever subordinate its traditional web crawler heuristics to such user-provided hints, and indeed Google is explicit about the ancillary role of Sitemaps in its overview.
[The Sitemaps] program does not replace our normal methods of crawling the web. Google still searches and indexes your sites the same way it has done in the past whether or not you use this program. A Sitemap simply gives Google additional information that we may not otherwise discover. Sites are never penalized for using this service.
Nevertheless, there is a lot of chatter on the various groups about sudden changes in Google PageRank after submitting a Sitemap, with a lot of complaints about negative effects (presumably if people saw positive effects they'd be less likely to make noise about the fact). Some claim they even suffered complete delistings from Google results after Sitemaps submission. It's hard to know how much of this to credit because the Google Groups support forum for Sitemaps is more a hangout for search engine optimizers and other marketing professionals than for technicians of the web. I would recommend the Inside Google Sitemaps weblog as the most reliable source of evolving details about the program. Certainly submitting URLs to Google's or Yahoo's Sitemaps programs provides no guarantee of ranking for your pages, just a guarantee of inclusion into the index of pages for the web crawlers.
Google recently supplemented its Sitemaps program with Google Mobile Sitemaps, which provides specializations for Mobile Web Search, Google's search engine focusing on sites for small-format mobile devices such as cell phones and PDAs.
In the following months I shall continue my exploration of the Agile Web. In Perspective on XML: What Is This "Agility"? I shared my thoughts on agile development tools and practices. My academic training (first Electrical/Electronic Engineering in Nigeria, and then a Computer Engineering program in the U.S. that was very close to its EEE roots) conditions me to have some caution regarding the quest for absolute agility. On the other hand, I'm a big fan of Python and XML, technologies that are, as I said earlier, oft-cited examples of an agile programming language and an agile data format. In this column I'll apply this tension towards the analysis of the technological forces that are shaping the next generation web.