Archæology

The assorted finds of Artefact Publishing

Bad Googlebot!

It appears that Googlebot makes an unwarranted assumption when deciding which links to crawl. The Dreaming Web site was, until recently, configured so that a request for / was served the file start.html. That file contained a link to /index.html (served from the file index.html), and Googlebot has never requested that URL, although other search bots have. I guess that Googlebot figures /index.html is simply the same resource as /, because index.html is a common name for a directory index page.
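To make the guess concrete, here is a minimal Python sketch of the kind of URL collapsing I suspect is going on. It is purely illustrative: I have no idea how Googlebot is actually written, and the names collapse_index, should_crawl and INDEX_NAMES, along with the example.org URLs, are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: filenames a crawler might treat as "the directory index".
# This list is hypothetical, not taken from any real crawler.
INDEX_NAMES = {"index.html", "index.htm"}


def collapse_index(url: str) -> str:
    """Rewrite .../index.html to .../ the way a crawler making this
    assumption might, leaving every other URL untouched."""
    parts = urlsplit(url)
    head, _, tail = parts.path.rpartition("/")
    if tail in INDEX_NAMES:
        parts = parts._replace(path=head + "/")
    return urlunsplit(parts)


seen = set()


def should_crawl(url: str) -> bool:
    """Return True only the first time a (collapsed) URL is encountered."""
    key = collapse_index(url)
    if key in seen:
        return False
    seen.add(key)
    return True


# The crawler fetches / first, then finds the link to /index.html.
first = should_crawl("http://example.org/")
second = should_crawl("http://example.org/index.html")
```

With a scheme like this, first is True and second is False: once / has been fetched, /index.html is treated as already seen and is never requested, even when the server maps the two URLs to different files.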

This assumption is probably handy for the many sites that use / and /index.html interchangeably, but it is of dubious merit. Not only is it occasionally wrong to assume that the two URLs point to the same resource, but it also hides the site's mistake of using two different URLs for one resource, which breaks caching and the like. So, Google, fix your bot.

Posted by jamie on June 7, 2004 16:01+12:00
