Archæology

The assorted finds of Artefact Publishing

Don’t treat HTML as text!

Which seems to be exactly what Movable Type does. Though this blog is set up to use UTF-8, I tend to use numeric character references for all non-ASCII characters, in part because it’s often easier for me to input them, and in part because when I tried using the actual characters, bits of Movable Type behaved badly.

However, I have now discovered that the search function operates over the raw HTML source of the entries, and not over the parsed text. Which is to say that for the purposes of search, “faërie” (using the actual e with diaeresis character) and “fa&#235;rie” (using the numeric character reference for e with diaeresis) are not the same thing. And that’s just wrong. To add insult to injury, when you search on the latter (that is, enter “fa&#235;rie” into the search box), the results page autofills the search box with “faërie”, so if you submit the search form again, it won’t return the same results. Madness!
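For illustration, here is a minimal Python sketch of what searching over the parsed text might look like. It assumes each entry is stored as an HTML string. The function names `searchable_text`, `matches` and `search_box_value` are hypothetical and are not taken from Movable Type or WordPress. `html.unescape` and `html.escape` are real functions from Python’s standard library. The sketch also assumes that the autofill problem comes from writing the query back into the form without escaping it. I haven’t confirmed that by reading Movable Type’s source.

```python
import html

def searchable_text(raw_html: str) -> str:
    """Decode character references such as &#235; or &euml; so the search
    sees the same characters a reader sees. (Hypothetical helper; a real
    search would also strip tags, which this sketch does not do.)"""
    return html.unescape(raw_html)

def matches(query: str, entry_html: str) -> bool:
    # Decode both the query and the entry, so that "faërie" and
    # "fa&#235;rie" compare as the same word.
    return searchable_text(query) in searchable_text(entry_html)

def search_box_value(query: str) -> str:
    # Assumed fix for the autofill bug: escape the query before writing it
    # into the form's value attribute, so that the browser shows exactly
    # what was typed instead of decoding &#235; into ë.
    return html.escape(query, quote=True)

entry = "<p>A tale of fa&#235;rie and elfland.</p>"

print("faërie" in entry)                # False: naive raw-text comparison
print(matches("faërie", entry))         # True
print(matches("fa&#235;rie", entry))    # True
print(search_box_value("fa&#235;rie"))  # fa&amp;#235;rie
```

The point is simply that both sides of the comparison get decoded first, so the spelling of a character in the source no longer decides whether an entry is found.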

I am hoping that WordPress gets this right, by searching over the parsed text. Update: But, sadly, it doesn’t. I might just go file a bug.

Posted by jamie on April 16, 2005 14:24+12:00
