I am interested in language and languages, and I am interested in (web) publishing. Naturally, then, I do things like include bits of Thai or Pali on my blog, making sure that they are correctly marked up and use the proper Unicode characters. This is definitely a learning process, particularly for writing systems I’m not familiar with. For example, today I discovered that Unicode provides a zero width space character (U+200B), particularly for use in languages such as Thai (which does not usually mark word boundaries in its script), so I hurriedly went back and added zero width spaces to the bit of Thai I’ve used.
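For the curious, here is a minimal sketch of what adding zero width spaces amounts to, assuming the Thai text has already been split into words by hand. The function name and the sample words are made up for illustration and are not the Thai actually used on this blog.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def mark_word_boundaries(words):
    """Join already-segmented words with invisible zero width spaces."""
    return ZWSP.join(words)

# Hypothetical sample: "ภาษา" (language) followed by "ไทย" (Thai).
marked = mark_word_boundaries(["ภาษา", "ไทย"])

# The text looks unchanged on screen, so list the code points
# to confirm that U+200B really sits between the two words.
print([f"U+{ord(ch):04X}" for ch in marked])
```

The character is invisible, so the only visible effect should be that a browser is allowed to break the line between the two words.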

I only came across that bit about the zero width space because I was trying to figure out how the word ghazal is written in the Arabic script. You can see the results of my investigations. I now know a little more about the ways of handling Arabic script on the web, and a little more about the script itself, and I now think I could legitimately use غزل in place of ﻏﹷﺰﹶﻝ (which is what I copied from another site). I shall try to find out for sure.
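One way to check a substitution like this is to list the code points on each side. The sketch below assumes the copied string is made of characters from the Arabic Presentation Forms-B block (the escapes are my reading of it, not a verified transcription) and uses Python's standard unicodedata module. The helper names are made up, and dropping the vowel marks is a choice made for the comparison, not something the script requires.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]

def base_letters(text):
    """Fold presentation forms to base characters, keeping letters only.

    NFKC maps positional forms (initial, final, ...) back to the plain
    letters. Filtering on category "Lo" then drops the vowel marks and
    the spacing or tatweel carriers they decompose with (an assumption
    of this sketch: the unvocalised spelling is wanted).
    """
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if unicodedata.category(ch) == "Lo")

# Assumed code points for the string copied from the other site.
copied = "\uFECF\uFE77\uFEB0\uFE76\uFEDD"
plain = base_letters(copied)

for code, name in describe(copied) + describe(plain):
    print(code, name)
```

If the assumption about the copied characters holds, the folded result is the three plain letters ghain, zain and lam, and choosing the right joined shape for each letter is left to the renderer.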

Of course, the whole process of overcoming my near-total ignorance is not made easier by the fact that I don’t have total faith in the ability of Mozilla/my system to render the Arabic characters into glyphs properly. So I’m trying to learn how to correctly specify a word in a language and script I don’t know, not knowing what the word should look like or whether what I come up with is being rendered correctly. To prove my geekiness, I find this fun.

Posted by jamie on May 1, 2003 18:51+12:00


Yes, you are a geek. *My* geekiness was shown off by thinking that you were going to talk about Perl or Python with your "Scripting languages" title. I hope that was the reaction you were aiming for! :-)

Posted by: Michael Norrish on May 2, 2003 10:43+12:00

I figured it was a nice pun that I knew a number of my readers would get. Okay, perhaps not a pun, since it relies on Calvin & Hobbes' "verbing weirds language", but you know what I mean.

Posted by: Jamie on May 2, 2003 10:58+12:00