Ye olde English is dead
There are probably as many ways of creating SEO-friendly URLs as there are actual implementations. After real work yesterday I started looking at the rather simple method we use, which is based on PHP's built-in strtr(). Apart from some uppercase, lowercase and UTF-8 juggling, the (very shortened) basis is something like the code below.
$isochars = "\xFC\xFD\xFF";
$asciichars = "uyy";
$urlfriendly = strtr($actual_string, $isochars, $asciichars);
What struck me was that, in our mapping, the character þ (FE in hex) was translated into y. That is correct if you look at how y, in different forms, was used as an abbreviation for the, that and so on in old English, or Anglo-Saxon. It is now often written “Ye”, as in the blog post title. (Yes, the “Y” in the title should be pronounced as “th”.)
However, it is a bit odd, as the Icelandic language still uses the letter frequently. The sound value is more or less the equivalent of the English “th” in this or the.
The major value of SEO-friendly URLs is readability. Wouldn’t it be more friendly and natural, then, to translate þ into “th”?
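For what it's worth, here is a minimal sketch of how that could look. It is not our production code: the function name url_friendly() is made up for this example, and it assumes the title arrives as UTF-8 and that PHP 7 or later is available (for the "\u{...}" escapes). The point is that the array form of strtr() lets one character expand into two, which the three-argument form used above cannot do.

```php
<?php
// Hypothetical helper for illustration only; assumes UTF-8 input.
function url_friendly(string $title): string
{
    // Array form of strtr(): keys and values may differ in length,
    // so a single character can be replaced by two ASCII letters.
    $map = [
        "\u{00FE}" => 'th', // þ (lowercase thorn)
        "\u{00DE}" => 'Th', // Þ (uppercase thorn)
        "\u{00FC}" => 'u',  // ü
        "\u{00FD}" => 'y',  // ý
        "\u{00FF}" => 'y',  // ÿ
    ];

    $ascii = strtolower(strtr($title, $map));

    // Collapse every run of characters outside a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);

    return trim($slug, '-');
}

echo url_friendly('Þorn og þýska'), "\n"; // prints "thorn-og-thyska"
```

Any character missing from the map simply ends up as a hyphen, so a real table would need to cover every character you expect in your titles.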

But who actually still uses \xFE?
One could argue that it’s not really worthwhile to write a sanitizer for characters that are neither letters nor digits, as no one tends to use them in their titles anyhow. If you do write it up, I imagine there are a few more characters that need replacing.
Well, Icelandic people probably use it in daily life when writing Icelandic texts…
It’s not about sanitizing, it’s about creating search-engine-friendly and user-friendly ASCII URLs that are as close as possible to the original title but exclude national characters, to be a little more internationally viable.
And to clarify, the above code example is shortened. For the sake of an example it is not necessary to include all the western characters that need to be supported.
You can look at the translit PECL package:
http://pecl.php.net/package/translit
This extension allows you to transliterate text in non-latin characters (such as Chinese, Cyrillic, Greek etc) to latin characters. Besides the transliteration the extension also contains filters to upper- and lowercase latin, cyrillic and greek, and perform special forms of transliteration such as converting ligatures such as the Norwegian “æ” to “ae” and normalizing punctuation and spacing.
I have tested translit with good results.
echo transliterate('þorn', array('normalize_ligature'), 'utf8', 'iso-8859-1');
The above code will output thorn, which is more correct than y.
I only go as far as removing diacritical marks (‘è’ becomes ‘e’):
http://ossigeno.svn.sourceforge.net/viewvc/ossigeno/trunk/core/library/Otk/Filter/Diacritics.php?revision=520&view=markup