Apache’s mod_rewrite has become a friend to web developers throughout the world. It allows you to create tidy, user-readable URLs instead of ugly, meaningless links that often just expose an ID for a database record. A ‘Fancy URL’ will usually contain a version of the page title that is both human and machine readable and is also a valid URL segment; this is called a slug.

Fancy URLs allow humans to easily remember locations of content and to better understand the logical layout of your website. They also improve the SEO quality of pages as they can provide search engines with additional keywords about the page.

An example of a fancy URL in action would be http://kerihenare.com/archive/2009/07/23/fancy-urls-and-slugs and the alternative (without mod_rewrite) would be http://kerihenare.com/?p=105. In this example the second URL is definitely shorter, but it doesn’t mean anything to us. The first URL provides us with a content grouping (archive) as well as the date of the article and a slugified page title. Slugification is the act of turning a string, such as a page title, into a slug. It’s one of those made-up industry terms and may not be found in a dictionary.

In some recent development, I put a lot of effort into creating PHP and JavaScript functions that perform slugification. It’s very simple code, but it is focused on rules that keep the slug as clean and meaningful as possible. The other day I noticed an example of bad slugs on a well-known website, so I’ve decided to share my rules for a good slug.

A lot of sites use simple slugification, where all non-alphanumeric characters are replaced with a dash. While such a method gets the job done, the result isn’t always perfect. With simple slugification “Fancy URLs & Slugs” would become fancy-urls---slugs, which is a little ugly and replaces the ampersand (& means ‘and’, which is a word) with a dash, losing meaning and flow. Compare this with our earlier example of fancy-urls-and-slugs and you can clearly see the difference.
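To make the problem concrete, here is a minimal sketch of simple slugification in JavaScript. The function name simpleSlugify is made up for this example, and the lowercasing step is an assumption based on the lowercase output shown above; real sites will each do this a little differently.

```javascript
// Hypothetical naive slugifier: lowercase the string, then swap every
// character that is not a-z or 0-9 for a dash, one dash per character.
function simpleSlugify(title) {
  return title.toLowerCase().replace(/[^a-z0-9]/g, "-");
}

// The space, the ampersand and the next space each become a dash.
console.log(simpleSlugify("Fancy URLs & Slugs")); // fancy-urls---slugs
```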

My rules for slugification:

  1. Convert the string to lowercase
  2. Replace all ampersands with ‘and’
  3. Replace whitespace and underscores with a dash
  4. Remove anything that isn’t a dash or an alphanumeric character
  5. Replace all occurrences of more than one dash with a single dash
  6. Remove any dashes from the start and the end of the string
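The six rules can be sketched in JavaScript roughly like this. This is not my production code, just an illustration: the name slugify is chosen for the example, and rule 4 is read here as removing anything that is not a dash or an ASCII letter or digit.

```javascript
// Illustrative slugifier that follows the six rules in order.
function slugify(title) {
  return title
    .toLowerCase()                 // 1. convert to lowercase
    .replace(/&/g, "and")          // 2. ampersands become 'and'
    .replace(/[\s_]/g, "-")        // 3. whitespace and underscores become dashes
    .replace(/[^a-z0-9-]/g, "")    // 4. drop anything that is not a dash, a-z or 0-9
    .replace(/-{2,}/g, "-")        // 5. collapse runs of dashes into one
    .replace(/^-+|-+$/g, "");      // 6. trim dashes from both ends
}

console.log(slugify("Fancy URLs & Slugs")); // fancy-urls-and-slugs
console.log(slugify("  Hello, World!  "));  // hello-world
```

Following rule 2 literally means an ampersand with no spaces around it gets glued to its neighbours, so “R&D” becomes randd. Under this ASCII-only reading of rule 4, accented letters are dropped as well, so titles with non-English characters would need extra handling.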

WordPress is quite good in that it does some, but not all, of the steps. Maybe I should suggest the change.

Update: Arg, I shouldn’t make blog posts before 9am because I’m still half asleep. After a comment from Matthew Buchanan I’ve realised that I took some shortcuts in my rules so I have now updated them as per my source code comments. - 25/06/2009