September 3, 2010

How To Kill Nasty Word Garbage Characters in Your CMS

Recently I moved a client's site off an ancient, slow system at Today.net, which was costing her too much money (and me too much bother dealing with a know-it-all-wrong admin), and onto a nice modern VPS reliably, competently and fully managed by LiquidWeb.

Last year, I wrote her a travel reservation & quoting app in PHP/Drupal. That gave her plenty of CMS capabilities for managing her own pages, but she composes most of her text in Word before pasting it in. Grah! Word is a lot of things to a lot of people, but it is not a good app for that purpose. It tends to scatter garbage and invisible characters around whenever you cut and paste from it.

The problem really reared its ugly head when I exported the data from her database. Or rather, when I imported it into her new server. Garbage characters everywhere. Messing with character encodings simply did not help either.
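Here is a small sketch of one common way this kind of garbage appears. It assumes the cause is UTF-8 bytes being read back as Windows-1252; the post itself doesn't diagnose the exact cause. Word's curly apostrophe, en dash and ellipsis each turn into a three-character "â€…" sequence:

```python
# Word-style punctuation: curly apostrophe, en dash, ellipsis.
smart_text = "It\u2019s done \u2013 finally\u2026"

# Assumed failure mode: the text is stored as UTF-8 bytes, but something
# along the way (export, import, connection charset) reads those bytes
# back as Windows-1252 (cp1252).
garbled = smart_text.encode("utf-8").decode("cp1252")

print(garbled)  # prints: Itâ€™s done â€“ finallyâ€¦
```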

I was desperate to get this done, since it was 1 in the morning. I’d started the transfer late at night so I wouldn’t step on any customers creating new travel quotes. I had to get it done in an hour or two or else put it off to the next day and start over.

I almost started writing a load of regular expressions, but I did one last search for help. Aha! My beloved TextMate has a command for just this problem. Just select all and choose “Bundles > Text > Converting > Transliterate Selection to ASCII”. Done!
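If you don't have TextMate handy, the same idea can be approximated in a few lines of Python. This is a minimal sketch and not TextMate's actual implementation. The replacement table is an illustrative assumption covering the usual Word offenders: smart quotes, dashes, the ellipsis and the non-breaking space. Accented letters are then decomposed with Unicode NFKD, and anything still outside ASCII is dropped:

```python
import unicodedata

# Illustrative (assumed) table of characters Word commonly inserts,
# mapped to plain-ASCII stand-ins. Not TextMate's real mapping.
WORD_REPLACEMENTS = str.maketrans({
    "\u2018": "'", "\u2019": "'",    # single smart quotes
    "\u201c": '"', "\u201d": '"',    # double smart quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u2026": "...",                 # horizontal ellipsis
    "\u00a0": " ",                   # non-breaking space
})

def to_ascii(text: str) -> str:
    """Approximate ASCII transliteration of text pasted from Word."""
    # 1. Swap Word's typographic punctuation for ASCII equivalents.
    text = text.translate(WORD_REPLACEMENTS)
    # 2. Decompose accented letters, e.g. "é" -> "e" + combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # 3. Drop whatever is still non-ASCII: combining accents, zero-width
    #    characters, and symbols with no ASCII equivalent.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Caf\u00e9 \u201cspecials\u201d \u2013 50% off\u2026"))
# prints: Cafe "specials" - 50% off...
```

Step 3 is lossy by design. Any character the table and NFKD don't cover is silently discarded rather than guessed at, which is usually what you want for invisible junk.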

Thank you TextMate.

P.S. Yes, I suppose I could’ve used the iconv command in Linux on the server. Maybe. Hassle hassle hassle.



About Bruce Kroeze
