How To Kill Nasty Word Garbage Characters in Your CMS

Posted on January 9, 2008

Recently I was doing a server move for a client: from an ancient, slow system at Today.net that was costing her too much money (and me too much bother dealing with a know-it-all-wrong admin) to a nice modern VPS that is reliably, competently and fully managed by LiquidWeb.

Last year, I wrote her a travel reservation & quoting app in PHP/Drupal. That gave her a lot of nice CMS capabilities for handling her own pages, but the problem is that she uses Word to compose most of the text before adding it. Grah! Word is a lot of things to a lot of people, but it is not a good app to use for that purpose. When you cut and paste from it, it likes to leave more than a few garbage and invisible characters scattered around: curly quotes, long dashes, non-breaking spaces and the like.

The problem really reared its ugly head when I exported the data from her database. Or rather, when I imported it onto her new server. Garbage characters everywhere. Messing with character encodings did not help, either.
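Before cleaning anything, it can help to see which characters are actually causing the trouble. Here is a minimal Python sketch of that idea, not something from the original move: it counts every non-ASCII character in a piece of text and labels it with its Unicode name. The function names `inventory_non_ascii` and `describe` are made up for illustration, and it assumes the text has already been decoded correctly into a Python string.

```python
from collections import Counter
import unicodedata


def inventory_non_ascii(text):
    """Return (character, count) pairs for every non-ASCII character,
    most frequent first."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return counts.most_common()


def describe(ch):
    """Return a readable label: the code point plus its Unicode name."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "UNKNOWN"))


if __name__ == "__main__":
    # Hypothetical sample standing in for text pasted from Word.
    sample = "It\u2019s a \u201cquote\u201d\u00a0here \u2013 it\u2019s done\u2026"
    for ch, n in inventory_non_ascii(sample):
        print(n, describe(ch))
```

Run against a real export, a list like this would show at a glance whether the garbage is a handful of Word punctuation marks or something stranger.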

I was desperate to get this done, since it was 1 in the morning. I’d started the transfer late at night so I wouldn’t step on any customers creating new travel quotes. I had to get it done in an hour or two or else put it off to the next day and start over.

I almost started writing a load of regular expressions, but I did one last search for help. Aha! My beloved TextMate has a command for just this problem. Just select all, and go to “Bundles > Text > Converting > Transliterate Selection to ASCII”. Done!
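For anyone without TextMate handy, roughly the same idea can be sketched in a few lines of Python. This is only an approximation under my own assumptions, not what the TextMate command does internally: the `WORD_CHARS` table and the `to_ascii` name are illustrative, the table covers only the characters Word most commonly inserts, and anything left over that has no ASCII form is simply dropped.

```python
import unicodedata

# Characters Word commonly substitutes as you type, mapped to ASCII stand-ins.
# This table is an illustrative assumption, not an exhaustive list.
WORD_CHARS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
    "\u00ad": "",     # soft hyphen (invisible)
    "\u200b": "",     # zero-width space (invisible)
}
_TABLE = str.maketrans(WORD_CHARS)


def to_ascii(text):
    """Replace known Word characters, strip accents, drop the rest."""
    text = text.translate(_TABLE)
    # NFKD splits accented letters into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with "ignore" discards whatever is still not ASCII,
    # including the combining marks left by the step above.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("It\u2019s \u201cfine\u201d\u00a0\u2013 caf\u00e9\u2026"))
```

The catch is that this is lossy: accents vanish and unmapped symbols disappear silently, so it only suits content that is meant to be plain English text anyway.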

Thank you TextMate.

P.S. Yes, I suppose I could’ve used the iconv command in Linux on the server. Maybe. Hassle hassle hassle.
