[Date index for 2006/03/20]
This stems from spam, so right off the bat it's hateful. But other than the spam issue, I'm not sure where to place the rest of my hate, since it crosses multiple programs across multiple platforms.

Now, you may be asking yourself, ``Self, what does i18n have to do with spam and software hate?'' Glad you asked (even if you didn't).

I work at a small web hosting company and even though we're small, we get an insane amount of spam through our network (then again, who doesn't?). We have a dedicated platform (read: commercial, proprietary and expensive) that does nothing but filter for spam---it is, in effect, a Spam Firewall. You point the MX record to this device and it'll scrub the incoming email---blocking mail from known spammers and letting the rest through, but marking on the subject line any email that *may* be spam (and so far, it's never been wrong when it marks an email as spam). So, a message that comes into the Spam Firewall as:

    Subject: Play longer! Increase your mortgate by 3 inches!

if not outright blocked, will be slightly modified to read:

    Subject: [SPAM] Play longer! Increase your mortgate by 3 inches!

I'm the system administrator for said small web hosting company, and as such, I have root's mail from each of our servers headed to my account. Which means I get a ton of email---log summaries, mail bounces, problem notifications, what have you. In order to keep from being inundated I've set up procmail to filter and file all my incoming email. So, it was easy enough to set up the following rule in procmail:

    :0:
    * ^Subject: .*SPAM.*
    in-SPAM

Never mind the obscure syntax and the difficulty in actually scanning for a literal '['---this works well enough to send all spam-marked emails to the bit bucket.

But I noticed that not all marked spam was being caught. There I am, in mutt, and what do I see in my inbox?

    Subject: [SPAM] Play longer! Increase your mortgate by 3 inches!

That shouldn't be there.
Let me test something---I sent from my personal account an email to my work account with "[SPAM]" in the subject line, and lo, it ended up in 'in-SPAM' just like I told procmail to do. Yet I still get

    Subject: [SPAM] Play longer! Increase your mortgate by 3 inches!

What's going on? Suspecting that somehow procmail wasn't seeing the actual subject line, I checked the incoming mail spool file directly and what do I see?

    Subject: =?ISO-8859-1?B?W1NQQU1dIA==?= =?ISO-8859-1?B?UGxheSBsb25nZXIhICBJbmNyZWFzZSB5b3VyIG1vcnRnYXRlIGJ5IDMgaW5jaGVz?=

Aha! [1] MIME crap! [2] I18n crap! [3] With varying degrees of support (or non-support in the case of procmail).

Okay, so where's the hate? Let's see ...

The Spam Firewall? Okay, it's nice that it can decode encoded header lines, but *why* oh *why* does it encode "[SPAM]" if the subject line is encoded? Obviously you can have portions of a header encoded and not all of it. I'm guessing the Spam Firewall vendor can't (or probably won't) fix this because the actual bit that does the rewriting of the subject line is probably some third party i18n library that the Spam Firewall uses and it's not cost effective to "fix" this particular problem, since for most people it's not a "problem" at all. Stupid.

Procmail? For not supporting i18n at all? Are there any regex engines out there that can deal with i18n? Does procmail need to be updated to support MIME? Hate.

Mutt? Well ... it supports MIME and i18n, but it masked this particular problem for a few days. It's tempting to rip out MIME support from mutt (since I can't stand MIME but that's an issue I have to deal with) but that would make it difficult to deal with the occasional attachment. Perhaps a toggle to flip MIME support on and off ... Aggravation.

Spam? Well, that's pure hate incarnate.

So I dutifully add:

    :0:
    * ^Subject: =\?ISO-8859-1\?B\?W1NQQU1dIA==\?=.*
    in-SPAM

to .procmailrc and get on with my life, until I start seeing

    Subject: [SPAM] Play longer! Increase your mortgate by 3 inches!
in the inbox yet again. What now?

    Subject: =?UTF-8?B?W1NQQU1dIA==?= =?UTF-8?B?UGxheSBsb25nZXIhICBJbmNyZWFzZSB5b3VyIG1vcnRnYXRlIGJ5IDMgaW5jaGVz?=

Sigh.

  -spc (Actually, I think it was originally encoded in WINDOWS-1251
       which is a whole other form of hate ... )

[1] It's actually encoded in UTF-8 in this example---I don't have a full example in ISO-8859-1 but it's close enough to serve for an example.

[2] Mostly hateful, but I can see a use for it.

[3] Not crap at all, but I'm ranting here.
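
For what it's worth, those subjects use the RFC 2047 "encoded-word" form, =?charset?B?base64?=, so the same "[SPAM] " tag can show up under any charset label the sender picked. A rule that matches on the *decoded* subject sidesteps the per-charset whack-a-mole. Below is a minimal sketch in Python using the standard library's email.header module. The function names decoded_subject and is_marked_spam are made up for illustration, and this is not something procmail does on its own; it only shows the decoding step, under the assumption that the tag is always at the start of the subject.

```python
from email.header import decode_header, make_header


def decoded_subject(raw_value: str) -> str:
    """Decode any RFC 2047 encoded-words in a raw Subject header value.

    Plain (unencoded) subjects pass through unchanged.
    """
    return str(make_header(decode_header(raw_value)))


def is_marked_spam(raw_value: str) -> bool:
    """True if the decoded subject starts with the firewall's "[SPAM]" tag.

    Hypothetical helper: assumes the tag is always prepended to the subject.
    """
    return decoded_subject(raw_value).startswith("[SPAM]")


# The two header values from the post, differing only in charset label.
BODY = "UGxheSBsb25nZXIhICBJbmNyZWFzZSB5b3VyIG1vcnRnYXRlIGJ5IDMgaW5jaGVz"
latin1 = f"=?ISO-8859-1?B?W1NQQU1dIA==?= =?ISO-8859-1?B?{BODY}?="
utf8 = f"=?UTF-8?B?W1NQQU1dIA==?= =?UTF-8?B?{BODY}?="

if __name__ == "__main__":
    for value in (latin1, utf8, "[SPAM] plain old ASCII", "Log summary"):
        print(is_marked_spam(value), decoded_subject(value))
```

Both encoded variants decode to the same text, so one check covers them along with the plain ASCII case. A WINDOWS-1251 label should go through the same path, assuming the Python build knows that codec.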