Re: Significant whitespace (was Re: Blogging sucks)


From: peter (Peter da Silva)
Subject: Re: Significant whitespace (was Re: Blogging sucks)
Date: 19:22 on 17 Oct 2005

> MTAs shouldn't be interpreting any utf-8.

MTA is actually an overloaded term, seeing as it covers both mail
transport agents and mail routing agents. In addition, the software that
implements mail transport and routing includes spam filters, virus
checkers, vacation and other automatic response software, gateways,
and so on.

> UCS-4 breaks little assumptions like "A byte with a null in is the end 
> of a string".

Any software that deals with "bytes" rather than "characters" is going to
have to be rewritten. Right now it's being rewritten to use UTF-8 using
libraries that incorporate all kinds of mind-bogglingly heavy l10n and i18n
code that has immediate, obvious, and significant performance costs.
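
For anyone who wants to see the quoted "null byte ends the string" assumption
fail, here is a minimal sketch in Python. The helper c_strlen is made up for
illustration (it is not a library function); it just mimics what C's strlen()
does, which is to count bytes up to the first zero byte. The big-endian byte
order for UCS-4 is also an assumption of the sketch.

```python
# Illustration only: apply the C-style "stop at the first zero byte" rule
# to the same text in two encodings.

def c_strlen(buf: bytes) -> int:
    """Hypothetical helper: bytes before the first 0x00, as C's strlen() would count."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end

text = "mail"
utf8 = text.encode("utf-8") + b"\x00"          # NUL-terminated UTF-8
ucs4 = text.encode("utf-32-be") + b"\x00" * 4  # NUL-terminated big-endian UCS-4

print(c_strlen(utf8))  # 4: the terminator is the only zero byte
print(c_strlen(ucs4))  # 0: 'm' is stored as 00 00 00 6D, so the scan stops at once
```

In UTF-8 a zero byte only ever encodes U+0000, so byte-oriented code keeps
working for ASCII text. In UCS-4 nearly every character contains zero bytes,
which is why byte-oriented code has to change.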

Even setting the locale to "C" doesn't get you back to the performance you had
with software that dealt with characters as atomic objects at the machine
level. Metaphorically changing (sizeof(char)) to 4 is about the only way to
get that performance back. The cost of reading and writing 4 times as many
bytes per character is swamped by the 10x or greater cost of i18n and l10n
code.
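
To make "characters as atomic objects" concrete, here is a rough sketch of
the structural difference, again in Python. It measures nothing and says
nothing about the 10x figure; it only shows that with 4-byte characters the
Nth character sits at byte offset 4*N, while with UTF-8 you have to walk the
string to find it. The function names are invented for the example, and the
UTF-8 walker assumes well-formed input.

```python
import struct

def ucs4_char_at(buf: bytes, index: int) -> str:
    """Fixed width: character `index` lives at byte offset 4 * index."""
    (code_point,) = struct.unpack_from(">I", buf, 4 * index)
    return chr(code_point)

def utf8_seq_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")

def utf8_char_at(buf: bytes, index: int) -> str:
    """Variable width: step over `index` sequences from the start of the buffer."""
    pos = 0
    for _ in range(index):
        pos += utf8_seq_len(buf[pos])
    return buf[pos:pos + utf8_seq_len(buf[pos])].decode("utf-8")

text = "naïve café"
ucs4 = text.encode("utf-32-be")  # 40 bytes, 4 per character
utf8 = text.encode("utf-8")      # 12 bytes, 1 or 2 per character here

print(ucs4_char_at(ucs4, 9))  # é, found by one multiplication
print(utf8_char_at(utf8, 9))  # é, found after stepping over 9 sequences
```

The UCS-4 buffer is more than three times the size, but the lookup is a
single offset calculation, which is the trade described above.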


Generated at 20:00 on 17 Oct 2005 by mariachi 0.52