Charset Abuse


From: Phil!Gregory
Subject: Charset Abuse
Date: 21:42 on 11 May 2004
Okay, getting everyone on a single, encompassing character set/encoding is
pretty much a pipe dream.  But why can't tools at least communicate what
charset they're using?

I'm used to getting web pages that misreport their charsets.  When I read
things like "I m sure it s ok", I can generally tell w3m, "Ignore what you
were told; this page is in CP1254."  This doesn't work if the content was
written in CP1254 but the publishing tool turned those 0x92 apostrophes
into &#146; HTML entities, which is just wrong.
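
To make the 0x92 problem concrete, here is a minimal Python sketch.  The
helper names (decode_as, to_entities_naive, to_entities_correct) are
hypothetical and only for illustration; they are not taken from w3m or from
any publishing tool.  The point is that the number in a numeric character
reference names a Unicode code point, and code point 146 (U+0092) is a C1
control character, not an apostrophe.

```python
# Byte 0x92 is the right single quotation mark in CP1254 (and in CP1252).
raw = b"I\x92m sure it\x92s ok"


def decode_as(data: bytes, charset: str) -> str:
    """Decode bytes with an explicitly chosen charset."""
    return data.decode(charset, errors="replace")


def to_entities_naive(data: bytes) -> str:
    """Assumed behaviour of a broken tool: reuse the raw byte value
    as the number in the character reference."""
    return "".join(chr(b) if b < 0x80 else f"&#{b};" for b in data)


def to_entities_correct(data: bytes, charset: str) -> str:
    """Decode first, then emit the Unicode code point of each character."""
    text = data.decode(charset)
    return "".join(c if ord(c) < 0x80 else f"&#{ord(c)};" for c in text)


good = decode_as(raw, "cp1254")    # apostrophes become U+2019
wrong = decode_as(raw, "latin-1")  # apostrophes become U+0092 (a control char)

print(repr(good))
print(repr(wrong))
print(to_entities_naive(raw))               # I&#146;m sure it&#146;s ok
print(to_entities_correct(raw, "cp1254"))   # I&#8217;m sure it&#8217;s ok
```

Once the naive conversion has been done, overriding the page's charset in
the browser no longer helps, because the offending byte is gone and only
the wrong number is left in the markup.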

This isn't really a rant directed at any one thing in particular.  I just
wish all this stuff with character sets happened transparently and that I
hadn't had to learn what little I do know about the whole
process.  Like so many things, it should just _work_.

-- 
...computer contrarian of the first order... / http://aperiodic.net/phil/
PGP: 026A27F2  print: D200 5BDB FC4B B24A 9248  9F7A 4322 2D22 026A 27F2
--- --
Right....  I use my pickpocket ability to steal the egg from under the
mated pair of black dragons.
                       -- Famous Last Words, #1679
---- --- --
