Re: Linefeeds


From: David Champion
Subject: Re: Linefeeds
Date: 20:06 on 25 Jun 2005
* On 2005.06.20, in <200506201517.IAA23114@xxxxxx.xxx>,
*	"Dave Vandervies" <dj3vande@xxxxxx.xxx> wrote:
> > > 
> > line\n\r
> > feeds\m
> > are^m
> > hard&return;

True.


> out=fopen("foo","w");
> fputs("Nope, line\n",out);
> fputs("feeds are\n",out);
> fputs("actually\n",out);
> fputs("really easy\n",out);
> fclose(out);

True.


The problem isn't in I/O, it's in protocol.  Everyone has a new and
improved way of indicating logical line breaks within their own
cross-platform specification.

Traditionally, the Intarweb uses MS-DOS line breaks, \r\n, for maximum
naive portability, while some specific platforms use either \r or \n
solo.  Each endpoint needs to be able to recognize what it's receiving
and match what it's sending.
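
A minimal sketch of the "recognize what it's receiving" half, assuming
the receiver wants to accept \r\n, a lone \r, or a lone \n and store all
of them as \n.  The name normalize_newlines is made up for illustration
and does not come from any library.

```c
#include <stddef.h>

/* Hypothetical helper: copy src into dst, turning "\r\n", a lone '\r',
 * and a lone '\n' each into a single '\n'.  The output never grows, so
 * dst needs room for strlen(src) + 1 bytes.  Returns the number of bytes
 * written, not counting the terminating NUL. */
size_t normalize_newlines(const char *src, char *dst)
{
    size_t n = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '\r') {
            if (src[i + 1] == '\n')
                i++;            /* swallow the LF of a CRLF pair */
            dst[n++] = '\n';
        } else {
            dst[n++] = src[i];  /* '\n' and ordinary bytes pass through */
        }
    }
    dst[n] = '\0';
    return n;
}
```

The sending half is the mirror image: pick the terminator the protocol
specifies and emit exactly that, whatever the host platform prefers.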

I'm on a development team for an application -- a network listener
with a bunch of arbitrary purpose behind it -- where, mysteriously,
for reasons undiscovered, someone got \r\n backwards.  It issues line
breaks as \n\r.  This is fine if you're a raw terminal device, where
the order doesn't really matter, but if you're a client application, it
might.  And, in fact, the client I use most often doesn't recognize
\n\r as a line break; it recognizes it as two schizophrenic line breaks,
so I get everything in doublespace.  This has caused me some amount of
teeth-grinding.  I've had to turn vegetarian.
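
Why the doubling happens can be sketched like this, assuming a client
that treats \r\n as one line break and any other \r or \n as a break by
itself.  That rule is a guess at what such a client does, not the actual
code of any particular one, and count_line_breaks is a made-up name.

```c
#include <stddef.h>

/* Hypothetical client-side rule (an assumption, see above): "\r\n" is
 * one line break; any other '\r' or '\n' is a line break on its own. */
size_t count_line_breaks(const char *s)
{
    size_t breaks = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\r') {
            if (s[i + 1] == '\n')
                i++;            /* CRLF pair: consume both, count once */
            breaks++;
        } else if (s[i] == '\n') {
            breaks++;
        }
    }
    return breaks;
}
```

Under that rule, two lines terminated with \r\n contain two breaks, while
the same two lines terminated with \n\r contain four, which is the
doublespace effect.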


> Any system that runs general-purpose programs has a C I/O library that
> knows exactly how to do line feeds for that system, and most non-C
> languages either have C at the back-end anyways or can easily be
> coerced to use the C library for I/O.

So the trouble isn't the host system, it's the interchange.
What about data representations where a logical newline is zero-width
whitespace, used exclusively to prettify presentation of metadata?  The
C library doesn't have a special XML mode, or a special LDAP mode, or a
special Joe's L33t RDBMS mode -- nor should it.  At some point you just
have to accept that your application needs to have a brain, and also
to use it.  Personally -- and I'll admit that I'm speaking as a UNIX
developer here -- I wish C didn't differentiate text and binary, not
because they're the same, but because there's more than just text and
binary in that big bad world, and it's not the C library's job to know
the difference.  It's just an illusion to think this alone is going to
save your ass.
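
One way an application can take that job on itself, sketched under the
assumption that the interchange format wants \r\n on the wire: open the
stream in binary mode and write the terminator explicitly, so the C
library does no translation at all.  write_wire_line is a made-up name
for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one logical line followed by an explicit
 * CR LF.  The stream is expected to be opened in binary mode (for
 * example "wb") so the C library leaves the bytes alone.  Returns 0 on
 * success, -1 on a short write. */
int write_wire_line(FILE *out, const char *text)
{
    static const char eol[2] = { '\r', '\n' };
    size_t len = strlen(text);

    if (fwrite(text, 1, len, out) != len)
        return -1;
    if (fwrite(eol, 1, sizeof eol, out) != sizeof eol)
        return -1;
    return 0;
}
```

The same bytes come out on every platform, because the decision about
what a line break looks like was made by the application and not by the
text-mode layer underneath it.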


> The hard part is finding a cluestick big enough for all of the people
> who think all the world's a unix system and bypass the C stdio library

Yeah, that's the Mac, right there.  (Not really.)


> The OP indicates that apparently not even all unix systems are unix in
> this respect anymore...

Who said anything about UNIX systems?  Maybe it's iTunes for Windows,
with Cygwin providing her %EDITOR% of choice.

Newlines are hard, and it's not UNIX's fault.

-- 
 -D.    dgc@xxxxxxxx.xxx        NSIT    University of Chicago
