Re: Regexps (was Re: Invalid Operating System)


From: H.Merijn Brand
Subject: Re: Regexps (was Re: Invalid Operating System)
Date: 07:32 on 18 Dec 2006
On Sun, 17 Dec 2006 20:27:20 +0000, Robert Rothenberg <robrwo@xxxxx.xxx>
wrote:

> On 17/12/06 18:16 demerphq wrote:
> > On 12/17/06, Robert Rothenberg <robrwo@xxxxx.xxx> wrote:
> >> On 17/12/06 08:52 Dave Hodgkinson wrote:
> >>
> >> > Reach for the root cause: regexps themselves are hateful. Nasy,
> >> > cryptic line noise.
> >>
> >> As has been said in another message, regexps are their own language,
> >> which has origins in theoretical computer science and mathematics.  Like
> >> most expressions in mathematics and logic, it looks like "nas[t]y cryptic
> >> line noise," but it makes sense to those who know how to read it, and it's
> >> the most efficient means of expressing the concept.
> 
>  [...]
> 
> > Well, the two come from different eras so it's hardly surprising that
> > they don't match. I mean you'd find it hard to read English from the
> > 15th century, and someone from the 15th century would have the same
> > troubles reading modern English.
> 
> Bad comparison: traditional regexps are much easier to read than the ones
> used in contemporary programming languages.

/me assumes here that by traditional you mean the funny character stuff and
by contemporary the bloody spelled out junk. In which case ...

So true. It's exactly as Yves pointed out before. Reading a language you
don't speak, but that resembles something you /think/ you speak (Old English
versus modern English) is probably even harder for native English speakers
than it is for non-native English speakers.
($prev_paragraph =~ s{ \b English \b }{$your_favourite_spoken_language}xig)

Once you can read and understand it, every language, written in a sane
context, can be easily read and understood. Some languages are just harder to
learn than others (Chinese has proved to be one of those).

Once complex structures appear in a language, it takes a trained eye to see
the real meaning, and replacing metacharacters or tokens with written words
does NOT ease that concept. It just makes you have to read more.

> That issue aside, note that I said "Like most expressions in mathematics and
> logic... it's the most efficient means of expressing the concept."  Regexps
> are mathematical expressions for strings instead of numbers.
> 
> You could just as well complain that an expression such as
> 
>   dist = sqrt( sqr(x_0 - x_1) + sqr(y_0 - y_1) )
> 
> is too cryptic.  You could spell it out in several lines with lots of
> comments for the mathematically illiterate, but the compiler may produce
> sub-optimal code, and it will make less sense to those who know how to read
> equations.

See.
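
To make the quoted point concrete, here is a rough sketch of that distance
expression written both ways. It is in Python purely for illustration; the
function and variable names are made up for this example and are not taken
from anyone's real code.

```python
import math


def dist_compact(x0, y0, x1, y1):
    # The one-line form, as in the quoted formula.
    return math.sqrt((x0 - x1) ** 2 + (y0 - y1) ** 2)


def dist_spelled_out(x0, y0, x1, y1):
    # The same computation with every step given a written-out name.
    difference_in_x = x0 - x1
    difference_in_y = y0 - y1
    square_of_difference_in_x = difference_in_x * difference_in_x
    square_of_difference_in_y = difference_in_y * difference_in_y
    sum_of_squares = square_of_difference_in_x + square_of_difference_in_y
    return math.sqrt(sum_of_squares)
```

Both give the same answer; the second only gives you more to read.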

> Likewise, you could spell out your regexp with dozens of lines of
> indexof () and substr () function calls, but it will be less comprehensible
> than a single regexp, be more likely to have bugs, and not be compiled into
> an efficient finite automaton.
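
A small sketch of what that looks like in practice, assuming the task is to
find the first YYYY-MM-DD date in a string. The task, the pattern and both
function names are invented for this illustration, and Python is used only
as a convenient notation. The hand-written version uses str.find and slicing
in place of indexof () and substr ().

```python
import re

# Hypothetical pattern: four digits, dash, two digits, dash, two digits.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def find_date_regex(text):
    """Return (year, month, day) strings for the first match, or None."""
    m = DATE_RE.search(text)
    return m.groups() if m else None


def find_date_by_hand(text):
    """The same search using only find, slicing and isdecimal."""
    start = 0
    while True:
        dash = text.find("-", start)
        if dash == -1:
            return None
        # Guard against a negative slice start when the dash is too early.
        year = text[dash - 4:dash] if dash >= 4 else ""
        month = text[dash + 1:dash + 3]
        sep = text[dash + 3:dash + 4]
        day = text[dash + 4:dash + 6]
        if (len(year) == 4 and year.isdecimal()
                and len(month) == 2 and month.isdecimal()
                and sep == "-"
                and len(day) == 2 and day.isdecimal()):
            return (year, month, day)
        # Not a date at this dash; try the next one.
        start = dash + 1
```

The hand-written loop needs an explicit bounds guard and a retry step that
the regexp handles implicitly, which is exactly where the bugs tend to creep
in.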

-- 
H.Merijn Brand         Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.9.x   on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.0 & 10.1, AIX 4.3 & 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/            http://www.test-smoke.org
                        http://www.goldmark.org/jeff/stupid-disclaimers/
