Re: [siesta-dev] optimising mariachi


From: Richard Clamp
Subject: Re: [siesta-dev] optimising mariachi
Date: 00:08 on 30 May 2003

On Thu, May 29, 2003 at 11:30:05PM +0100, Nicholas Clark wrote:
> OK. I admit that for >50% of the subject I don't know what I'm doing.
> (ie 100% of the "mariachi" bit)
> I used the 6M mailbox from p5p for May 2003, on mirth:
> $ /usr/local/perl5.8.1/bin/perl5.8.1 -d:DProf mariachi 2003-05.mbox foo
> Should pass a list title
> reticulating splines                               0.000 elapsed 0.000 total
> 1200 messages
> load 1293                                          22.714 elapsed 22.714 total
> thread indexes                                     12.229 elapsed 47.271 total
> date indexes                                       11.769 elapsed 59.040 total
> message 1200
> message bodies                                     95.481 elapsed 154.521 total
> generate                                           0.008 elapsed 154.529 total
> So it looks like "message bodies" is the first thing to attack.
> dprofpp says:

My feeling was to start on 'load' (and indeed I already have), which
is invoking all those C<Email::Simple::_read_headers> calls.  It's
important for penderel (Paul did some benchmarking there, which shows
incremental "message bodies" generation really kicking ass).

It's also one of the two steps that you can't skip any of by doing
"this already has a html file" checking.  The other is threading,
though for that we can store a pre-computed thread tree and just
update it.
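
A rough sketch of the "this already has a html file" check, in case it
helps.  The sub name pending_messages and the hashref-with-a-filename
layout are made up for illustration; this is not what mariachi does
today.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the messages whose output page is
# missing from $outdir.  Each message is assumed to be a hashref with
# a 'filename' key holding the page's relative path.
sub pending_messages {
    my ($outdir, @messages) = @_;
    return grep { !-e "$outdir/$_->{filename}" } @messages;
}
```

Note that this only saves work in the "message bodies" step; every
message still has to be loaded and threaded before we know its filename.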

I guess we'll meet halfway through :)

> 0: Observation - all of those are in modules not directly mariachi.

Considering that Mariachi is two modules, and depends on many more,
this isn't a shock.

> 1: Does this list of prime time eating functions bear any relation to what
>    people know mariachi to be doing, particularly in the slower sections
>    in the elapsed output?

Yup.  Email::Find is used by Template::Plugin::Mariachi, as is
URI::Find.  That gets called for every page we output so we can mark
it up nicely.

> 2: Is it worrying that Memoize::_memoizer shows up?

Ish.  I asked it to be there in that in Mariachi::Message filename is
memoized, as it allowed me to move that code out of C<new> which was
previously a ratty pile.  It's a little worrying that it's called
enough to show up though.
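
For anyone who hasn't looked, the arrangement is roughly the following.
This is a minimal sketch with a stand-in class (My::Message, its id
field, and the call counter are all hypothetical), not the real
Mariachi::Message code.

```perl
use strict;
use warnings;
use Memoize;

{
    package My::Message;   # hypothetical stand-in for Mariachi::Message

    my $calls = 0;         # counts how often the real work runs

    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    # Pretend this is expensive: derive an output filename from the id.
    sub filename {
        my $self = shift;
        $calls++;
        (my $name = $self->{id}) =~ s/[^\w.-]+/_/g;
        return "$name.html";
    }

    sub calls { $calls }

    # Cache results per object.  The default cache key is the
    # stringified argument list, which for a method call with no
    # other arguments is just the stringified object reference.
    Memoize::memoize('filename');
}
```

The body of filename then runs once per object, but every call still
passes through Memoize's wrapper, which is presumably why the wrapper
turns up in the profile.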

> Given that (in Email::Find)
> sub addr_regex { $Addr_spec_re }
> returns a constant, and the winner on time bloat is:
> sub find {
>     my($self, $r_text) = @_;
>     my $emails_found = 0;
>     my $re = $self->addr_regex;
>     $$r_text =~ s{($re)}{
>         my($replace, $found) = $self->validate($1);
>         $emails_found += $found;
>         $replace;
>     }eg;
>     return $emails_found;
> }
> would turning that s///eg into s///ego be a good idea?

Looks likely.

> [it doesn't let you subclass with a dynamic return result for addr_regex]

Ah, but if such a change is unwanted upstream then we can subclass it,
and just replace C<find> with one that does, and s'all good.
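
Something like this, as an untested-against-the-real-module sketch.
My::Find is a cut-down stand-in for Email::Find containing only what
the quoted code relies on (its regex and validate are simplified
assumptions), and My::Find::Once is a hypothetical name for our
subclass.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Email::Find; the real module's address
# regex and validate() are considerably richer than this.
{
    package My::Find;
    my $Addr_spec_re = qr/[\w.+-]+\@[\w-]+(?:\.[\w-]+)+/;
    sub new        { my ($class, $cb) = @_; return bless { cb => $cb }, $class }
    sub addr_regex { $Addr_spec_re }
    sub validate   { my ($self, $addr) = @_; return ($self->{cb}->($addr), 1) }
}

# The subclass replaces find() only, adding /o so the interpolated
# pattern is compiled once for the life of the program.
{
    package My::Find::Once;
    our @ISA = ('My::Find');

    sub find {
        my ($self, $r_text) = @_;
        my $emails_found = 0;
        my $re = $self->addr_regex;
        $$r_text =~ s{($re)}{
            my ($replace, $found) = $self->validate($1);
            $emails_found += $found;
            $replace;
        }ego;
        return $emails_found;
    }
}
```

The cost is the one Nicholas points out: with /o the pattern is fixed
at first use, so anything that overrides addr_regex to return a
different value later will be ignored by this find.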

Richard Clamp

Generated at 13:56 on 01 Jul 2004 by mariachi 0.52