Re: [siesta-dev] optimising mariachi


From: Richard Clamp
Subject: Re: [siesta-dev] optimising mariachi
Date: 00:08 on 30 May 2003
On Thu, May 29, 2003 at 11:30:05PM +0100, Nicholas Clark wrote:
> 
> OK. I admit that for >50% of the subject I don't know what I'm doing.
> (ie 100% of the "mariachi" bit)
> 
> I used the 6M mailbox from p5p for May 2003, on mirth:
> 
> $ /usr/local/perl5.8.1/bin/perl5.8.1 -d:DProf mariachi 2003-05.mbox foo
> Should pass a list title
> reticulating splines                               0.000 elapsed 0.000 total
> 1200 messages
> load 1293                                          22.714 elapsed 22.714 total
[...]
> thread indexes                                     12.229 elapsed 47.271 total
> date indexes                                       11.769 elapsed 59.040 total
> message 1200
> message bodies                                     95.481 elapsed 154.521 total
> generate                                           0.008 elapsed 154.529 total
> 
> So it looks like "message bodies" is the first thing to attack.
> dprofpp says:

My feeling was to start on 'load' (and indeed I already have), which
is invoking all those C<Email::Simple::_read_headers> calls.  It's
important for penderel (Paul did some benchmarking there;
http://paste.husk.org/198 shows incremental "message bodies"
generation really kicking ass).

It's also one of the two steps where you can't skip any of the work
by doing "this already has a html file" checking.  The other is
threading, though for that we can store a pre-computed thread tree and
just update it.
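
For reference, the "this already has a html file" check could look
roughly like the sketch below.  The helper name, the file name and the
rendering callback are all made up for illustration; this is not how
mariachi does it, just the general shape of skipping work when the
output is already on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper (not part of mariachi): write a page only when it
# is not already on disk.  Returns 1 if it rendered, 0 if it skipped.
sub generate_unless_exists {
    my ($path, $render) = @_;
    return 0 if -e $path;                  # "this already has a html file"
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $render->();               # the expensive step runs only here
    close $fh or die "Cannot close $path: $!";
    return 1;
}

my $dir  = tempdir(CLEANUP => 1);
my $page = File::Spec->catfile($dir, 'message-1.html');   # assumed name

# First run renders the page; the second finds the file and never calls
# its callback.
my $first  = generate_unless_exists($page, sub { "<html>body</html>\n" });
my $second = generate_unless_exists($page, sub { die "should not re-render" });
print "first run rendered: $first, second run rendered: $second\n";
```

The catch, as noted above, is that 'load' still has to read every
message before we know which files to check for.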

I guess we'll meet halfway through :)

> 0: Observation - all of those are in modules not directly mariachi.

Considering that Mariachi is two modules, and depends on many more,
this isn't a shock.

> 1: Does this list of prime time eating functions bear any relation to what
>    people know mariachi to be doing, particularly in the slower sections
>    in the elapsed output?

Yup.  Email::Find is used by Template::Plugin::Mariachi, as is
URI::Find.  That gets called for every page we output so we can mark
it up nicely.

> 2: Is it worrying that Memoize::_memoizer shows up?
> 

Ish.  I asked for it to be there: filename in Mariachi::Message is
memoized, which allowed me to move that code out of C<new>, which was
previously a ratty pile.  It's a little worrying that it's called
enough to show up, though.
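
A minimal sketch of why the memoizer shows up, using a made-up
Demo::Message class rather than the real Mariachi::Message: the body
of filename runs once per object, but every call still passes through
the wrapper that Memoize installs in its place, so a hot caller makes
the wrapper itself visible in a profile.

```perl
use strict;
use warnings;

package Demo::Message;    # hypothetical stand-in, not Mariachi::Message
use Memoize;

my $calls = 0;            # counts how often the real body runs

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

sub filename {
    my $self = shift;
    $calls++;
    # Assumed naming scheme: squash awkward characters in the message id.
    (my $name = $self->{id}) =~ s/[^\w.-]+/_/g;
    return "$name.html";
}
memoize('filename');      # replaces filename with a caching wrapper

sub calls { $calls }

package main;

my $msg = Demo::Message->new(id => '<abc@example.com>');

# Always call in scalar context: Memoize keeps separate caches for
# scalar and list context.
for (1 .. 1000) {
    my $file = $msg->filename;
}
my $real_calls = Demo::Message->calls;
print "1000 lookups, body ran $real_calls time(s)\n";
```

The cache key includes the stringified object, so each message gets
its own entry.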

> Given that 
(in Email::Find)
> 
> sub addr_regex { $Addr_spec_re }
> 
> returns a constant, and the winner on time bloat is:
> 
> sub find {
>     my($self, $r_text) = @_;
> 
>     my $emails_found = 0;
>     my $re = $self->addr_regex;
>     $$r_text =~ s{($re)}{
>         my($replace, $found) = $self->validate($1);
>         $emails_found += $found;
>         $replace;
>     }eg;
>     return $emails_found;
> }
> 
> would turning that s///eg into s///ego be a good idea?

Looks likely.

> [it doesn't let you subclass with a dynamic return result for addr_regex]

Ah, but if such a change is unwanted upstream then we can subclass it,
and just replace C<find> with one that does, and s'all good.
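
Something like the following sketch, say.  Demo::Find is a cut-down
stand-in for Email::Find so that this runs without the real module;
its address pattern, constructor and C<validate> are simplified
assumptions, and only C<addr_regex> and C<find> follow the code quoted
above.  Demo::Find::Once is the hypothetical local subclass.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Email::Find, reduced to what the sketch needs.
package Demo::Find;

# Simplified, assumed pattern; the real $Addr_spec_re is far stricter.
my $Addr_spec_re = qr/[\w.+-]+\@[\w-]+(?:\.[\w-]+)+/;

sub new {
    my ($class, $callback) = @_;
    return bless { callback => $callback }, $class;
}

sub addr_regex { $Addr_spec_re }

sub validate {
    my ($self, $addr) = @_;
    return ($self->{callback}->($addr), 1);   # (replacement, found count)
}

sub find {
    my ($self, $r_text) = @_;
    my $emails_found = 0;
    my $re = $self->addr_regex;
    $$r_text =~ s{($re)}{
        my ($replace, $found) = $self->validate($1);
        $emails_found += $found;
        $replace;
    }eg;
    return $emails_found;
}

# Local subclass: identical find, except that /o fixes the pattern the
# first time the substitution runs.
package Demo::Find::Once;
our @ISA = ('Demo::Find');

sub find {
    my ($self, $r_text) = @_;
    my $emails_found = 0;
    my $re = $self->addr_regex;
    $$r_text =~ s{($re)}{
        my ($replace, $found) = $self->validate($1);
        $emails_found += $found;
        $replace;
    }ego;
    return $emails_found;
}

package main;

my $finder = Demo::Find::Once->new(sub { "<$_[0]>" });
my $text   = "mail nick\@example.org or richard\@example.com";
my $count  = $finder->find(\$text);
print "$count: $text\n";
```

The trade-off is the one Nicholas points out: with /o the pattern is
whatever C<addr_regex> returned on the first call, so a later subclass
that returns something different will be ignored by this C<find>.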

-- 
Richard Clamp <richardc@xxxxxxxxx.xxx>

Generated at 13:56 on 01 Jul 2004 by mariachi 0.52