[siesta-dev] optimising mariachi

[prev] [thread] [next] [lurker] [Date index for 2003/05/29]

From: Nicholas Clark
Subject: [siesta-dev] optimising mariachi
Date: 23:30 on 29 May 2003
OK. I admit that for >50% of the subject I don't know what I'm doing.
(ie 100% of the "mariachi" bit)

I used the 6M mailbox from p5p for May 2003, on mirth:

$ /usr/local/perl5.8.1/bin/perl5.8.1 -d:DProf mariachi 2003-05.mbox foo
Should pass a list title
reticulating splines                               0.000 elapsed 0.000 total
1200 messages
load 1293                                          22.714 elapsed 22.714 total
dropped 0 duplicate messages
dedupe                                             0.091 elapsed 22.804 total
thread                                             6.134 elapsed 28.938 total
sanity                                             0.599 elapsed 29.537 total
order                                              1.831 elapsed 31.368 total
sanity                                             0.668 elapsed 32.037 total
strand                                             0.735 elapsed 32.771 total
splicing threads in 45 places
deep threads split up                              0.619 elapsed 33.390 total
sanity                                             0.670 elapsed 34.060 total
order                                              0.981 elapsed 35.041 total
index 21
thread indexes                                     12.229 elapsed 47.271 total
date indexes                                       11.769 elapsed 59.040 total
message 1200
message bodies                                     95.481 elapsed 154.521 total
generate                                           0.008 elapsed 154.529 total

So it looks like "message bodies" is the first thing to attack.
dprofpp says:

$ dprofpp tmon.out
Total Elapsed Time = 115.3059 Seconds
  User+System Time = 113.1859 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 13.3   15.06 19.496   3129   0.0048 0.0062  Email::Find::find
 13.2   15.03 18.450   3129   0.0048 0.0059  URI::Find::find
 8.30   9.395 149.71   8777   0.0011 0.0171  Template::Document::__ANON__
 7.83   8.867  8.866   1293   0.0069 0.0069  Email::Simple::_read_headers
 7.71   8.723 20.316 173776   0.0001 0.0001  Template::Stash::XS::get
 5.73   6.488  6.486   1293   0.0050 0.0050  Email::Simple::_split_head_from_bo
                                             dy
 4.22   4.778  5.116  34375   0.0001 0.0001  Memoize::_memoizer
 2.84   3.209  2.732 159192   0.0000 0.0000  Class::Accessor::Fast::__ANON__
 2.32   2.625 159.53   8777   0.0003 0.0182  Template::Context::process
 1.87   2.119  2.750  11049   0.0002 0.0002  Template::Plugin::Date::format
 1.67   1.889  1.886   1294   0.0015 0.0015  Email::Folder::Mbox::next_message
 1.50   1.699  1.630  23212   0.0001 0.0001  Template::Filters::html_filter
 1.50   1.699  3.264  26341   0.0001 0.0001  Template::Context::filter
 1.49   1.686  3.520   3296   0.0005 0.0011  Mail::Thread::Container::iterate_d
                                             own
 1.48   1.679  6.335  10124   0.0002 0.0006  Template::Context::template

0: Observation - all of those are in modules not directly mariachi.
1: Does this list of prime time eating functions bear any relation to what
   people know mariachi to be doing, particularly in the slower sections
   in the elapsed output?
2: Is it worrying that Memoize::_memoizer shows up?


Given that

sub addr_regex { $Addr_spec_re }

returns a constant, and the winner on time bloat is:

sub find {
    my($self, $r_text) = @_;

    my $emails_found = 0;
    my $re = $self->addr_regex;
    $$r_text =~ s{($re)}{
        my($replace, $found) = $self->validate($1);
        $emails_found += $found;
        $replace;
    }eg;
    return $emails_found;
}

would turning that s///eg into s///ego be a good idea?
[it doesn't let you subclass with a dynamic return result for addr_regex]

Nick

Generated at 13:56 on 01 Jul 2004 by mariachi 0.52