Paging out

From: Richard Clamp
Subject: Paging out
Date: 19:14 on 28 Jul 2003
Okay, this one's been following me about for a while, and now that we're
at a CPAN release of the main project, I'm going to spool it out.

Mariachi has a fatal flaw.  It's a bug in incremental output generation
which comes from not being aware enough of the changing environment of
the messages.

To reproduce it, simply do the following.

   get a big mail store

   feed the first half of it through Mariachi

   feed both halves through Mariachi

What you'll see is that the links to the thread indexes in some of the
messages just won't match up.  What's happened is that those messages
were linked on the first pass, then found on the second pass that they
didn't need a regen.

There's possibly a couple of ways to fix this, and they all lead to
having to store more metadata.

I suggest stealing from Lurker again, and implementing what I'm going to
call parent designates.  The way this works is that after threading you
examine each message.  If the message doesn't have a parent designate
you assign it the messageid of the root of the thread it belongs to.
When you generate thread indexes you do one for each parent designate
which exists (either redoing the messages with changed designates, or
generating multiple indexes), and use the messageid as a base for the
name of this index.  This new index is the one you use as backlinks on
the message pages.
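
A minimal sketch of the parent designate idea, in Python rather than
Mariachi's Perl.  The function names and the dict-based message store
are made up for illustration and are not Mariachi's or Lurker's API.  It
takes the "generate multiple indexes" route: a designate, once assigned,
is never changed.

```python
def thread_root(msgid, parent_of):
    """Walk parent links up to the root of the thread containing msgid."""
    seen = set()  # guards against a malformed, cyclic reference chain
    while parent_of.get(msgid) is not None and msgid not in seen:
        seen.add(msgid)
        msgid = parent_of[msgid]
    return msgid


def assign_designates(parent_of, designates):
    """Give every message that lacks a designate the root of its thread.

    parent_of maps messageid -> parent messageid (None for a root).
    designates maps messageid -> designate and persists between runs;
    existing entries are left alone so already-written links stay valid.
    """
    for msgid in parent_of:
        if msgid not in designates:
            designates[msgid] = thread_root(msgid, parent_of)
    return designates


def index_groups(designates):
    """Group messages by designate: one thread index per designate."""
    groups = {}
    for msgid, designate in designates.items():
        groups.setdefault(designate, []).append(msgid)
    return groups


# First pass: the real root "a" has not arrived yet, so "b" looks like a root.
designates = assign_designates({"b": None, "c": "b"}, {})

# Second pass: "a" and "d" arrive; "b" and "c" keep their old designate.
designates = assign_designates(
    {"a": None, "b": "a", "c": "b", "d": "a"}, designates
)
groups = index_groups(designates)
```

After the second pass there are two indexes, one named after "b" and one
after "a", so the backlinks written into "b" and "c" on the first pass
still point at an index that exists.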

Now as I said before, that needs a bunch more metadata, as we don't
really want to reparse the html to figure out the parent designate
(simply regenerating it each time won't work when the root of a thread
arrives last since that will lead to orphaned messages), so I'm thinking
that we might try going with Class::DBI again, and sticking parsed
messages, and maybe the thread tree, in a SQLite database.
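
To make the storage question concrete, here is a rough sketch of keeping
designates in SQLite.  It uses Python's standard sqlite3 module as a
stand-in for Class::DBI, and the table and column names are hypothetical.
The point is only that the first designate recorded for a message
survives later runs.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the metadata store; the schema is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS designate ("
        " message_id TEXT PRIMARY KEY,"
        " designate  TEXT NOT NULL)"
    )
    return db


def remember(db, message_id, designate):
    """Record a designate; INSERT OR IGNORE keeps any earlier value."""
    db.execute(
        "INSERT OR IGNORE INTO designate (message_id, designate) VALUES (?, ?)",
        (message_id, designate),
    )
    db.commit()


def lookup(db, message_id):
    """Return the stored designate, or None if the message is new."""
    row = db.execute(
        "SELECT designate FROM designate WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    return row[0] if row else None


db = open_store()
remember(db, "b", "b")   # first pass: "b" looked like a root
remember(db, "b", "a")   # second pass: ignored, the old designate stands
```

With something like this there is no need to reparse the generated html:
the designate is looked up by messageid before deciding which index a
message page links back to.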

Opinions on that?

While I'm posting I'll pass on some extra information - I'm taking August
off.  We've been working solidly on this for the past 3 months or so,
and now I can't be arsed anymore.  I'm going to fade into the background
here and do some work on other projects like perlbug-triage, CPANTS, and
ponie.  Rest assured I'll still be about to tell people what to do, but
expect to see more patches coming from the other people who've been
working on the thing.

Richard Clamp <>

Generated at 13:56 on 01 Jul 2004 by mariachi 0.52