On Thu, Jul 01, 2004 at 09:53:06AM +0100, Simon Wistow said:
> I've actually been playing around with this a solution to the 'slow to
> generate massive archives' problem over the last couple of days.
>
> I'll let you know how I get on.

I got a quick dose of tuits over the weekend and finished off what I'd been doing. Since Richard is away this was spectacularly bad timing, because I don't get to rely on the ultimate debugging tool which is his Cluebat. However, I was in full-on revision avoidance mode so I did it anyway.

Whilst this borrows code and concepts from the original Mariachi and cheekily steals the name, it's little more than a proof of concept (it's not much more than a few hours' work, all told).

With that said, and with a host of other disclaimers hinted at, here we go.

The good news is that this 'interpretation' is much easier to extend - it uses plugins to generate each of the different views on the data. At the moment there are plugins for

    date
    author
    thread
    lurker thread
    message (which displays individual messages)
    atom (based on Ben's code)

and that regenerating pages is really quick once the messages are in the store.

The bad news is that getting stuff into the store is very slow at the moment: about 5 or 6 times slower than the original Mariachi. Whilst in some ways a slowdown is to be expected, this is not really acceptable.

However, the other good news is that I think it should be ripe for speeding up.
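To make the plugin-per-view idea above a bit more concrete, here is a minimal sketch. It is written in Python rather than Mariachi's Perl, and it is not taken from the tarball: the `Message` class, the `view` decorator, the `VIEWS` registry and the `generate` function are all hypothetical names invented for illustration. The page paths only loosely follow the example URLs in this mail.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a stored message."""
    message_id: str
    author: str
    sent: date
    subject: str


# Registry mapping a view name to the function that builds that view.
VIEWS = {}


def view(name):
    """Register a function as the generator for one view of the data."""
    def register(func):
        VIEWS[name] = func
        return func
    return register


@view("author")
def by_author(messages):
    # One page per author, listing the subjects of their messages.
    pages = defaultdict(list)
    for msg in messages:
        pages[f"author/{msg.author}"].append(msg.subject)
    return dict(pages)


@view("date")
def by_date(messages):
    # One page per day, listing the subjects sent on that day.
    pages = defaultdict(list)
    for msg in messages:
        pages[f"date/{msg.sent:%Y/%m/%d}.html"].append(msg.subject)
    return dict(pages)


def generate(messages):
    """Run every registered view over the messages and merge the pages."""
    site = {}
    for build in VIEWS.values():
        site.update(build(messages))
    return site
```

Adding another view in this sketch means writing one more decorated function; `generate` does not need to change, which is the property the plugin approach is after.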
My code is very naive in places (no memoize, and I'm sure I work stuff out twice in different places of the code [0]) and also I have lots and lots of development stuff for Email::Store installed so, for example, at the moment, every time we import a message we

    Extract a summary
    Work out what thread it's in
    Extract all the Named Entities from it
    Do relationship mapping (http://blog.simon-cozens.org/6744.html)
    Index a load of stuff into Plucene
    Work out what mailing list the mail was sent to
    Extract all the attachments
    Store all this stuff in a DB

which is all well and good but is far more than Original Mariachi is doing.

The code, if anybody wants to have a look, is here

http://www.thegestalt.org/simon/mariachi/Mariachi-0.1.tar.gz
http://www.thegestalt.org/simon/mariachi/Mariachi-0.1/

which also has examples of output from a couple of hundred messages.

You can, for example, find all the mails from me

http://www.thegestalt.org/simon/mariachi/author/simon@xxxxxxxxxx.xxx.xxxx

and then easily surf to a mail and hence the thread it was part of, in either normal or lurker form

http://www.thegestalt.org/simon/mariachi/lurker/723192D023274188007249BD17EBC478.MAI@xxxxxxxxxxx.xxx.xxxx
http://www.thegestalt.org/simon/mariachi/thread/723192D023274188007249BD17EBC478.MAI@xxxxxxxxxxx.xxx.xxxx

or see everything on a particular day

http://www.thegestalt.org/simon/mariachi/date/2004/06/01.html

There are bugs to do with threads, it looks like ass and there's no paging, but I think it's an interesting base to have a look at working from. Or using to decide that Email::Store is not a good fit for Mariachi.

Please check out the TODO list here

http://www.thegestalt.org/simon/mariachi/Mariachi-0.1/TODO

Simon

[0] It's not helped that Messages get slung around as Email::Simple, Mariachi::Message and Email::Store::Message in various bits of the code. This is obviously suboptimal. And ugly.
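As an illustration of the "no memoize" point above, here is a small Python sketch of caching a repeated computation. It is not Mariachi code: `normalise_subject` is a made-up example of something an importer might work out more than once per message, and the `CALLS` counter exists only to show that the cached version does the work once. It uses Python's standard `functools.lru_cache` rather than Perl's Memoize module.

```python
from functools import lru_cache

# Counts how often the real work runs (illustration only).
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def normalise_subject(subject):
    """Strip leading 'Re:' prefixes and lowercase the subject (hypothetical helper)."""
    CALLS["count"] += 1
    text = subject.strip()
    while text.lower().startswith("re:"):
        text = text[3:].strip()
    return text.lower()
```

Calling `normalise_subject` twice with the same string runs the body once; the second call is answered from the cache. This only pays off when the same inputs really do recur and the function has no side effects that matter.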
Generated at 09:00 on 03 Aug 2004 by mariachi 0.52