Author: simon
Date: 2005-02-22 11:02:30 +0000 (Tue, 22 Feb 2005)
New Revision: 1944
Added:
trunk/Email-Store-NamedEntity/
trunk/Email-Store-NamedEntity/Build.PL
trunk/Email-Store-NamedEntity/Changes
trunk/Email-Store-NamedEntity/MANIFEST
trunk/Email-Store-NamedEntity/MANIFEST.SKIP
trunk/Email-Store-NamedEntity/README
trunk/Email-Store-NamedEntity/lib/
trunk/Email-Store-NamedEntity/lib/Email/
trunk/Email-Store-NamedEntity/lib/Email/Store/
trunk/Email-Store-NamedEntity/lib/Email/Store/NamedEntity.pm
trunk/Email-Store-NamedEntity/t/
trunk/Email-Store-NamedEntity/t/01.t
trunk/Email-Store-NamedEntity/t/test.mail
Log:
Import a new (old) project
Added: trunk/Email-Store-NamedEntity/Build.PL
===================================================================
--- trunk/Email-Store-NamedEntity/Build.PL 2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/Build.PL 2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,18 @@
+use strict;
+use Module::Build;
+
+my $build = Module::Build
+ ->new( module_name => "Email::Store::NamedEntity",
+ license => 'perl',
+ requires => {
+ 'Email::Store' => 0,
+ 'File::Slurp' => 0,
+ 'Test::More' => 0,
+ 'Lingua::EN::NamedEntity' => 0,
+ 'DBD::SQLite2' => 0,
+ },
+ create_makefile_pl => 'traditional',
+ );
+
+$build->create_build_script;
+
Added: trunk/Email-Store-NamedEntity/Changes
===================================================================
--- trunk/Email-Store-NamedEntity/Changes 2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/Changes 2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,27 @@
+Sun Jul 4 13:32:48 BST 2004 v1.3
+-----------------------------------------------
+Eeek - 1.2 didn't compile. Stupid Simon.
+
+
+
+Mon Jun 21 09:54:46 BST 2004 v1.2
+-----------------------------------------------
+Patch from Simon Cozens which should make things
+slightly saner if it can't find all the necessary
+hidden dictionary files.
+
+
+
+Sat Jun 19 18:21:17 BST 2004 v1.01
+-----------------------------------------------
+Well, it would have been the last release if I hadn't
+been a muppet and forgotten an 's' and left something out
+of the MANIFEST.
+
+
+
+
+Thu Jun 17 19:17:49 BST 2004 v1.0
+-----------------------------------------------
+First release. Unlikely to need another, I hope.
+
Added: trunk/Email-Store-NamedEntity/MANIFEST
===================================================================
--- trunk/Email-Store-NamedEntity/MANIFEST 2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/MANIFEST 2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,9 @@
+Build.PL
+Makefile.PL
+MANIFEST This list of files
+README
+META.yml
+Changes
+lib/Email/Store/NamedEntity.pm
+t/01.t
+t/test.mail
Added: trunk/Email-Store-NamedEntity/MANIFEST.SKIP
===================================================================
--- trunk/Email-Store-NamedEntity/MANIFEST.SKIP 2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/MANIFEST.SKIP 2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,12 @@
+\.svn
+\.bak
+~$
+\.tar\.gz$
+\.mail
+MANIFEST.SKIP
+_build
+blib
+Build$
+^TODO$
+t/test.db
+entest
Added: trunk/Email-Store-NamedEntity/README
===================================================================
--- trunk/Email-Store-NamedEntity/README 2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/README 2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,56 @@
+NAME
+ Email::Store::NamedEntity - Provides a list of named entities for an
+ email
+
+INSTALL
+ The usual:
+
+ % perl Build.PL
+ % ./Build
+ % ./Build test
+ % sudo ./Build install
+
+ or, if you don't have Module::Build
+
+ % perl Makefile.PL
+ % make
+ % make test
+ % sudo make install
+
+
+SYNOPSIS
+ Remember to create the database table:
+
+ % make install
+ % perl -MEmail::Store="..." -e 'Email::Store->setup'
+
+ And now:
+
+ foreach my $e ($mail->named_entities) {
+ print $e->thing," which is a ", $e->description," (score=",$e->score(),")\n";
+ }
+
+DESCRIPTION
+ This extension for "Email::Store" adds the "named_entity" table, and
+ exports the "named_entities" method to the "Email::Store::Mail" class
+ which returns a list of "Email::Store::NamedEntity" objects.
+
+ An "Email::Store::NamedEntity" object has three fields:
+
+ thing
+ The entity we've extracted, e.g. "Bob Smith" or "London".
+
+ description
+ What class of entity it is, e.g. "person", "organisation" or "place".
+
+ score
+ How likely it is to be that class.
+
+SEE ALSO
+ Email::Store::Mail, Lingua::EN::NamedEntity.
+
+AUTHOR
+ Simon Wistow, "simon@xxxxxxxxxx.xxx"
+
+ This module is distributed under the same terms as Perl itself.
+
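The three fields above are filled in when a mail is stored: on_store in the module runs Lingua::EN::NamedEntity::extract_entities over the body and copies each entity's name, its class and that class's score into a row. A minimal, self-contained sketch of that conversion follows, using plain Perl with no Email::Store or database. It assumes extract_entities returns hashrefs with entity, class and scores keys, as on_store reads them; the sample entities, their scores and the rows_for helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample shaped like the hashrefs on_store reads from
# Lingua::EN::NamedEntity::extract_entities (entity, class, scores).
# The names and scores are invented for illustration only.
my @extracted = (
    { entity => 'Tony Ageh',   class => 'person',       scores => { person => 2.5, place => 0.1 } },
    { entity => 'Switzerland', class => 'place',        scores => { place  => 3 } },
    { entity => 'Wired',       class => 'organisation', scores => {} },
);

# Hypothetical helper mirroring on_store: keep the chosen class and its
# score, falling back to 0 when that class has no score recorded.
sub rows_for {
    my ($mail_id, @entities) = @_;
    my @rows;
    for my $e (@entities) {
        my $class = $e->{class};
        push @rows, {
            mail        => $mail_id,
            thing       => $e->{entity},
            description => $class,
            score       => $e->{scores}{$class} || 0,
        };
    }
    return @rows;
}

# 'msg-1' is a placeholder mail id.
my @rows = rows_for('msg-1', @extracted);
printf "%s which is a %s (score=%s)\n", @{$_}{qw(thing description score)} for @rows;
```

Only the score of the winning class is kept; the other per-class scores are discarded, which is why each stored entity has a single description and score.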
Added: trunk/Email-Store-NamedEntity/lib/Email/Store/NamedEntity.pm
===================================================================
--- trunk/Email-Store-NamedEntity/lib/Email/Store/NamedEntity.pm 2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/lib/Email/Store/NamedEntity.pm 2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,129 @@
+package Email::Store::NamedEntity;
+use 5.006;
+use strict;
+use warnings;
+our $VERSION = '1.3';
+use Email::Store::DBI;
+use base 'Email::Store::DBI';
+use Email::Store::Mail;
+
+
+Email::Store::NamedEntity->table("named_entity");
+Email::Store::NamedEntity->columns(All => qw/id mail thing description score/);
+Email::Store::NamedEntity->columns(Primary => qw/id/);
+Email::Store::NamedEntity->has_a(mail => "Email::Store::Mail");
+Email::Store::Mail->has_many( named_entities => "Email::Store::NamedEntity" );
+
+
+
+sub on_store_order { 80 }
+
+sub on_store {
+ my ($self, $mail) = @_;
+ my $simple = $mail->simple;
+ require Lingua::EN::NamedEntity;
+
+ foreach my $e (Lingua::EN::NamedEntity::extract_entities($simple->body))
+ {
+
+ my $class = $e->{class};
+ my $score = $e->{scores}->{$class} || 0;
+ Email::Store::NamedEntity->create({
+ mail => $mail->id,
+ thing => $e->{entity},
+ description => $class,
+ score => $score,
+ });
+ }
+}
+
+sub on_gather_plucene_fields_order { 80 }
+
+# Bet you weren't expecting that!
+sub on_gather_plucene_fields {
+ my ($self, $mail, $hash) = @_;
+
+ my %topics;
+ foreach my $e ($mail->named_entities) {
+ push @{$topics{lc($e->description)}}, lc($e->thing);
+ }
+
+ foreach my $key (keys %topics) {
+ $hash->{$key} = join ' ', @{$topics{$key}};
+ }
+
+}
+
+=head1 NAME
+
+Email::Store::NamedEntity - Provides a list of named entities for an email
+
+=head1 SYNOPSIS
+
+Remember to create the database table:
+
+ % make install
+ % perl -MEmail::Store="..." -e 'Email::Store->setup'
+
+And now:
+
+ foreach my $e ($mail->named_entities) {
+ print $e->thing," which is a ", $e->description," (score=",$e->score(),")\n";
+ }
+
+=head1 DESCRIPTION
+
+I<Named entities> is the NLP jargon for proper nouns which represent people,
+places, organisations, and so on. Clearly this is useful metadata to extract
+from a body of emails.
+
+This extension for C<Email::Store> adds the C<named_entity> table, and exports
+the C<named_entities> method to the C<Email::Store::Mail> class which returns
+a list of C<Email::Store::NamedEntity> objects.
+
+An C<Email::Store::NamedEntity> object has three fields:
+
+=over 4
+
+=item thing
+
+The entity we've extracted, e.g. "Bob Smith" or "London".
+
+=item description
+
+What class of entity it is, e.g. "person", "organisation" or "place".
+
+=item score
+
+How likely it is to be that class.
+
+=back
+
+C<Email::Store::NamedEntity> will also add each entity class as an index
+field, with the extracted entities as its contents, so that if you have the
+C<Email::Store::Plucene> module installed you can search using something like
+
+ place:London
+
+
+=head1 SEE ALSO
+
+L<Email::Store::Mail>, L<Lingua::EN::NamedEntity>.
+
+=head1 AUTHOR
+
+Simon Wistow, C<simon@xxxxxxxxxx.xxx>
+
+This module is distributed under the same terms as Perl itself.
+
+=cut
+
+1;
+__DATA__
+CREATE TABLE named_entity (
+ id int AUTO_INCREMENT NOT NULL PRIMARY KEY,
+ mail varchar(255),
+ thing varchar(255),
+ description varchar(60),
+ score float(4,2)
+);
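The on_gather_plucene_fields hook above groups a mail's entities by description and joins the lowercased entity names for each group into a single string, so the hash ends up with keys such as place and person. A minimal sketch of that grouping in plain Perl follows. It assumes, as the place:London example in the POD implies, that Email::Store::Plucene turns each key of the hash into a searchable field. The sample entities stand in for $mail->named_entities and are plain hashrefs rather than objects; they and the plucene_fields helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical entities standing in for $mail->named_entities
# (plain hashrefs here instead of Email::Store::NamedEntity objects).
my @entities = (
    { thing => 'London',      description => 'Place'  },
    { thing => 'Switzerland', description => 'place'  },
    { thing => 'Tony Ageh',   description => 'person' },
);

# Hypothetical helper mirroring on_gather_plucene_fields: group the
# lowercased things under their lowercased description, then join each
# group with single spaces to form one field value per class.
sub plucene_fields {
    my (@ents) = @_;
    my %topics;
    push @{ $topics{ lc $_->{description} } }, lc $_->{thing} for @ents;
    my %hash;
    $hash{$_} = join ' ', @{ $topics{$_} } for keys %topics;
    return \%hash;
}

my $fields = plucene_fields(@entities);
print "$_: $fields->{$_}\n" for sort keys %$fields;
```

Lowercasing both the key and the values means "Place" and "place" collapse into one field, so a query like place:london would match regardless of how the entity class was capitalised.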
Added: trunk/Email-Store-NamedEntity/t/01.t
===================================================================
--- trunk/Email-Store-NamedEntity/t/01.t 2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/t/01.t 2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,24 @@
+use Test::More tests => 5;
+use File::Slurp;
+BEGIN { unlink("t/test.db"); }
+use Email::Store "dbi:SQLite2:dbname=t/test.db";
+Email::Store->setup;
+ok(1, "Set up");
+
+my $data = read_file("t/test.mail");
+Email::Store::Mail->store($data);
+
+# We need one mail:
+my @mails = Email::Store::Mail->retrieve_all;
+is(@mails, 1, "Only one mail");
+# is($mails[0]->message_id, '20001128211546.A29664@xxx.xxx', "Correct ID");
+
+my @entities;
+
+ok(@entities = $mails[0]->named_entities );
+
+my %ent_map = map { lc($_->thing) => $_ } @entities;
+
+is ($ent_map{'switzerland'}->description,"place");
+is ($ent_map{'tony ageh'}->description,"person");
+
Added: trunk/Email-Store-NamedEntity/t/test.mail
===================================================================
--- trunk/Email-Store-NamedEntity/t/test.mail 2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/t/test.mail 2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,49 @@
+From: Foo Bar <foo@xxx.xxx>
+To: Me <simon@xxxxxxxx.xxx>
+Subject: Test Mail
+Message-ID: 20001128211546.A29664@xxx.xxx
+
+
+In November of 1994, I caught a flight to San Francisco with the Guardian's
+Tony Ageh, the designer Rik Gadsby, and the most terrifyingly efficient man
+I'd ever seen in my life. His name was Ian Stewart. He was one of the
+venture capitalists responsible for financing Wired in the US. While we
+waited for the flight, he arranged us all Executive Club cards so we'd
+never have to wait for another flight again. On the aeroplane, he reeled
+off the best places to dine in San Francisco, largely in inflected
+Japanese. He booked us into the most sumptuous hotel I could imagine, and
+then invited us to an 8.00 am "working breakfast" at his hotel, which was
+even grander. He spoke faster than Azeem on Speed, and twitched, as though
+he was hearing stock market reports being read out by a voice inside his
+head. Perhaps he was; it was impossible to tell. I had no guidelines: he
+was the first Wired person I had met.
+
+As it was, most of his tourist advice was wasted on me. I spent most of my
+San Fran nights in my room, staring like a refugee at the cable TV. I was
+not efficient. I was a slacker. Two months before I had been on the dole,
+as I had been for three years, moonlighting in a broken-down stand-up show
+about my BBS experiences. Then a man came up to me one night and asked me
+what MPEG did. I explained, and he in turn explained that his name was Tony
+Ageh and that he worked for the Guardian and he wanted to change the
+political system in this country and utilising new technology seemed to be
+a good way of achieving this and he might be launching a UK version of
+Wired and would I like a job?
+
+And I said - and memory blurs whether these were my actual words at the
+time - if this gives me a chance to go to San Francisco and watch Talk
+Soup and E! News and Mystery Science Theatre 3000 drunk at 3am in a black
+and pink decorated bedroom and get paid for it, why the hell not?
+
+I was taken along, because I was the only editorial person they had to show
+the Wired US team. I'm thinking "this Ian Stewart guy commutes to his
+London job from Switzerland, he got funding for Wired - the best fucking
+magazine on the planet, he is the extropian ubermensch, and *he's* just
+their money man. What are *they* going to be like?"
+
+He turns to me and says, 'I'm sure Louis will be very interested in what
+you have to say'.
+
+And I'm thinking: Now I am going to have to fake this efficiency thing,
+very, very seriously.
+
+Nice try. Two days later, I lost my passport.
Generated at 12:00 on 22 Feb 2005 by mariachi 0.52