rev 1944 - in trunk: . Email-Store-NamedEntity Email-Store-NamedEntity/lib Email-Store-NamedEntity/lib/Email Email-Store-NamedEntity/lib/Email/Store Email-Store-NamedEntity/t

From: simon
Subject: rev 1944 - in trunk: . Email-Store-NamedEntity Email-Store-NamedEntity/lib Email-Store-NamedEntity/lib/Email Email-Store-NamedEntity/lib/Email/Store Email-Store-NamedEntity/t
Date: 11:02 on 22 Feb 2005
Author: simon
Date: 2005-02-22 11:02:30 +0000 (Tue, 22 Feb 2005)
New Revision: 1944

Import a new (old) project

Added: trunk/Email-Store-NamedEntity/Build.PL
--- trunk/Email-Store-NamedEntity/Build.PL	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/Build.PL	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,18 @@
+use strict;
+use Module::Build;
+my $build = Module::Build
+  ->new( module_name => "Email::Store::NamedEntity",
+         license     => 'perl',
+         requires    => {
+                            'Email::Store' => 0,
+                            'File::Slurp' => 0,
+                              'Test::More'  => 0,
+                            'Lingua::EN::NamedEntity' => 0,
+                            'DBD::SQLite2' => 0,
+                        },
+         create_makefile_pl => 'traditional',
+       );

Added: trunk/Email-Store-NamedEntity/Changes
--- trunk/Email-Store-NamedEntity/Changes	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/Changes	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,27 @@
+Sun Jul  4 13:32:48 BST 2004    v1.3
+Eeek - 1.2 didn't compile. Stupid Simon.
+Mon Jun 21 09:54:46 BST 2004    v1.2
+Patch from Simon Cozens which should make things
+slightly saner if it can't find all the necessary
+hidden dictionary files.
+Sat Jun 19 18:21:17 BST 2004    v1.01
+Well, it would have been the last release if I hadn't
+been a muppet and forgotten an 's' and something out
+of the MANIFEST.
+Thu Jun 17 19:17:49 BST 2004     v1.0
+First release. Unlikely to need another, I hope.

Added: trunk/Email-Store-NamedEntity/MANIFEST
--- trunk/Email-Store-NamedEntity/MANIFEST	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/MANIFEST	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,9 @@
+MANIFEST            This list of files

Added: trunk/Email-Store-NamedEntity/MANIFEST.SKIP
--- trunk/Email-Store-NamedEntity/MANIFEST.SKIP	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/MANIFEST.SKIP	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,12 @@

Added: trunk/Email-Store-NamedEntity/README
--- trunk/Email-Store-NamedEntity/README	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/README	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,56 @@
+    Email::Store::NamedEntity - Provides a list of named entities for an
+    email
+   The usual:
+        % perl Build.PL
+        % ./Build 
+        % ./Build test
+        % sudo ./Build install
+   or, if you don't have Module::Build
+        % perl Makefile.PL
+        % make
+        % make test
+        % sudo make install
+    Remember to create the database table:
+        % make install
+        % perl -MEmail::Store="..." -e 'Email::Store->setup'
+    And now:
+        foreach my $e ($mail->named_entities) {
+            print $e->thing," which is a ", $e->description,"(score=",$e->score(),")\n";
+        }
+    This extension for "Email::Store" adds the "named_entity" table, and
+    exports the "named_entities" method to the "Email::Store::Mail" class
+    which returns a list of "Email::Store::NamedEntity" objects.
+    A "Email::Store::NamedEntity" object has three fields -
+    thing
+        The entity we've extracted, e.g. "Bob Smith" or "London"
+    description
+        What class of entity it is, e.g. "person", "organisation" or "place"
+    score
+        How likely it is to be that class.
+    Email::Store::Mail, Lingua::EN::NamedEntity.
+    Simon Wistow, ""
+    This module is distributed under the same terms as Perl itself.
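
Reviewer's sketch (not part of the commit): the README describes each entity as having a thing, a description and a score. The lines below show one way a caller might keep only the higher-scoring entities before printing them in the README's format. Sketch::Entity and confident_entities are hypothetical names, the stand-in class models only the three documented accessors, and the scores are made-up values rather than real Lingua::EN::NamedEntity output.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Email::Store::NamedEntity rows; only the three
# documented accessors (thing, description, score) are modelled.
package Sketch::Entity;
sub new         { my ($class, %args) = @_; return bless {%args}, $class }
sub thing       { $_[0]{thing} }
sub description { $_[0]{description} }
sub score       { $_[0]{score} }

package main;

# Keep the entities scoring at least $min, highest score first.
sub confident_entities {
    my ($min, @entities) = @_;
    my @kept = grep { $_->score >= $min } @entities;
    return sort { $b->score <=> $a->score } @kept;
}

# Illustrative data only; the scores are invented for this sketch.
my @entities = (
    Sketch::Entity->new(thing => 'Tony Ageh',   description => 'person', score => 6),
    Sketch::Entity->new(thing => 'Switzerland', description => 'place',  score => 9),
    Sketch::Entity->new(thing => 'Speed',       description => 'place',  score => 1),
);

foreach my $e (confident_entities(5, @entities)) {
    print $e->thing, " which is a ", $e->description,
          " (score=", $e->score, ")\n";
}
```

With a real store, the list passed in would presumably come from $mail->named_entities instead of the hand-built array.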

Added: trunk/Email-Store-NamedEntity/lib/Email/Store/
--- trunk/Email-Store-NamedEntity/lib/Email/Store/	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/lib/Email/Store/	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,129 @@
+package Email::Store::NamedEntity;
+use 5.006;
+use strict;
+use warnings;
+our $VERSION = '1.3';
+use Email::Store::DBI;
+use base 'Email::Store::DBI';
+use Email::Store::Mail;
+Email::Store::NamedEntity->columns(All => qw/id mail thing description score/);
+Email::Store::NamedEntity->columns(Primary => qw/id/);
+Email::Store::NamedEntity->has_a(mail => "Email::Store::Mail");
+Email::Store::Mail->has_many( named_entities => "Email::Store::NamedEntity" );
+sub on_store_order { 80 }
+sub on_store {
+    my ($self, $mail) = @_;
+    my $simple = $mail->simple;
+    require Lingua::EN::NamedEntity;
+    foreach my $e (Lingua::EN::NamedEntity::extract_entities($simple->body)) 
+    { 
+        my $class = $e->{class};
+        my $score = $e->{scores}->{$class} || 0;
+        Email::Store::NamedEntity->create({
+            mail => $mail->id,
+            thing => $e->{entity},
+            description => $class,
+            score => $score,
+        });
+    }
+}
+sub on_gather_plucene_fields_order { 80 }
+# Bet you weren't expecting that!
+sub on_gather_plucene_fields {
+    my ($self, $mail, $hash) = @_;
+    my %topics;
+    foreach my $e ($mail->named_entities) {
+        push @{$topics{lc($e->description)}}, lc($e->thing);
+    }
+    foreach my $key (keys %topics) {
+        $hash->{$key} = join ' ', @{$topics{$key}};
+    }
+}
+=head1 NAME
+Email::Store::NamedEntity - Provides a list of named entities for an email
+=head1 SYNOPSIS
+Remember to create the database table:
+    % make install
+    % perl -MEmail::Store="..." -e 'Email::Store->setup'
+And now:
+    foreach my $e ($mail->named_entities) {
+        print $e->thing," which is a ", $e->description,"(score=",$e->score(),")\n";
+    }
+C<Named entities> is the NLP jargon for proper nouns which represent people, 
+places, organisations, and so on. Clearly this is useful meta data to extract 
+from a body of emails.
+This extension for C<Email::Store> adds the C<named_entity> table, and exports
+the C<named_entities> method to the C<Email::Store::Mail> class which returns 
+a list of C<Email::Store::NamedEntity> objects.
+A C<Email::Store::NamedEntity> object has three fields -
+=over 4
+=item thing
+The entity we've extracted, e.g. "Bob Smith" or "London"
+=item description 
+What class of entity it is, e.g. "person", "organisation" or "place"
+=item score
+How likely it is to be that class.
+=back
+C<Email::Store::NamedEntity> will also attempt to index each field
+so that if you have the C<Email::Store::Plucene> module installed
+then you could search using something like
+    place:London
+=head1 SEE ALSO
+L<Email::Store::Mail>, L<Lingua::EN::NamedEntity>.
+=head1 AUTHOR
+Simon Wistow, C<>
+This module is distributed under the same terms as Perl itself.
+CREATE TABLE named_entity (
+    mail varchar(255),                                                 
+    thing varchar(255),                                                         
+    description varchar(60),                                                    
+    score float(4,2)
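
Reviewer's sketch (not part of the commit): on_gather_plucene_fields groups entity names by class and writes one space-separated string per class into the field hash, which is what makes a query such as place:London possible. The lines below reproduce that grouping on its own, with plain hash references standing in for Email::Store::NamedEntity objects. gather_fields is a hypothetical name, and nothing here touches Plucene itself.

```perl
use strict;
use warnings;

# Group entity names by class, lower-casing both, then join each group into
# one space-separated string, mirroring the loops in on_gather_plucene_fields.
# Entities are plain hashrefs here (an assumption made for this sketch).
sub gather_fields {
    my (@entities) = @_;
    my %topics;
    foreach my $e (@entities) {
        push @{ $topics{ lc $e->{description} } }, lc $e->{thing};
    }
    my %fields;
    foreach my $key (keys %topics) {
        $fields{$key} = join ' ', @{ $topics{$key} };
    }
    return \%fields;
}

my $fields = gather_fields(
    { thing => 'London',      description => 'place'  },
    { thing => 'Tony Ageh',   description => 'person' },
    { thing => 'Switzerland', description => 'place'  },
);

# Prints:
#   person: tony ageh
#   place: london switzerland
print "$_: $fields->{$_}\n" for sort keys %$fields;
```

Each entity class becomes its own field name, so any new class the extractor reports turns into a new searchable field without changing the code.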

Added: trunk/Email-Store-NamedEntity/t/01.t
--- trunk/Email-Store-NamedEntity/t/01.t	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/t/01.t	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,24 @@
+use Test::More tests => 5;
+use File::Slurp;
+BEGIN { unlink("t/test.db"); }
+use Email::Store "dbi:SQLite2:dbname=t/test.db";
+Email::Store->setup;
+ok(1, "Set up");
+my $data = read_file("t/test.mail");
+Email::Store::Mail->store($data);
+# We need one mail:
+my @mails = Email::Store::Mail->retrieve_all;
+is(@mails, 1, "Only one mail");
+# is($mails[0]->message_id, '', "Correct ID");
+my @entities;
+ok(@entities = $mails[0]->named_entities );
+my %ent_map = map { lc($_->thing) => $_ } @entities;
+is ($ent_map{'switzerland'}->description,"place");
+is ($ent_map{'tony ageh'}->description,"person");

Added: trunk/Email-Store-NamedEntity/t/test.mail
--- trunk/Email-Store-NamedEntity/t/test.mail	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/t/test.mail	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,49 @@
+From: Foo Bar <>
+To: Me <>
+Subject: Test Mail
+In November of 1994, I caught a flight to San Francisco with the Guardian's
+Tony Ageh, the designer Rik Gadsby, and the most terrifyingly efficient man
+I'd ever seen in my life. His name was Ian Stewart. He was one of the
+venture capitalists responsible for financing Wired in the US. While we
+waited for the flight, he arranged us all Executive Club cards so we'd
+never have to wait for another flight again. On the aeroplane, he reeled
+off the best places to dine in San Francisco, largely in inflected
+Japanese. He booked us into the most sumptuous hotel I could imagine, and
+then invited us to an 8.00 am "working breakfast" at his hotel, which was
+even grander. He spoke faster than Azeem on Speed, and twitched, as though
+he was hearing stock market reports being read out by a voice inside his
+head. Perhaps he was; it was impossible to tell. I had no guidelines: he
+was the first Wired person I had met.
+As it was, most of his tourist advice was wasted on me. I spent most of my
+San Fran nights in my room, staring like a refugee at the cable TV. I was
+not efficient. I was a slacker. Two months before I had been on the dole,
+as I had been for three years, moonlighting in a broken-down stand-up show
+about my BBS experiences. Then a man came up to me one night and asked me
+what MPEG did. I explained, and he in turn explained that his name was Tony
+Ageh and that he worked for the Guardian and he wanted to change the
+political system in this country  and utilising new technology seemed to be
+a good way of achieving this and he might be launching a UK version of
+Wired and would I like a job?
+And I said - and memory blurs whether these were my actual words at the
+time -  if this gives me a chance to go to San Francisco and watch Talk
+Soup and E! News and Mystery Science Theatre 3000 drunk at 3am in a black
+and pink decorated bedroom and get paid for it, why the hell not?
+I was taken along, because I was the only editorial person they had to show
+the Wired US team. I'm thinking "this Ian Stewart guy commutes to his
+London job from Switzerland, he got funding for Wired - the best fucking
+magazine on the planet, he is the extropian ubermensch, and *he's* just
+their money man. What are *they* going to be like?"
+He turns to me and says, 'I'm sure Louis will be very interested in what
+you have to say'.
+And I'm thinking: Now I am going to have to fake this efficiency thing,
+very, very seriously.
+Nice try. Two days later, I lost my passport.
