Author: simon
Date: 2005-02-22 11:02:30 +0000 (Tue, 22 Feb 2005)
New Revision: 1944

Added:
   trunk/Email-Store-NamedEntity/
   trunk/Email-Store-NamedEntity/Build.PL
   trunk/Email-Store-NamedEntity/Changes
   trunk/Email-Store-NamedEntity/MANIFEST
   trunk/Email-Store-NamedEntity/MANIFEST.SKIP
   trunk/Email-Store-NamedEntity/README
   trunk/Email-Store-NamedEntity/lib/
   trunk/Email-Store-NamedEntity/lib/Email/
   trunk/Email-Store-NamedEntity/lib/Email/Store/
   trunk/Email-Store-NamedEntity/lib/Email/Store/NamedEntity.pm
   trunk/Email-Store-NamedEntity/t/
   trunk/Email-Store-NamedEntity/t/01.t
   trunk/Email-Store-NamedEntity/t/test.mail
Log:
Import an new (old) project

Added: trunk/Email-Store-NamedEntity/Build.PL
===================================================================
--- trunk/Email-Store-NamedEntity/Build.PL	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/Build.PL	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,18 @@
+use strict;
+use Module::Build;
+
+my $build = Module::Build
+  ->new( module_name => "Email::Store::NamedEntity",
+         license     => 'perl',
+         requires    => {
+                          'Email::Store' => 0,
+                          'File::Slurp' => 0,
+                          'Test::More' => 0,
+                          'Lingua::EN::NamedEntity' => 0,
+                          'DBD::SQLite2' => 0,
+                        },
+         create_makefile_pl => 'traditional',
+       );
+
+$build->create_build_script;
+

Added: trunk/Email-Store-NamedEntity/Changes
===================================================================
--- trunk/Email-Store-NamedEntity/Changes	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/Changes	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,27 @@
+Sun Jul 4 13:32:48 BST 2004 v1.3
+-----------------------------------------------
+Eeek - 1.2 didn't compile. Stupid Simon.
+
+
+
+Mon Jun 21 09:54:46 BST 2004 v1.2
+-----------------------------------------------
+Patch from Simon Cozens which should make things
+slightly saner if it can't find all the necessary
+hidden dictionary files.
+
+
+
+Sat Jun 19 18:21:17 BST 2004 v1.01
+-----------------------------------------------
+Well, it would have been the last release if I hadn't
+been a muppet and forgotten an 's' and something out
+of the MANIFEST.
+
+
+
+
+Thu Jun 17 19:17:49 BST 2004 v1.0
+-----------------------------------------------
+First release. Unlikely to need another, I hope.
+

Added: trunk/Email-Store-NamedEntity/MANIFEST
===================================================================
--- trunk/Email-Store-NamedEntity/MANIFEST	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/MANIFEST	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,9 @@
+Build.PL
+Makefile.PL
+MANIFEST			This list of files
+README
+META.yml
+Changes
+lib/Email/Store/NamedEntity.pm
+t/01.t
+t/test.mail

Added: trunk/Email-Store-NamedEntity/MANIFEST.SKIP
===================================================================
--- trunk/Email-Store-NamedEntity/MANIFEST.SKIP	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/MANIFEST.SKIP	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,12 @@
+\.svn
+\.bak
+~$
+\.tar\.gz$
+\.mail
+MANIFEST.SKIP
+_build
+blib
+Build$
+^TODO$
+t/test.db
+entest

Added: trunk/Email-Store-NamedEntity/README
===================================================================
--- trunk/Email-Store-NamedEntity/README	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/README	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,56 @@
+NAME
+    Email::Store::NamedEntities - Provides a list of named entities for an
+    email
+
+INSTALL
+    The usual :
+
+        % perl Build.PL
+        % ./Build
+        % ./Build test
+        % sudo ./Build install
+
+    or, if you don't have Module::Build
+
+        % perl Makefile.PL
+        % make
+        % make test
+        % sudo make install
+
+
+SYNOPSIS
+    Remember to create the database table:
+
+        % make install
+        % perl -MEmail::Store="..." -e 'Email::Store->setup'
+
+    And now:
+
+        foreach my $e ($mail->named_entities) {
+            print $e->thing," which is a ", $e->description,"(score=",$e->score(),")\n";
+        }
+
+DESCRIPTION
+    This extension for "Email::Store" adds the "named_entity" table, and
+    exports the "named_entities" method to the "Email::Store::Mail" class
+    which returns a list of "Email::Store::NamedEntity" objects.
+
+    A "Email::Store::NamedEntity" object has three fields -
+
+    thing
+        The entity we've extracted e.g "Bob Smith" or "London" w
+
+    description
+        What class of entity it is e.g "person", "organisation" or "place"
+
+    score
+        How likely like it is to be that class.
+
+SEE ALSO
+    Email::Store::Mail, Lingua::EN::NamedEntity.
+
+AUTHOR
+    Simon Wistow, "simon@xxxxxxxxxx.xxx"
+
+    This module is distributed under the same terms as Perl itself.
+

Added: trunk/Email-Store-NamedEntity/lib/Email/Store/NamedEntity.pm
===================================================================
--- trunk/Email-Store-NamedEntity/lib/Email/Store/NamedEntity.pm	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/lib/Email/Store/NamedEntity.pm	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,129 @@
+package Email::Store::NamedEntity;
+use 5.006;
+use strict;
+use warnings;
+our $VERSION = '1.3';
+use Email::Store::DBI;
+use base 'Email::Store::DBI';
+use Email::Store::Mail;
+
+
+Email::Store::NamedEntity->table("named_entity");
+Email::Store::NamedEntity->columns(All => qw/id mail thing description score/);
+Email::Store::NamedEntity->columns(Primary => qw/id/);
+Email::Store::NamedEntity->has_a(mail => "Email::Store::Mail");
+Email::Store::Mail->has_many( named_entities => "Email::Store::NamedEntity" );
+
+
+
+sub on_store_order { 80 }
+
+sub on_store {
+    my ($self, $mail) = @_;
+    my $simple = $mail->simple;
+    require Lingua::EN::NamedEntity;
+
+    foreach my $e (Lingua::EN::NamedEntity::extract_entities($simple->body))
+    {
+
+        my $class = $e->{class};
+        my $score = $e->{scores}->{$class} || 0;
+        Email::Store::NamedEntity->create({
+            mail        => $mail->id,
+            thing       => $e->{entity},
+            description => $class,
+            score       => $score,
+        });
+    }
+}
+
+sub on_gather_plucene_fields_order { 80 }
+
+# Bet you weren't expecting that!
+sub on_gather_plucene_fields {
+    my ($self, $mail, $hash) = @_;
+
+    my %topics;
+    foreach my $e ($mail->named_entities) {
+        push @{$topics{lc($e->description)}}, lc($e->thing);
+    }
+
+    foreach my $key (keys %topics) {
+        $hash->{$key} = join ' ', @{$topics{$key}};
+    }
+
+}
+
+=head1 NAME
+
+Email::Store::NamedEntity - Provides a list of named entities for an email
+
+=head1 SYNOPSIS
+
+Remember to create the database table:
+
+    % make install
+    % perl -MEmail::Store="..." -e 'Email::Store->setup'
+
+And now:
+
+    foreach my $e ($mail->named_entities) {
+        print $e->thing," which is a ", $e->description,"(score=",$e->score(),")\n";
+    }
+
+=head1 DESCRIPTION
+
+C<Named entities> is the NLP jargon for proper nouns which represent people,
+places, organisations, and so on. Clearly this is useful meta data to extract
+from a body of emails.
+
+This extension for C<Email::Store> adds the C<named_entity> table, and exports
+the C<named_entities> method to the C<Email::Store::Mail> class which returns
+a list of C<Email::Store::NamedEntity> objects.
+
+A C<Email::Store::NamedEntity> object has three fields -
+
+=over 4
+
+=item thing
+
+The entity we've extracted e.g "Bob Smith" or "London" w
+
+=item description
+
+What class of entity it is e.g "person", "organisation" or "place"
+
+=item score
+
+How likely like it is to be that class.
+
+=back
+
+C<Email::Store::NamedEntity> will also attempt to index each field
+so that if you ahve the C<Email::Store::Plucene> module installed
+then you could search using something like
+
+    place:London
+
+
+=head1 SEE ALSO
+
+L<Email::Store::Mail>, L<Lingua::EN::NamedEntity>.
+
+=head1 AUTHOR
+
+Simon Wistow, C<simon@xxxxxxxxxx.xxx>
+
+This module is distributed under the same terms as Perl itself.
+
+=cut
+
+1;
+__DATA__
+CREATE TABLE named_entity (
+    id          int AUTO_INCREMENT NOT NULL PRIMARY KEY,
+    mail        varchar(255),
+    thing       varchar(255),
+    description varchar(60),
+    score       float(4,2)
+);

Added: trunk/Email-Store-NamedEntity/t/01.t
===================================================================
--- trunk/Email-Store-NamedEntity/t/01.t	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/t/01.t	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,24 @@
+use Test::More tests => 5;
+use File::Slurp;
+BEGIN { unlink("t/test.db"); }
+use Email::Store "dbi:SQLite2:dbname=t/test.db";
+Email::Store->setup;
+ok(1, "Set up");
+
+my $data = read_file("t/test.mail");
+Email::Store::Mail->store($data);
+
+# We need one mail:
+my @mails = Email::Store::Mail->retrieve_all;
+is(@mails, 1, "Only one mail");
+# is($mails[0]->message_id, '20001128211546.A29664@xxx.xxx', "Correct ID");
+
+my @entities;
+
+ok(@entities = $mails[0]->named_entities );
+
+my %ent_map = map { lc($_->thing) => $_ } @entities;
+
+is ($ent_map{'switzerland'}->description,"place");
+is ($ent_map{'tony ageh'}->description,"person");
+

Added: trunk/Email-Store-NamedEntity/t/test.mail
===================================================================
--- trunk/Email-Store-NamedEntity/t/test.mail	2005-02-22 10:52:39 UTC (rev 1943)
+++ trunk/Email-Store-NamedEntity/t/test.mail	2005-02-22 11:02:30 UTC (rev 1944)
@@ -0,0 +1,49 @@
+From: Foo Bar <foo@xxx.xxx>
+To: Me <simon@xxxxxxxx.xxx>
+Subject: Test Mail
+Message-ID: 20001128211546.A29664@xxx.xxx
+
+
+In November of 1994, I caught a flight to San Francisco with the Guardian's
+Tony Ageh, the designer Rik Gadsby, and the most terrifyingly efficient man
+I'd ever seen in my life. His name was Ian Stewart. He was one of the
+venture capitalists responsible for financing Wired in the US. While we
+waited for the flight, he arranged us all Executive Club cards so we'd
+never have to wait for another flight again. On the aeroplane, he reeled
+off the best places to dine in San Francisco, largely in inflected
+Japanese. He booked us into the most sumptuous hotel I could imagine, and
+then invited us to an 8.00 am "working breakfast" at his hotel, which was
+even grander. He spoke faster than Azeem on Speed, and twitched, as though
+he was hearing stock market reports being read out by a voice inside his
+head. Perhaps he was; it was impossible to tell. I had no guidelines: he
+was the first Wired person I had met.
+
+As it was, most of his tourist advice was wasted on me. I spent most of my
+San Fran nights in my room, staring like a refugee at the cable TV. I was
+not efficient. I was a slacker. Two months before I had been on the dole,
+as I had been for three years, moonlighting in a broken-down stand-up show
+about my BBS experiences. Then a man came up to me one night and asked me
+what MPEG did. I explained, and he in turn explained that his name was Tony
+Ageh and that he worked for the Guardian and he wanted to change the
+political system in this country and utilising new technology seemed to be
+a good way of achieving this and he might be launching a UK version of
+Wired and would I like a job?
+
+And I said - and memory blurs whether these were my actual words at the
+time - if this gives me a chance to go to San Francisco and watch Talk
+Soup and E! News and Mystery Science Theatre 3000 drunk at 3am in a black
+and pink decorated bedroom and get paid for it, why the hell not?
+
+I was taken along, because I was the only editorial person they had to show
+the Wired US team. I'm thinking "this Ian Stewart guy commutes to his
+London job from Switzerland, he got funding for Wired - the best fucking
+magazine on the planet, he is the extropian ubermensch, and *he's* just
+their money man. What are *they* going to be like?"
+
+He turns to me and says, 'I'm sure Louis will be very interested in what
+you have to say'.
+
+And I'm thinking: Now I am going to have to fake this efficiency thing,
+very, very seriously.
+
+Nice try. Two days later, I lost my passport.
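
Not part of the commit: a rough standalone sketch of the grouping that on_gather_plucene_fields performs in NamedEntity.pm, for anyone who wants to try it without installing Email::Store. The @entities list and the gather_fields helper are made up for illustration; the real method reads rows from $mail->named_entities and writes into the hash it is handed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the rows $mail->named_entities would return;
# plain hashes are used so the sketch runs without Email::Store.
my @entities = (
    { thing => 'Switzerland', description => 'place'  },
    { thing => 'Tony Ageh',   description => 'person' },
    { thing => 'London',      description => 'place'  },
);

# Hypothetical helper mirroring the grouping in on_gather_plucene_fields:
# one field per entity class, holding the lower-cased entity names
# joined by single spaces.
sub gather_fields {
    my @ents = @_;

    my %topics;
    foreach my $e (@ents) {
        push @{ $topics{ lc($e->{description}) } }, lc($e->{thing});
    }

    my %fields;
    foreach my $key (keys %topics) {
        $fields{$key} = join ' ', @{ $topics{$key} };
    }
    return \%fields;
}

my $fields = gather_fields(@entities);
print "$_: $fields->{$_}\n" for sort keys %$fields;
# person: tony ageh
# place: switzerland london
```

Fields of this shape are presumably what a query such as place:London from the POD would be matched against when Email::Store::Plucene is installed.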