mytilig
asked on
Parsing and extracting data from an email through perl
Hello-
I have a need to read all the emails that come to a particular mailbox, parse it out, and retreive some data out and save it into a database.
I am planning to use the Mail::POP3Client module to parse the email. I can insert it into the db as well.
What I need help is with the data extraction.
I want to know a neat way to extract data out. For eg in the sample here, I want to insert Joe into the first name field in the database, Smith as last name, similary Address1, 2 etc. City state zip is more tricky because they appear on the same line. there is comma between city and state and space between state and zip.
In addition, I want to save the complete conclusion part of it as well. This is the last part of the email and appears with the heading "CONCLUSION"
The data I receive will consistenly be in this format all the time. Are there any perl modules that will help this kind of extraction or should I be getting good at my regular expressions?
Sample email that I receive:
-------------------------- ---------- ---------- ----
From
Return-Path:
Received:
Date:
Message-ID:
To:
etc
hello here is the info blah blah...
Name: Joe Smith
Address: Address line1
Address line2 (optional)
city, state zip
Country of Residence: United States
Primary Phone: xxx-xxx-xxxx
Secondary Phone:
Email Address: email
******CONCLUSION******
Thank you for the oppurtunity blah blah
-------------------------- ---------- ---------- ----
I have a need to read all the emails that come to a particular mailbox, parse it out, and retreive some data out and save it into a database.
I am planning to use the Mail::POP3Client module to parse the email. I can insert it into the db as well.
What I need help is with the data extraction.
I want to know a neat way to extract data out. For eg in the sample here, I want to insert Joe into the first name field in the database, Smith as last name, similary Address1, 2 etc. City state zip is more tricky because they appear on the same line. there is comma between city and state and space between state and zip.
In addition, I want to save the complete conclusion part of it as well. This is the last part of the email and appears with the heading "CONCLUSION"
The data I receive will consistenly be in this format all the time. Are there any perl modules that will help this kind of extraction or should I be getting good at my regular expressions?
Sample email that I receive:
--------------------------
From
Return-Path:
Received:
Date:
Message-ID:
To:
etc
hello here is the info blah blah...
Name: Joe Smith
Address: Address line1
Address line2 (optional)
city, state zip
Country of Residence: United States
Primary Phone: xxx-xxx-xxxx
Secondary Phone:
Email Address: email
******CONCLUSION******
Thank you for the oppurtunity blah blah
--------------------------
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi Teraplane,
The Lingua::EN::AddressParse did not work correctly for me. Can you tell me what I am doing wrong?
Here is my code:
-------------------------- ---------- ---------- ---------- ---------- ---------- ----------
my %args = ( country => 'United States',
auto_clean => 1, force_case => 1,
abbreviate_subcountry => 0,
abbreviated_subcountry_onl y => 1 );
my $address = new Lingua::EN::AddressParse(% args);
$error = $address->parse("3430, warburton avenue, Apt. No. xx,Maywood Apts,Vallejo, CA 95000 ");
%my_address = $address->components;
$suburb = $my_address{suburb};
$correct_casing = $address->case_all;
foreach $comp ( sort keys %my_address)
{
print "$comp\t\t: $my_address{$comp}\n";
}
-------------------------- ---------- ---------- ---------- ---------- ---------- ----------
Output I receive is
country :
post_box :
post_code :
property_identifier :
property_name :
street :
street_type :
subcountry :
suburb :
The Lingua::EN::AddressParse did not work correctly for me. Can you tell me what I am doing wrong?
Here is my code:
--------------------------
my %args = ( country => 'United States',
auto_clean => 1, force_case => 1,
abbreviate_subcountry => 0,
abbreviated_subcountry_onl
my $address = new Lingua::EN::AddressParse(%
$error = $address->parse("3430, warburton avenue, Apt. No. xx,Maywood Apts,Vallejo, CA 95000 ");
%my_address = $address->components;
$suburb = $my_address{suburb};
$correct_casing = $address->case_all;
foreach $comp ( sort keys %my_address)
{
print "$comp\t\t: $my_address{$comp}\n";
}
--------------------------
Output I receive is
country :
post_box :
post_code :
property_identifier :
property_name :
street :
street_type :
subcountry :
suburb :
Address parsing is a very complex business, as there are a huge number of valid syntax patterns. This module can cope with many, but not all. I reaaranged your data as follows and it parsed OK. Also, it was developed with Australian addresses. I never really got the US style of street num, street name, apartment number, suburb working properly.
$error = $address->parse("Apt 2, 3430 warburton avenue, Vallejo, CA 95000 ");
All you can really do is filter out the mis matches for hand processing
$error = $address->parse("Apt 2, 3430 warburton avenue, Vallejo, CA 95000 ");
All you can really do is filter out the mis matches for hand processing
ASKER
I see that you re-arranged just the apt number and street. City, State and Zip is same as the order I had, still I did not get any output in post_code, or suburb(if that is where state should go)
I asked this because Address1 and Address2 are low priority for me. I will be somewhat ok if I can succesfully parse city state and zip.
I do not have control on the input format, that website is owned by someone else and I am just tasked to parsing the fields out.
That would mean I cannot use this module, correct?
I tried Suzanne's code above, but it did not parse correctly for majority of the input either...
I asked this because Address1 and Address2 are low priority for me. I will be somewhat ok if I can succesfully parse city state and zip.
I do not have control on the input format, that website is owned by someone else and I am just tasked to parsing the fields out.
That would mean I cannot use this module, correct?
I tried Suzanne's code above, but it did not parse correctly for majority of the input either...
with the input of "Apt 2, 3430 warburton avenue, Vallejo, CA 95000 " the output is
country:
post_box:
post_code: 95000
property_identifier: Apt 2 3430
property_name:
street: warburton
street_type: avenue
subcountry: CA
suburb: Vallejo
But if the parser gets confused with the start of the address, it cannot recover to retrieve city and post code. Is all your data in this format, or can it handle other cases? If the parser cannot handle most of your data, you may be better of writing your own rexexp, or adapting the grammer I have just to parse city state and zip.
country:
post_box:
post_code: 95000
property_identifier: Apt 2 3430
property_name:
street: warburton
street_type: avenue
subcountry: CA
suburb: Vallejo
But if the parser gets confused with the start of the address, it cannot recover to retrieve city and post code. Is all your data in this format, or can it handle other cases? If the parser cannot handle most of your data, you may be better of writing your own rexexp, or adapting the grammer I have just to parse city state and zip.
ASKER
Yes all my data is in this format.
-------------------------- ---------- ---------- -----
hello here is the info blah blah...
Name: Joe Smith
Address: Address line1
Address line2 (optional)
city, state zip
Country of Residence: United States
Primary Phone: xxx-xxx-xxxx
Secondary Phone:
Email Address: email
******CONCLUSION******
Thank you for the oppurtunity blah blah
-------------------------- ---------- ---------- -----
And I have only US addresses.
Thanks so much for your time!
--------------------------
hello here is the info blah blah...
Name: Joe Smith
Address: Address line1
Address line2 (optional)
city, state zip
Country of Residence: United States
Primary Phone: xxx-xxx-xxxx
Secondary Phone:
Email Address: email
******CONCLUSION******
Thank you for the oppurtunity blah blah
--------------------------
And I have only US addresses.
Thanks so much for your time!
I was referring to the Address line1 and 2 components. Is there always an apartment after the street name? My guess is that some of your address data can be parsed and some will not fit the patterns it can cope with, in which case you could apply another regexp to try and salvage city, post code and state.
http://search.cpan.org/~sderle/Geo-StreetAddress-US-0.99/US.pm might do what you want
ASKER
use Geo::StreetAddress::US;
my $hashref = Geo::StreetAddress::US->pa rse_locati on(
"1005 Gravenstein Hwy N, Sebastopol CA 95472" );
From this reference to a hash, how do I retreive the address specifiers?
For eg how what is the variable that holds the zip code?
my $hashref = Geo::StreetAddress::US->pa
"1005 Gravenstein Hwy N, Sebastopol CA 95472" );
From this reference to a hash, how do I retreive the address specifiers?
For eg how what is the variable that holds the zip code?
print("Zip is:",$hashref->{zip});
print("City is:",$hashref->{city});
print("City is:",$hashref->{city});
ASKER
Did try it but does not work.
print $hashref->{'zip'} ;
just prints a blank..
print $hashref->{'zip'} ;
just prints a blank..
ASKER
my input is
my $hashref = Geo::StreetAddress::US->pa rse_locati on("1005 Gravenstein Hwy N, Sebastopol CA 95472" );
my $hashref = Geo::StreetAddress::US->pa
I ran your exact code and it prints 95472 for zip code. What version of perl are you running?
ASKER
I have perl 5.6.1
Looks like my module might not have got installed correctly.
I re-downloaded the module and installed it once again, I got some errors for make test. I do remember seeing them the last time I installed them too. Guess I ignored it since my scripts were compiling fine.
Pasted below. Any thoughts on why the module would not install correctly?
C:\Temp\Geo-StreetAddress- US-0.99>pe rl makefile.PL
Writing Makefile for Geo::StreetAddress::US
C:\Temp\Geo-StreetAddress- US-0.99>c: \perl\bin\ nmake
Microsoft (R) Program Maintenance Utility Version 1.50
Copyright (c) Microsoft Corp 1988-94. All rights reserved.
C:\Temp\Geo-StreetAddress- US-0.99>c: \perl\bin\ nmake test
Microsoft (R) Program Maintenance Utility Version 1.50
Copyright (c) Microsoft Corp 1988-94. All rights reserved.
C:\Perl\bin\perl.exe -Mblib -IC:\Perl\lib -IC:\Perl\lib -e "use Test::Harness qw(&
runtests $verbose); $verbose=0; runtests @ARGV;" t\01_parser.t
Using C:/Temp/Geo-StreetAddress- US-0.99/bl ibt\01_par ser....Usi ng C:/Temp/Geo-StreetAddr
ess-US-0.99/blib
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 2# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 3# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 4# Structures begin differing at:
# $got->{zip} = undef
# $expected->{zip} = '95472'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 5# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 6# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 7# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 8# Structures begin differing at:
# $got->{zip} = undef
# $expected->{zip} = '95472'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 9# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 10# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'UT'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 11# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'DC'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 12# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'PA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 13# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 14# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 15# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 16# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 17# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 18# Structures begin differing at:
# $got->{zip} = undef
# $expected->{zip} = '95472'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 19# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 20# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 21# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 22# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'DC'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 23# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 24# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 25# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 26# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 27# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'MN'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 28# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 29# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 30# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 31# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 32# Structures begin differing at:
# $got->{zip} = undef
# $expected->{zip} = '95472'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 33# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Looks like you failed 32 tests of 38.
t\01_parser....dubious
Test returned status 32 (wstat 8192, 0x2000)
DIED. FAILED tests 2-33
Failed 32/38 tests, 15.79% okay
Failed Test Stat Wstat Total Fail Failed List of Failed
-------------------------- ---------- ---------- ---------- ---------- ---------- ---
t\01_parser.t 32 8192 38 32 84.21% 2-33
Failed 1/1 test scripts, 0.00% okay. 32/38 subtests failed, 15.79% okay.
NMAKE : fatal error U1077: 'C:\WINDOWS\system32\cmd.e xe' : return code '0x2'
Stop.
C:\Temp\Geo-StreetAddress- US-0.99>c: \perl\bin\ nmake install
Microsoft (R) Program Maintenance Utility Version 1.50
Copyright (c) Microsoft Corp 1988-94. All rights reserved.
Skipping C:\Perl\site\lib\Geo\Stree tAddress\U S.pm (unchanged)
Writing C:\Perl\site\lib\auto\Geo\ StreetAddr ess\US\.pa cklist
Appending installation info to C:\Perl\lib/perllocal.pod
Looks like my module might not have got installed correctly.
I re-downloaded the module and installed it once again, I got some errors for make test. I do remember seeing them the last time I installed them too. Guess I ignored it since my scripts were compiling fine.
Pasted below. Any thoughts on why the module would not install correctly?
C:\Temp\Geo-StreetAddress-
Writing Makefile for Geo::StreetAddress::US
C:\Temp\Geo-StreetAddress-
Microsoft (R) Program Maintenance Utility Version 1.50
Copyright (c) Microsoft Corp 1988-94. All rights reserved.
C:\Temp\Geo-StreetAddress-
Microsoft (R) Program Maintenance Utility Version 1.50
Copyright (c) Microsoft Corp 1988-94. All rights reserved.
C:\Perl\bin\perl.exe -Mblib -IC:\Perl\lib -IC:\Perl\lib -e "use Test::Harness qw(&
runtests $verbose); $verbose=0; runtests @ARGV;" t\01_parser.t
Using C:/Temp/Geo-StreetAddress-
ess-US-0.99/blib
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 2# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 3# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 4# Structures begin differing at:
# $got->{zip} = undef
# $expected->{zip} = '95472'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 5# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 6# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 7# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 8# Structures begin differing at:
# $got->{zip} = undef
# $expected->{zip} = '95472'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 9# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 10# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'UT'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 11# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'DC'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 12# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'PA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 13# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 14# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 15# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 16# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 17# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 18# Structures begin differing at:
# $got->{zip} = undef
# $expected->{zip} = '95472'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 19# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 20# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 21# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 22# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'DC'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 23# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 24# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 25# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 26# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 27# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'MN'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 28# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 29# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 30# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 31# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 32# Structures begin differing at:
# $got->{zip} = undef
# $expected->{zip} = '95472'
# Failed test (t\01_parser.t at line 366)
t\01_parser....NOK 33# Structures begin differing at:
# $got->{state} = undef
# $expected->{state} = 'CA'
# Looks like you failed 32 tests of 38.
t\01_parser....dubious
Test returned status 32 (wstat 8192, 0x2000)
DIED. FAILED tests 2-33
Failed 32/38 tests, 15.79% okay
Failed Test Stat Wstat Total Fail Failed List of Failed
--------------------------
t\01_parser.t 32 8192 38 32 84.21% 2-33
Failed 1/1 test scripts, 0.00% okay. 32/38 subtests failed, 15.79% okay.
NMAKE : fatal error U1077: 'C:\WINDOWS\system32\cmd.e
Stop.
C:\Temp\Geo-StreetAddress-
Microsoft (R) Program Maintenance Utility Version 1.50
Copyright (c) Microsoft Corp 1988-94. All rights reserved.
Skipping C:\Perl\site\lib\Geo\Stree
Writing C:\Perl\site\lib\auto\Geo\
Appending installation info to C:\Perl\lib/perllocal.pod
ASKER
use Geo::StreetAddress::US;
my $hashref = Geo::StreetAddress::US->pa rse_locati on("1005 Gravenstein Hwy N, Sebastopol CA 95472" );
while ( my ($key, $value) = each(%$hashref) ) {
print "$key => $value\n";
}
prints
state =>
zip =>
suffix =>
city =>
prefix =>
type =>
number =>
street =>
So looks like none of the keys have any values. I did try other address format from the doc for the module , but same result.
my $hashref = Geo::StreetAddress::US->pa
while ( my ($key, $value) = each(%$hashref) ) {
print "$key => $value\n";
}
prints
state =>
zip =>
suffix =>
city =>
prefix =>
type =>
number =>
street =>
So looks like none of the keys have any values. I did try other address format from the doc for the module , but same result.
I have something that is sort of old, that I quickly converted to do what you are wanting.
use DBI;
my $files = 'email.txt';
my $start = 0;
my $lines = 0;
my $found = 0;
my $datas = '';
my @parts;
my @ms_db = ( 'localhost', '3306', 'database_name', 'user_name', 'user_password', 'insert_table_name' );
open FILE, '<' . $files;
while ( <FILE> )
{
tr/\r\n\t//d;
if ( $start != 1 && m/^$/ )
{
$start = 1;
}
elsif ( $start == 1 )
{
s/\:/\: /g;
if ( m/^Name\:/ )
{
s/^.*?\s+|\s+$//g;
if ( m/\s/ )
{
my @name = split(/\s/, $_);
push @parts,$name[0];
push @parts,$name[1];
}
else
{
push @parts,$_;
push @parts,'';
}
}
elsif ( m/^Address\:/|| $lines > 0 )
{
s/^.*?\s+|\s+$//g;
if ( $lines == 0 )
{
push @parts,$_;
$lines++;
}
elsif ( $lines == 1 )
{
push @parts,$_;
$lines++;
}
else
{
$lines = 0;
foreach my $addy (split(/\s/, $_))
{
if ( $addy =~ m/^[0-9]{5}$/ )
{
push @parts,$addy;
}
elsif ( $addy =~ m/^.*?\,$/ )
{
push @parts,substr ( $addy, 0, -1 );
}
else
{
push @parts,$addy;
}
}
}
}
elsif ( m/Residence\:|Phone\:|Emai
{
s/^.*?\:\s+|\s+$//g;
push @parts,$_;
}
else
{
if ( m/^.*?CONCLUSION.*?$/ )
{
$found = 1;
}
elsif ( $found )
{
$datas .= $_ . "\n";
}
}
}
}
close FILE;
push @parts,$datas;
my $obj = &make_connection;
$obj->do("INSERT INTO " . $ms_db[5] . " (id,first_name,last_name,a
$obj->disconnect();
sub make_connection
{
return DBI->connect("dbi:mysql:" . $ms_db[2] . ":" . $ms_db[0] . ":" . $ms_db[1], $ms_db[3], $ms_db[4]);
}
Suzanne