How do I retrieve the unit from the Geo::StreetAddress::US Perl module? Is a module modification required?

I am tasked with parsing a large number of addresses (the first batch is 47,000) into individual components so that they can later be imported into an SQL database. I started with a CPAN search and found a few promising Perl modules. Of those, Geo::StreetAddress::US appeared to be the best fit: of the ones I tested, it was the most accurate and also the fastest. I also tried Lingua::EN::AddressParse, but it failed to parse some addresses correctly and took about three times as long as Geo::StreetAddress::US.

My problem is that I can get all of the data I need except the unit (apt, bldg, etc.). I don't think this was a priority for the module's author, since the module was written to look up latitude/longitude for http://geocoder.us. I've attached the bit of code I am using for testing; it returns everything I need except the unit. Am I making an error in my code, or will I need to edit the module to make it do what I want? Or am I just plain missing something?

I've also attached a copy of the module, though it is also available from cpan.org. My Perl is beginner level at best; I'm having trouble following the flow of all the hashes, and none of the edits I've made have made any difference.

Thanks in advance,

use strict;
use warnings;
use Geo::StreetAddress::US;

# parse_location() returns a hash reference of address components
# (or undef if the string cannot be parsed)
my $address = Geo::StreetAddress::US->parse_location(
    "1492 N Columbus BLVD APT 4,PORTLAND,OR,97203" )
    or die "Could not parse address\n";

# Print each component; keys the parser did not set print as empty
# instead of triggering "uninitialized value" warnings
for my $field (qw(number prefix street type suffix city state zip unit)) {
    my $value = defined $address->{$field} ? $address->{$field} : '';
    print "$field: $value\n";
}


Solution
The regex matches the unit but doesn't save it. You can change the module code so that it does:
Lines 683-701 define the hash %Addr_Match, and line 700 holds the definition for unit.
If you change line 700 to the following, you should get the unit:

unit    => qr/((?:(?:su?i?te|p\W*[om]\W*b(?:ox)?|dept|apt|ro*m|fl|apt|unit|box)\W+|#\W*)[\w-]+)(?{$_{unit} = $^N})/i,
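To see why the unmodified line matches the unit without saving it, here is a minimal standalone sketch that needs no module. It assumes the original line 700 was the same pattern without the outer capture group and the (?{ ... }) block, which is what the answer implies; the patched pattern copies the capture into the global hash %_ exactly as the replacement line above does with $_{unit}.

```perl
use strict;
use warnings;

my $addr = "1492 N Columbus BLVD APT 4";

# Assumed original unit pattern: it can match the unit, but it has no
# capture group and no code block, so nothing is stored anywhere.
my $unit_plain = qr/(?:(?:su?i?te|p\W*[om]\W*b(?:ox)?|dept|apt|ro*m|fl|apt|unit|box)\W+|#\W*)[\w-]+/i;

# Patched pattern from the answer: the outer (...) captures the whole unit,
# and the (?{ ... }) block runs when the engine reaches it, copying $^N
# (the text of the most recently closed capture group) into $_{unit}.
my $unit_saved = qr/((?:(?:su?i?te|p\W*[om]\W*b(?:ox)?|dept|apt|ro*m|fl|apt|unit|box)\W+|#\W*)[\w-]+)(?{ $_{unit} = $^N })/i;

# Clear %_ before each match so a previous result cannot leak through
%_ = ();
$addr =~ $unit_plain;
print "plain:   ", (exists $_{unit} ? $_{unit} : '(not saved)'), "\n";

%_ = ();
$addr =~ $unit_saved;
print "patched: ", (exists $_{unit} ? $_{unit} : '(not saved)'), "\n";
```

Running it prints "plain:   (not saved)" followed by "patched: APT 4". The first match succeeds in both cases; the only difference is whether anything records what was matched.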


telwest (Author) commented:
Thank you.  That worked beautifully.
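With the patch applied, the test script reads the unit from $address->{'unit'}. If the same script might run against a different copy of the module, a small helper can isolate that lookup. The sketch below is hypothetical: unit_of is not part of Geo::StreetAddress::US, and it assumes that later releases of the module report the unit as separate sec_unit_type and sec_unit_num keys, so check the documentation of the installed version before relying on that.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::StreetAddress::US): return the unit
# from a parse_location() result, or undef if no unit was found.
#  - 'unit' is the key set by the patched %Addr_Match line.
#  - 'sec_unit_type' / 'sec_unit_num' are ASSUMED key names for later
#    releases; verify them against your installed module's documentation.
sub unit_of {
    my ($addr) = @_;
    return undef unless ref $addr eq 'HASH';
    return $addr->{unit} if defined $addr->{unit};
    my @parts = grep { defined $_ && length $_ }
                @{$addr}{qw(sec_unit_type sec_unit_num)};
    return @parts ? join(' ', @parts) : undef;
}

# Usage with hand-built hashes standing in for parse_location() output
print unit_of({ number => '1492', unit => 'APT 4' }), "\n";
print unit_of({ number => '1492', sec_unit_type => 'APT', sec_unit_num => '4' }), "\n";
my $none = unit_of({ number => '1492' });
print defined $none ? "unit: $none\n" : "no unit\n";
```

The first two calls both print "APT 4" and the last prints "no unit". Returning undef rather than an empty string lets the caller store a NULL in the database column when an address has no unit.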
