dsr1811
asked on
US Postal Address Parsing into Separate Fields
I need a PHP regular expression to find the parts of a US postal address, preferably returned as an array keyed by field name, so I can feed them into a LIKE search in MySQL. See the examples of possible input and the expected output below. This is a challenging problem. Thanks in advance. If you know of a better way to do this, by all means let me know; MySQL's boolean search looked interesting for this type of problem. Thanks again!
Possible input fields are: Number, Direction, Name, Suffix, City, State, Zip; the direction could come on either side of the number.
Case does not matter; fields may be separated by commas or spaces.
Suffixes need to be validated, and abbreviations should be replaced by the full names:
STREET|ST|DRIVE|DR|AVENUE|AVE|ROAD|RD|COURT|CT|CIRCLE|LANE|LN|BOULEVARD|BLVD
Compass directions need to be validated, and abbreviations should be replaced by the full names:
W|West|SW|Southwest|NW|Northwest|S|South|E|East|SE|Southeast|NE|Northeast|N|North
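One way to handle both lists is a lookup table rather than a regex. The following is only a sketch: the names `SUFFIXES`, `DIRECTIONS` and `normalizeToken` are made up for illustration, and the tables contain just the values listed above (CIRCLE has no abbreviation in the list, so it maps to itself).

```php
<?php
// Hypothetical lookup tables built from the lists above: abbreviation => full name.
const SUFFIXES = [
    'st' => 'street', 'dr' => 'drive', 'ave' => 'avenue', 'rd' => 'road',
    'ct' => 'court', 'ln' => 'lane', 'blvd' => 'boulevard', 'circle' => 'circle',
];
const DIRECTIONS = [
    'n' => 'north', 's' => 'south', 'e' => 'east', 'w' => 'west',
    'ne' => 'northeast', 'nw' => 'northwest', 'se' => 'southeast', 'sw' => 'southwest',
];

/**
 * Returns the full lowercase name if $token is a known abbreviation or
 * full name in $table, otherwise null (the token failed validation).
 */
function normalizeToken(string $token, array $table): ?string
{
    $t = strtolower(rtrim($token, '.'));   // case-insensitive, tolerate "St."
    if (isset($table[$t])) {
        return $table[$t];                 // abbreviation => full name
    }
    return in_array($t, $table, true) ? $t : null; // already a full name?
}
```

For example, `normalizeToken('AVE', SUFFIXES)` gives `'avenue'`, and a token that is in neither form, such as `'hamilton'`, gives `null`.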
Cities and streets may contain more than one word.
Currently the user input is turned into an array.
The output should also be an array, which I will then feed to a class that creates the MySQL 'AND' logic for the fields.
array(
[number] => 421
[direction] => 'west'
[city] => 'john glenn'
[state] => 'ca'
)
Example of the solution array applied to a query:
WHERE number like '%421%' AND direction like '%west%' AND city like '%john glenn%' AND state like 'ca'
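A rough sketch of how such a clause could be assembled from the array, using named placeholders instead of pasting user input into the SQL. The function name `buildWhere` is hypothetical, and it assumes the column names are the same as the array keys; the returned parameters are in the form accepted by PDO's `prepare()` / `execute()`.

```php
<?php
/**
 * Builds a WHERE clause with named placeholders from the parsed parts.
 * Assumption: column names equal the array keys; other keys are ignored.
 */
function buildWhere(array $parts): array
{
    $allowed = ['number', 'direction', 'name', 'suffix', 'city', 'state', 'zip'];
    $clauses = [];
    $params = [];
    foreach ($allowed as $key) {
        if (!isset($parts[$key]) || $parts[$key] === '') {
            continue;                       // field was not supplied
        }
        $clauses[] = "$key LIKE :$key";
        // The example query matches state without wildcards; the rest use %...%.
        $params[":$key"] = $key === 'state'
            ? (string) $parts[$key]
            : '%' . $parts[$key] . '%';
    }
    $sql = $clauses ? 'WHERE ' . implode(' AND ', $clauses) : '';
    return [$sql, $params];
}

[$sql, $params] = buildWhere([
    'number' => 421, 'direction' => 'west', 'city' => 'john glenn', 'state' => 'ca',
]);
// $sql: WHERE number LIKE :number AND direction LIKE :direction AND city LIKE :city AND state LIKE :state
```

Note that `%` and `_` typed by the user still act as LIKE wildcards here; whether that matters depends on the search.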
I am currently creating an array from the input, separating it as below:
$addressParts = explode(",", str_replace(" ",",",$PostAddr));
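One thing to watch with the line above: for input such as "las vegas, nv" the comma followed by a space becomes two commas, so the array contains an empty element. A possible alternative, as a sketch only (`tokenizeAddress` is a made-up name), is to split on any run of commas and whitespace with `preg_split`:

```php
<?php
/**
 * Splits on any run of commas and/or whitespace, lowercases the input
 * (case does not matter) and drops empty tokens.
 */
function tokenizeAddress(string $input): array
{
    return preg_split('/[\s,]+/', strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY);
}

$addressParts = tokenizeAddress('las vegas, nv');
// $addressParts is ['las', 'vegas', 'nv'], with no empty element
```

The trade-off is that the comma positions, which hint at where the city starts, are thrown away in both versions.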
INPUT: las vegas, nv
OUTPUT: city => las vegas, state => nv
INPUT: 90210
OUTPUT: zip => 90210
INPUT: 910 hamilton 90210
OUTPUT: number => 910, name => hamilton, zip => 90210
INPUT: 910 hamilton ave 90210
OUTPUT: number => 910, name => hamilton, suffix => avenue, zip => 90210
INPUT: 220 hamilton john glenn ca
OUTPUT: number => 220, name => hamilton, city => john glenn, state => ca
INPUT: 421 w 14th st john glenn ca
OUTPUT: number => 421, direction => west, name => 14th, suffix => street, city => john glenn, state => ca
INPUT: 220 hamilton john glenn
OUTPUT: number => 220, name => hamilton, city => john glenn
INPUT: 910 hamilton ave, campbell, ca
OUTPUT: number => 910, name => hamilton, suffix => avenue, city => campbell, state => ca
INPUT: w hamilton ln, john glenn, ca 90210
OUTPUT: direction => west, name => hamilton, suffix => lane, city => john glenn, state => ca, zip => 90210
INPUT: w hamilton ave john glenn ca 90210
OUTPUT: direction => west, name => hamilton, suffix => avenue, city => john glenn, state => ca, zip => 90210
ASKER
Thanks for the response, guys!
I know this is a real teaser.
I am trying to do something similar to the Realtor.com address search.
I have the live search for the postal code and city worked out as they do, but when someone types in an address that does not match either, I need to try to construct the best search I can based on the info provided.
After determining that they have not typed in a matching zip or city first, we know we need to interrogate each array item. Based on what we know about what the individual address items could look like, we apply a set of regexes against each of them and pop them from the array.
For example:
If the user types 220 w summit 90210, the array components would first be tested against a zip code pattern.
$zipcode_pattern = '/^([0-9]{5})(-[0-9]{4})?$/';
Pop the zip code and then test:
1. State abbreviation (none)
2. Direction (pop the w and convert to west)
3. Street number (can only test for numeric, so pop 220)
4. and so on
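The steps above could be sketched roughly as follows. This is an illustration under several assumptions, not a finished parser: the tables are cut down to the values in this thread, `STATES` holds only the two states from the examples, and `$knownCities` is a hypothetical list standing in for the data behind the city live search (without it there is no reliable way to tell a street name from a city name). Tokens that mean two things, such as "ct" (Court or Connecticut) or "ne" (Northeast or Nebraska), are not resolved here. It assumes PHP 7.4 or later.

```php
<?php
// Assumption: abbreviated tables for illustration; extend them to the full lists.
const SUFFIX_MAP = [
    'st' => 'street', 'dr' => 'drive', 'ave' => 'avenue', 'rd' => 'road',
    'ct' => 'court', 'ln' => 'lane', 'blvd' => 'boulevard', 'circle' => 'circle',
];
const DIRECTION_MAP = [
    'n' => 'north', 's' => 'south', 'e' => 'east', 'w' => 'west',
    'ne' => 'northeast', 'nw' => 'northwest', 'se' => 'southeast', 'sw' => 'southwest',
];
const STATES = ['ca', 'nv']; // assumption: only the states used in the examples

/** Maps an abbreviation or a full name to the full name, or returns null. */
function lookup(string $token, array $map): ?string
{
    return $map[$token] ?? (in_array($token, $map, true) ? $token : null);
}

/**
 * Pops recognizable tokens off both ends of the input.
 * $knownCities is a hypothetical list of lowercase city names.
 */
function parseAddress(string $input, array $knownCities): array
{
    $tokens = preg_split('/[\s,]+/', strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY);
    $out = ['number' => null, 'direction' => null, 'name' => null, 'suffix' => null,
            'city' => null, 'state' => null, 'zip' => null];

    // 1. Zip code: test the last token only.
    if ($tokens && preg_match('/^[0-9]{5}(-[0-9]{4})?$/', end($tokens))) {
        $out['zip'] = array_pop($tokens);
    }
    // 2. State abbreviation: again the last remaining token.
    if ($tokens && in_array(end($tokens), STATES, true)) {
        $out['state'] = array_pop($tokens);
    }
    // 3. City: longest run of trailing tokens that matches a known city.
    for ($n = count($tokens); $n > 0; $n--) {
        $candidate = implode(' ', array_slice($tokens, -$n));
        if (in_array($candidate, $knownCities, true)) {
            $out['city'] = $candidate;
            $tokens = array_slice($tokens, 0, count($tokens) - $n);
            break;
        }
    }
    // 4. Number and direction may lead in either order, so look twice.
    for ($i = 0; $i < 2 && $tokens; $i++) {
        if ($out['number'] === null && ctype_digit($tokens[0])) {
            $out['number'] = array_shift($tokens);
        } elseif ($out['direction'] === null
            && ($dir = lookup($tokens[0], DIRECTION_MAP)) !== null) {
            $out['direction'] = $dir;
            array_shift($tokens);
        }
    }
    // 5. Suffix: last remaining token, if something is left to be the name.
    if (count($tokens) > 1 && ($suffix = lookup(end($tokens), SUFFIX_MAP)) !== null) {
        $out['suffix'] = $suffix;
        array_pop($tokens);
    }
    // 6. Whatever is left is taken to be the street name.
    if ($tokens) {
        $out['name'] = implode(' ', $tokens);
    }
    return array_filter($out, fn($v) => $v !== null); // drop fields not found
}
```

With `$knownCities = ['las vegas', 'john glenn', 'campbell']`, the input "421 w 14th st john glenn ca" comes back as number 421, direction west, name 14th, suffix street, city john glenn, state ca. The city is matched before the direction so that a city name starting with a direction word is not split, provided it is in the list.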
This was my line of thinking, and maybe part of my question is: is this the best approach?
I found a couple of regex patterns that I thought would ascertain the sub-parts of the address, but no luck. See below.
$full_address_pattern = '/^\s*((?:(?:\d+(?:\x20+\w+\.?)+(?:(?:\x20+STREET|ST|DRIVE|DR|AVENUE|AVE|ROAD|RD|LOOP|COURT|CT|CIRCLE|LANE|LN|BOULEVARD|BLVD)\.?)?)|(?:(?:P\.\x20?O\.|P\x20?O)\x20*Box\x20+\d+)|(?:General\x20+Delivery)|(?:C[\\\/]O\x20+(?:\w+\x20*)+))\,?\x20*(?:(?:(?:APT|BLDG|DEPT|FL|HNGR|LOT|PIER|RM|S(?:LIP|PC|T(?:E|OP))|TRLR|UNIT|\x23)\.?\x20*(?:[a-zA-Z0-9\-]+))|(?:BSMT|FRNT|LBBY|LOWR|OFC|PH|REAR|SIDE|UPPR))?)\,?\s+((?:(?:\d+(?:\x20+\w+\.?)+(?:(?:\x20+STREET|ST|DRIVE|DR|AVENUE|AVE|ROAD|RD|LOOP|COURT|CT|CIRCLE|LANE|LN|BOULEVARD|BLVD)\.?)?)|(?:(?:P\.\x20?O\.|P\x20?O)\x20*Box\x20+\d+)|(?:General\x20+Delivery)|(?:C[\\\/]O\x20+(?:\w+\x20*)+))\,?\x20*(?:(?:(?:APT|BLDG|DEPT|FL|HNGR|LOT|PIER|RM|S(?:LIP|PC|T(?:E|OP))|TRLR|UNIT|\x23)\.?\x20*(?:[a-zA-Z0-9\-]+))|(?:BSMT|FRNT|LBBY|LOWR|OFC|PH|REAR|SIDE|UPPR))?)?\,?\s+((?:[A-Za-z]+\x20*)+)\,\s+(A[LKSZRAP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])\s+(\d+(?:-\d+)?)\s*$/';
$full_address_pattern1 = '/^(?n:(?(\d{1,5}(\ 1\/[234])?(\x20[A-Z]([a-z] )+)+ )|(P\.O\.\ Box\ \d{1,5}))\s{1,2}(?i:(?(((A PT|B LDG|DEPT|FL|HNGR|LOT|PIER| RM|S(LIP|P C|T(E|OP)) |TRLR|UNIT )\x20\w{1, 5})|(BSMT| FRNT|LBBY| LOWR|OFC|P H|REAR|SID E|UPPR)\.? )\s{1,2})? )(?[A-Z]([ a-z])+(\.? )(\x20[A-Z ]([a-z])+) {0,2})\, \x20(?A[LKSZRAP]|C[AOT]|D[ EC]|F[LM]| G[AU]|HI|I [ADL N]|K[SY]|LA|M[ADEHINOPST]| N[CDEHJMVY ]|O[HKR]|P [ARW]|RI|S [CD] |T[NX]|UT|V[AIT]|W[AIVY])\ x20(?(?!0{ 5})\d{5}(- \d {4})?))$/';
Thanks again.
Since this is not so much a question as a need for application development, let's try a different approach to be productive. Please tell us what you're trying to accomplish, and maybe we can suggest a well-known design pattern.
Best, ~Ray