We help IT Professionals succeed at work.

Parsing Data

Junior Vasquez
on
I want to get everything from the first character to the word United States, for example:

if the address is

123 Grand Concourse, Bronx, NY 10451 United States

i want everything but the United States

It would be easy to trim from the right, but sometimes, they use the word US, or USA.... That's why i couldn't use the replace function.
Comment
Watch Question

CERTIFIED EXPERT

Commented:
left(YourOrignalString,charindex ('United States',YourOriginalString)-1)

or you can try

left(YourOrignalString,len(YourOrignalString)-charindex ('U',reverse(YourOriginalString))-1)
Senior DBA
CERTIFIED EXPERT
Most Valuable Expert 2018
Distinguished Expert 2019
Commented:

I'd use a CROSS APPLY to do the searches, then a COALESCE to find the correct part of the original string to pull, something like this:


SELECT
    address AS original_address,
    LEFT(address, COALESCE(NULLIF(us1, 0), NULLIF(us2, 0), NULLIF(us3, 0),
        NULLIF(us4, 0)) - 1) AS address_minus_usa
FROM ( VALUES
    ('123 Grand Concourse, Bronx, NY 10451 United States'),
    ('234 Broad Street, Fusion Junction, CO 22222 USA'),
    ('345 Any Cove, Somewhere, MT 33333 U.S.'),
    ('456 ZZZ, YYY, ID 44444 US'),
    ('999 Other, Berlin, Germany')
) AS sample_data(address)
CROSS APPLY (
    SELECT CHARINDEX('United Sta', address) AS us1,
        PATINDEX('%[^A-Z]USA%', address) AS us2,
        PATINDEX('%[^A-Z]U.S.%', address) AS us3,
        PATINDEX('%[^A-Z]US[^A-Z]%', address + '/') AS us4
) AS ca1


CERTIFIED EXPERT

Commented:
Great Code Scott
since you are considering specific cases
I Would include a 5th one as U.S.A.
and any others that people may have used in your DB (cannot think of more)

and maybe  including lowercase scenarios.
of course it would be ideal if you do not let people input the country this way, if possible.
Junior VasquezIT Specialist

Author

Commented:

Thank you guys, and also, this data comes from XML files from an external source, so we don't have any control over it. We have a look table that we verify the addresses on, which is in NYC open data. So it's the only way I could think of to match the addresses. 

Scott PletcherSenior DBA
CERTIFIED EXPERT
Most Valuable Expert 2018
Distinguished Expert 2019

Commented:

@arana


Thank you.  


>> I Would include a 5th one as U.S.A. <<

Good point, and it is already covered by the us3 pattern.

Junior VasquezIT Specialist

Author

Commented:

Arana,


Question


I've used this to do a Join


 select left(ArrestLocation,len(ArrestLocation)-charindex ('U',reverse(ArrestLocation))-1) as Address,
 ArrestLocation, ArrestingAgency, ArrestLocationPSA, ArrestLocationDistrict
 from ArrestInfo


and this does everything I asked for. Now, if I wanted to also scrub out city and state, in my case Bronx, NY how could I do that? I tried to do it, but have come up short, so everything from city, state to the end of the string. 


thanks. 

CERTIFIED EXPERT
Commented:
@Junior: my code was a quick "solution" to your specific question, although its a tiny bit faster I strongly suggest you to use Scott approach instead as it has more validations, since my code would also work if the country is any of :Australia, Austria, Belarus, Cyprus, Mauritius, Russia  and maybe you don't want that, unless of course there are never other countries in the data, and all records are USA.


>>if I wanted to also scrub out city and state, in my case Bronx, NY how could I do that? I tried to do it, but have come up short, so everything from city, state to the end of the string. <<
Not sure I understand what you want but here is what I think you want (considering only Bronx and nothing else):

select left(ArresLocation,PATINDEX('%[^0-9A-z]Bronx%',ArresLocation)-1-(PATINDEX('%[A-Za-z0-9]%',REVERSE(left(ArresLocation,PATINDEX('%[^0-9A-z]Bronx%',ArresLocation)-1)))))

Open in new window


that will result in:
"123 Grand Concourse"
I added something else after the first "-1" , to take care of any extra commas.
Junior VasquezIT Specialist

Author

Commented:

I actually used a combination of both your's and Scott's solutions. It works very well, thanks for your help. Till next time, happy Friday. 


Junior

Explore More ContentExplore courses, solutions, and other research materials related to this topic.