Link to home
Start Free TrialLog in
Avatar of WaldaInc
WaldaInc

asked on

SQL Syntax

Hi. I have a field in a table called CityAndState.  It contains one of the following for data

CityName <space> StateName
 
or

CityName <comma> StateName

Now, I have code that will parse it during the select and break it up into separate values for City and State which looks like the code blelow.

HOWEVER, it doesn't work with city names that have more than one space. For example, Sacramento CA it works fine with. San Francisco CA it does not.

Any suggestions on how to look for a SECOND occurence of the space?

Thanks in advance


CASE WHEN 
	CHARINDEX(',', ri.CityAndState) > 0 THEN RTRIM(LTRIM(LEFT(ri.CityAndState, CHARINDEX(',', ri.CityAndState + ' ') - 1)))
     ELSE
	CASE WHEN
		CHARINDEX(' ', ri.CityAndState) > 0 THEN RTRIM(LTRIM(LEFT(ri.CityAndState, CHARINDEX(' ', ri.CityAndState + ' ') - 1)))
	     ELSE
		ri.CityAndState
	END
END AS CITY,
CASE WHEN 
	CHARINDEX(' ', ri.CityAndState) > 0 THEN RTRIM(LTRIM(SUBSTRING(ri.CityAndState, CHARINDEX(' ',ri.CityAndState) + 1, LEN(ri.CityAndState)))) 
     ELSE 
	NULL 
END AS STATE

Open in new window

Avatar of Swindle
Swindle
Flag of United States of America image

Try using the REPLACE function to get rid of the commas so you are always looking for a space.  That's the first step.  Then use the REVERSE function to flip the string around and find the last space which you know will always be the  one between the city and the state.  Here's a test to show you want I'm talking about along with what I think  you would need to put in your query to get the result you want.
--Example to test with.  Just plug in different values to @CityAndState to test results
DECLARE @CityAndState as varchar(100)
SET @CityAndState = 'San Francisco, CA'
 
SELECT LEFT(REPLACE(@CityAndState, ', ',' '), len(REPLACE(@CityAndState, ', ',' ')) - CHARINDEX(' ', REVERSE(REPLACE(@CityAndState, ', ',' '))) + 1) as City,
	RIGHT(REPLACE(@CityAndState, ', ',' '),CHARINDEX(' ', REVERSE(REPLACE(@CityAndState, ', ',' '))) - 1) as State
 
--This is what I think you actually need in your statement
SELECT LEFT(REPLACE(ri.CityAndState, ', ',' '), len(REPLACE(ri.CityAndState, ', ',' ')) - CHARINDEX(' ', REVERSE(REPLACE(ri.CityAndState, ', ',' '))) + 1) as City,
	RIGHT(REPLACE(ri.CityAndState, ', ',' '),CHARINDEX(' ', REVERSE(REPLACE(ri.CityAndState, ', ',' '))) - 1) as State

Open in new window

Avatar of WaldaInc
WaldaInc

ASKER

Sorry, but those did not work at all. I'm getting everything returined basically as the CityAndState column, it does not split out the values at all. It does however, remove the comma.

The code I had did work for getting comma seperated or space seperated CityAndState, it just didn't work for the extra space city names.
Thy this way out...

select 
 
cityandstate
,case when charindex(',',cityandstate)>0 
     then left(cityandstate, len(cityandstate)-charindex(',',reverse(cityandstate)))
     else left(cityandstate, len(cityandstate)-charindex(' ',reverse(cityandstate)))
end
,case when charindex(',',cityandstate)>0 
     then ltrim(right(cityandstate, charindex(',',reverse(cityandstate))-1))
     else ltrim(right(cityandstate, charindex(' ',reverse(cityandstate))-1))
end
 
from (select 'san francisco, ca' as CityAndState
union all select 'new york NY'
union all select 'york, pa'
union all select 'sacramento ca') ri

Open in new window

Thjanks Brandon.

Your code is nice and it works within the context of your post. However, when I put the code into my SQL Statement I get the following error:

Invalid length parameter passed to the LEFT or SUBSTRING function.


This is coming from the len(cityandstate) portion as when I replace that portion with a hard-coded value it works.

Now is my code I am doing the len on the actual column name from the database. Not sure if that is what is casuing the issue.
If you have data that does not have a space or a , then that will happen.  I mistakenly assumed it always would.

This should fix it.
select 
 
cityandstate
,case when charindex(',',cityandstate)>0 then left(cityandstate, len(cityandstate)-charindex(',',reverse(cityandstate)))
when charindex(' ',cityandstate)>0 then left(cityandstate, len(cityandstate)-charindex(' ',reverse(cityandstate)))
     else cityandstate
end
,case when charindex(',',cityandstate)>0  then ltrim(right(cityandstate, charindex(',',reverse(cityandstate))-1))
when charindex(',',cityandstate)>0  then ltrim(right(cityandstate, charindex(' ',reverse(cityandstate))-1))
     else null
end
 
from (select 'san francisco, ca' as CityAndState
union all select 'new york NY'
union all select 'york, pa'
union all select 'sacramento ca'
union all select 'pittsburgh') ri

Open in new window

Nope same error as before. Also, your code run as it is, doesn't return a 'STATE' value for New York NY. It comes out as NULL.

It seems that SQL does not line the second parameter for Left to be a calcualation for example Len(something) - somethingelse.  It chokes on that. Once I remove the - (minus) portion, it runs. Weird, I know VB and C# support that and you would assume that SQL does too, but apparently not.
ASKER CERTIFIED SOLUTION
Avatar of BrandonGalderisi
BrandonGalderisi
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I'm confused.

Your code (the latest) works just fine.  However when I use your code intermixed with mine, I get the error.

It's on this line, because if I remove this line (replace it with hard-coded value) it runs.

CASE WHEN CHARINDEX(',', ri.CityAndState) > 0
THEN left(ri.CityAndState, len(ri.CityAndState)-charindex(',',reverse(ri.CityAndState)))

Below is the entire SQL statement, but it is the THEN branch that is erroring out. Error message is:

Invalid length parameter passed to the LEFT or SUBSTRING function.
SELECT LastName, ISNULL(RTRIM(LTRIM(LEFT(ri.FirstandMiddle, CHARINDEX(' ', ri.FirstandMiddle + ' ') - 1))), '') AS FirstName, 
ISNULL(CASE WHEN CHARINDEX(' ', ri.FirstandMiddle) = 0 THEN NULL ELSE RTRIM(LTRIM(SUBSTRING(ri.FirstandMiddle, CHARINDEX(' ',ri.FirstandMiddle) + 1, LEN(ri.FirstandMiddle)))) END, '') AS MiddleName, ri.SSN, ri.AltID AS AltID, ri.PersonID, 
ri.SalaryTotal, ri.SerialNumber, ri.BasedOnSalary, ri.PlusSalary, 
ri.PayrollUnit, ri.ApptTenure, ri.AnniDate, ri.AgencyCode, ri.Schematic, ri.RollCode, ri.Shift, 
ri.CountyCode, ri.AddressLine1, ri.CityAndState, ri.FirstandMiddle, ri.ZipCode, 
 
CASE WHEN CHARINDEX(',', ri.CityAndState) > 0 
		--THIS IS THE LINE THAT CAUSES ERROR. Replace the THEN branch with something like: THEN 'test'  and it works fine.
		THEN left(ri.CityAndState, len(ri.CityAndState)-charindex(',',reverse(ri.CityAndState)))
		--RTRIM(LTRIM(LEFT(ri.CityAndState, CHARINDEX(',', ri.CityAndState + ' ') - 1)))
	 ELSE
		CASE WHEN
			CHARINDEX(' ', ri.CityAndState) > 0 THEN RTRIM(LTRIM(LEFT(ri.CityAndState, CHARINDEX(' ', ri.CityAndState + ' ') - 1)))
		ELSE
			ri.CityAndState
		END
END AS CITY,
CASE WHEN
		CHARINDEX(',', ri.CityAndState) > 0 THEN RTRIM(LTRIM(SUBSTRING(ri.CityAndState, CHARINDEX(',',ri.CityAndState) + 1, LEN(ri.CityAndState))))
	 ELSE
		CASE WHEN 
				CHARINDEX(' ', ri.CityAndState) > 0 THEN RTRIM(LTRIM(SUBSTRING(ri.CityAndState, CHARINDEX(' ',ri.CityAndState) + 1, LEN(ri.CityAndState)))) 
			 ELSE NULL 
		END 
END AS STATE
 
 
FROM FI_Temp_RosterImport ri
INNER JOIN FI_Rules r ON ri.RulesID = r.ID
WHERE	ri.JobID = 19
AND r.type = 'C'

Open in new window

Here is another tidbit.

Original THEN branch is:

CASE WHEN CHARINDEX(',', ri.CityAndState) > 0
THEN left(ri.CityAndState, len(ri.CityAndState)-charindex(',',reverse(ri.CityAndState)))

Replace that with:

CASE WHEN CHARINDEX(',', ri.CityAndState) > 0
THEN left(ri.CityAndState, charindex(',',reverse(ri.CityAndState)))

And it runs fine (not correct results, but no error message). Also, using this works:

CASE WHEN CHARINDEX(',', ri.CityAndState) > 0
THEN left(ri.CityAndState, len(ri.CityAndState))

It just doesn't seem to like the Len(value) -  anothervalue portion.  That doesn't make sense to me, unless there is an invalid value coming back. But CHARINDEX return 0 if nothing found correct? And that shouldn't matter as we got to that branch BECAUSE we found the comma.




Ok.  Something didn't get copied correctly.

Because mine returns
new york|      NY|

Yours returns:
new|      york NY


And I cannot figure out what it could be.  There has to be a value in there that I'm not accounting for.  I, in my sample data, have tried:

","
""
" "
",something"
",something,"
" something"
"something "
"some,thing"

and nothing produces the error.  How many records are we talking about?
How many records? The result set returns 239,000+ records.  There are 300,000+ in the table.

It is really weird. Just looking at that line of code, nothing stands out. By the way, below is the output from your code. It works fine as far as I can see:

san francisco, ca      san francisco      ca
new york NY      new york      NY
york, pa      york      pa
sacramento ca      sacramento      ca
pittsburgh      pittsburgh      NULL
Give me a few minutes, I think I have the reason. Be back soon.

Nope, no luck. It really doesn't like this:

len(ri.CityAndState)-charindex(',',reverse(ri.CityAndState)))

Now, I added some debugging code at the end of my select:

len(ri.CityAndState) AS LEN_CITY,
charindex(',',reverse(ri.CityAndState)) AS CHAR_INDEX

For one of the rows, these returned values of 10, 19 (respectively). I thought 10 - 19 is a negative number lets flop those like this:

left(ri.CityAndState, charindex(',',reverse(ri.CityAndState)) - len(ri.CityAndState))

But I get the same error. By the way, putting hardcoded integers in there works. (i.e 19 - 10).


Ok, I think I found the problem.  This data in inserted into a temp table via a Bulk Insert, then copied over to the table I'm using. There appear to be just a bunch of spaces on the right and sometimes on the left of the data. I inserted a buch of LTRIm and RTRIM and that seems to work. Scanning the data I only see one row that comes out wrong and that row has no space OR comma in the field. Garbage in Garbage out I guess.

Code below
CASE WHEN CHARINDEX(',', ri.CityAndState) > 0 
	THEN 
		CASE WHEN LEN(ri.CityAndState) > 0
			THEN 
				RTRIM(LTRIM(LEFT(ri.CityAndState, CHARINDEX(',', ri.CityAndState + ' ') - 1)))			
			ELSE
				NULL
		END
	ELSE
		CASE WHEN CHARINDEX(' ', ri.CityAndState) > 0 
			THEN
				LTRIM(RTRIM(LEFT(ri.CityAndState, len(ri.CityAndState) - CHARINDEX(' ', reverse(RTRIM(ri.CityAndState)) ))))
			ELSE
				LTRIM(RTRIM(ri.CityAndState))
		END
END AS CITY,
 
 
CASE WHEN CHARINDEX(',', ri.CityAndState) > 0 
	THEN
		RTRIM(LTRIM(SUBSTRING(ri.CityAndState, CHARINDEX(',',ri.CityAndState) + 1, LEN(ri.CityAndState))))
	ELSE
		CASE WHEN CHARINDEX(' ', ri.CityAndState) > 0 
			THEN 
				rtrim(ltrim(right(rtrim(ri.CityAndState), charindex(' ',reverse(rtrim(ri.CityAndState))))))
			ELSE 
				NULL 
		END 
END AS STATE

Open in new window

Given you the points, you got me most of the way there and if I'd known about the spaces (i.e. posted that) you probably would have realized the issue sooner than I did.

Thanks
That makes sense.  That is due to the fact of how len() works.  Len gives you the length of the string, minus any trailing spaces.  So what was happening if the number of characters from the end the comma existed at + amount of white space was equal to or greater than the number of characters - trailing spaces, you would get the error.

select len('test'),len('test   '),len('    test   '),len('   test')

The simple solution is to use datalength and not len().  Or, to trim the fields.  The reason I don't always jump for datalength() over len() is that datalength counts the number of bytes whereas len() counts the number of characters.  So if you have a unicode string, you have to do datalength(field)/2 to get the number of characters (including spaces).

Below is an updated solution using datalength that does not REQUIRE trimming.  Keep in mind, that if the fields are unicode (nvarchar) and not ascii (varchar) then you will have to add /2 after the datalength function to get the correct result.
select 
 
cityandstate
,case when charindex(',',cityandstate)>0 then left(cityandstate, datalength(cityandstate)-charindex(',',reverse(cityandstate)))
when charindex(' ',cityandstate)>0 then left(cityandstate, datalength(cityandstate)-charindex(' ',reverse(cityandstate)))
     else cityandstate
end
,case when charindex(',',cityandstate)>0  then ltrim(right(cityandstate, charindex(',',reverse(cityandstate))-1))
when charindex(' ',cityandstate)>0  then ltrim(right(cityandstate, charindex(' ',reverse(cityandstate))-1))
     else null
end
 ,
CASE WHEN CHARINDEX(',', ri.CityAndState) > 0 
                --THIS IS THE LINE THAT CAUSES ERROR. Replace the THEN branch with something like: THEN 'test'  and it works fine.
                THEN left(ri.CityAndState, datalength(ri.CityAndState)-charindex(',',reverse(ri.CityAndState)))
                --RTRIM(LTRIM(LEFT(ri.CityAndState, CHARINDEX(',', ri.CityAndState + ' ') - 1)))
         ELSE
                CASE WHEN
                        CHARINDEX(' ', ri.CityAndState) > 0 THEN RTRIM(LTRIM(LEFT(ri.CityAndState, CHARINDEX(' ', ri.CityAndState + ' ') - 1)))
                ELSE
                        ri.CityAndState
                END
END AS CITY,
CASE WHEN
                CHARINDEX(',', ri.CityAndState) > 0 THEN RTRIM(LTRIM(SUBSTRING(ri.CityAndState, CHARINDEX(',',ri.CityAndState) + 1, datalength(ri.CityAndState))))
         ELSE
                CASE WHEN 
                                CHARINDEX(' ', ri.CityAndState) > 0 THEN RTRIM(LTRIM(SUBSTRING(ri.CityAndState, CHARINDEX(' ',ri.CityAndState) + 1, datalength(ri.CityAndState)))) 
                         ELSE NULL 
                END 
END AS STATE
 
from (select 'san francisco, ca' as CityAndState
union all select 'new york, NY                      '
union all select 'york, pa'
union all select 'sacramento ca'
union all select 'pittsburgh'
union all select '') ri

Open in new window