Link to home
Start Free TrialLog in
Avatar of PeterBaileyUk
PeterBaileyUk

asked on

Sql Function to get the word position

I have the string  'A1 MK1 (8X) SPORTBACK TDI S LINE'

'A1' is the first word
'MK1' is the second word
'(8X)' is the third word
'SPORTBACK' is the fourth word
'TDI' is the fifth word
'S' is the sixth word
'LINE' is the seventh word

how do i extract that correctly_
I cant use charindex as that would fail searching for the 'S' as sportback starts with an 'S'

in vb I would have split the string into an array and the array index would be the word position.

how do i achieve that in sql server. i am in latest version
Avatar of Bill Prew
Bill Prew

Give this approach a try.

SELECT PARSENAME(REPLACE(YourColumnName, " ", "."), 1) AS 'Word1';  
SELECT PARSENAME(REPLACE(YourColumnName, " ", "."), 2) AS 'Word2';  
SELECT PARSENAME(REPLACE(YourColumnName, " ", "."), 3) AS 'Word3';  
SELECT PARSENAME(REPLACE(YourColumnName, " ", "."), 4) AS 'Word4';  
SELECT PARSENAME(REPLACE(YourColumnName, " ", "."), 5) AS 'Word5';  
SELECT PARSENAME(REPLACE(YourColumnName, " ", "."), 6) AS 'Word6';  
SELECT PARSENAME(REPLACE(YourColumnName, " ", "."), 7) AS 'Word7';  

Open in new window


»bp
Avatar of PeterBaileyUk

ASKER

The strings have differing quantity of words..This was just an example I found in the data.
So what are you trying to do?  Is this data in a column?  And what do you want to extract from it?  There is no easy way to return a variable number of columns in a query, at least not without it being some weird data manipulation query like TRANSFORM or PIVOT, and I'm not even sure then.

»bp
I am using split_string to split a string but with each word that was split i need the words position but not charindex. so the first word is always the first the second word the second and so on.

'My Full String'

'My'=pos 1 so need a 1 returned
'Full' was the second word so want a 2 returned and so on
ee.xlsx
This gives me the words split and the rows are in the correct way a1=1

Select ClientCodeWordPosition + '_'
   , [MVRIS CODE]
   , ClientName
   , UPPER(Word)
   , WordLength AS WordLen
   , ConcatDesc AS strfull
	, NoOfWords
   , MARQUE From 
(SELECT  
   smmt.[MVRIS CODE] + '_' + s.value + '_' AS ClientCodeWordPosition
   , SMMT.[MVRIS CODE]
   , 'SMMT' AS ClientName
   , s.value AS [Word]
   , LEN(s.VALUE) AS WordLength
   , CONCAT(ISNULL(RTRIM(LTRIM(SMMT.[Model Range])),''), CHAR(32), ISNULL(RTRIM(LTRIM(SMMT.[RANGE SERIES])),''), CHAR(32), ISNULL(RTRIM(LTRIM(SMMT.[VARIANT])),'')) AS ConcatDesc
   , count(*)  over (partition by SMMT.[MVRIS CODE]) NoOfWords
   , SMMT.MARQUE
FROM ClientData.[dbo].SMMT SMMT
   CROSS APPLY STRING_SPLIT(CONCAT(ISNULL(RTRIM(LTRIM(SMMT.[Model Range])),''), CHAR(32)
   , ISNULL(RTRIM(LTRIM(SMMT.[RANGE SERIES])),''), CHAR(32)
   , ISNULL(RTRIM(LTRIM(SMMT.[VARIANT])),'')), CHAR(32)) s 
  WHERE SMMT.[MVRIS CODE]='M3EHM' ) D

Open in new window


for each word i need its position so maybe a row counter would do it.
User generated imageee.xlsx
Use this function:
CREATE FUNCTION [dbo].[fnArray]
( 
	@str varchar(8000),
	@sep char(1)=' ',
	@i int=1
)	
RETURNS varchar(8000)
as
/*******************************************************************************\
Function	: fnArray

Purpose		: Parse a string in elements sepparated by a given 
			  character and return the element in position i
			  
Parameters	: @str - the string to parse
			  @sep - the separator character
			  @i   - the position of the element to be returned; 
					 if i<0 the function will return the number of elemets(size)
					 if i out of range the function will return NULL
Invoke		:
	
		select dbo.fnArray('elem1 elem2 elem3',' ',2) 
		select dbo.fnArray('elem1/elem2/elem3','/',3)
		select dbo.fnArray('elem1/elem2/elem3','/',-1)

Author		: AdrianBT - 2013-03-11		
\*******************************************************************************/
begin
	-- declare variables	
	declare 
		@size int,
		@count int=1,
		@pos_start int=0,
		@pos_end int=0

	-- get the size of the array
	select 
		-- first, replace multiple spaces with one, just in case
		@str=replace(replace(replace(rtrim(ltrim(@str)),'  ',' '+CHAR(7)),CHAR(7)+' ',''),CHAR(7),''),
		@size=LEN(@str)-LEN(replace(@str,@sep,''))+1

	-- if i negative return the size of the array
	if @i<0 
		-- return the size
		select
			@str=@size
	else	
	begin	

		-- if i out of range return null		
		if	@i not between 1 and @size 
				set @str=NULL
			else
			-- return the element in position i
			begin
			
				-- add an extra char to help with the last element
				select 
					@str=LTRIM(@str)+'^'
					
				-- loop the get the positions
				while @count<=@i
					select 
						@pos_start=@pos_end+1,
						@pos_end=CHARINDEX(@sep,@str,@pos_end+1),
						@count=@count+1
				
				-- isolate the element						
				select @str=	
					substring(
						@str, 
						@pos_start, 
						case 
							when @pos_end=0 then LEN(@str) 
							else @pos_end 
						end-@pos_start)
			end
	end
	
	-- return the element
	return @str
	
end	

Open in new window

This is how you use it:
declare
	@string varchar(1000)='A1 MK1 (8X) SPORTBACK TDI S LINE',
	@words_count int,
	@loop int=1

select 
	@words_count=[dbo].[fnArray](@string,' ',-1)

while @loop<=@words_count
begin
	select
		@loop as Position,
		[dbo].[fnArray](@string,' ',@loop) as Word
	select
		@loop=@loop+1
end

Open in new window

That definitely returns the correct word positions.  So its just a question of integration into the above query ID: 42151691
I think I asked you this question before. How do I use that ID: 42151691 to get to your question.
its at the top in this question
Select ClientCodeWordPosition + '_' + the word position from your function to build the PK
   , [MVRIS CODE]
   , ClientName
   , UPPER(Word)
   , WordLength AS WordLen
   , ConcatDesc AS strfull
	, NoOfWords
   , MARQUE From 
(SELECT  
   smmt.[MVRIS CODE] + '_' + s.value + '_' AS ClientCodeWordPosition
   , SMMT.[MVRIS CODE]
   , 'SMMT' AS ClientName
   , s.value AS [Word]
   , LEN(s.VALUE) AS WordLength
   , CONCAT(ISNULL(RTRIM(LTRIM(SMMT.[Model Range])),''), CHAR(32), ISNULL(RTRIM(LTRIM(SMMT.[RANGE SERIES])),''), CHAR(32), ISNULL(RTRIM(LTRIM(SMMT.[VARIANT])),'')) AS ConcatDesc
   , count(*)  over (partition by SMMT.[MVRIS CODE]) NoOfWords
   , SMMT.MARQUE
FROM ClientData.[dbo].SMMT SMMT
   CROSS APPLY STRING_SPLIT(CONCAT(ISNULL(RTRIM(LTRIM(SMMT.[Model Range])),''), CHAR(32)
   , ISNULL(RTRIM(LTRIM(SMMT.[RANGE SERIES])),''), CHAR(32)
   , ISNULL(RTRIM(LTRIM(SMMT.[VARIANT])),'')), CHAR(32)) s 
  WHERE SMMT.[MVRIS CODE]='M3EHM' ) D

Open in new window


This code gives as in the spreadsheet. maybe rownumber() thatsends the strfull to the function and returns the row where the words match with the position number I think but not sure maybe the query can be simplified now.
ee.xlsx
Show me what results do you expect based on the string in your example. From that query I am not sure what you want.
ASKER CERTIFIED SOLUTION
Avatar of Zberteoc
Zberteoc
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Herre is a more simple version of your query:
SELECT  
	  smmt.[MVRIS CODE] + '_' + s.Elem + '_' + cast(d.ElemID as varchar(5)) AS ClientCodeWordPosition
	, SMMT.[MVRIS CODE]
	, 'SMMT' AS ClientName
	, s.Elem AS [Word]
	, LEN(s.Elem) AS WordLength
	, ConcatDesc
	, count(*)  over (partition by SMMT.[MVRIS CODE]) NoOfWords
	, SMMT.MARQUE
FROM 
	(
		SELECT 
				[MVRIS CODE]	
			,[MARQUE]
			,ltrim(rtrim(
				CONCAT(
						ISNULL(RTRIM(LTRIM([Model Range]))+' ','')
					,ISNULL(RTRIM(LTRIM([RANGE SERIES]))+' ','')
					,ISNULL(RTRIM(LTRIM([VARIANT])),'')
					) 
				)) as ConcatDesc
		FROM
			ClientData.[dbo].SMMT
	) SMMT
	CROSS APPLY [dbo].[fnParseStringToSet](ConcatDesc,' ') s 
WHERE 
	SMMT.[MVRIS CODE]='M3EHM' 

Open in new window

I get an error
Msg 208, Level 16, State 1, Procedure fnParseStringToSet, Line 34 [Batch Start Line 48]
Invalid object name 'zb_dba_maint.dbo.fnTally'.

Open in new window

In the script with the 2 functions I gave you remove the database qualification:

zb_dba_maint. (dot included)

Just replace with empty space. That is obviously not the database name you use.
Run this script. It will drop and recreate the functions:
drop function [dbo].[fnTally]
GO
drop function [dbo].[fnParseStringToSet]
GO

CREATE FUNCTION [dbo].[fnTally]()
RETURNS TABLE --WITH SCHEMABINDING 
AS
/*******************************************************************************\
Function	: fnTally

Purpose		: returns a set with numbers from 1 to 10,000 
			  to be used in parsing and sequential data generation whithout loop
			  
Parameters	: no parameters

Invoke		:
	
		select * from [dbo].[fnTally]()
		select N from [dbo].[fnTally]()
		select substring('abcdef',N,1) as chr from [dbo].[fnTally]() where N<len('abcdef') -- parsing a string
		select dateadd(dd, N, '2007-01-01') as dte from [dbo].[fnTally]() --gets dates for about 30 years

Author		: AdrianBT - 2013-03-18		
\*******************************************************************************/
RETURN
	WITH 
	E1(N) AS 
	( --10E+1 or 10 rows
		 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
		 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
		 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
	),                         
   E2(N) AS 
   ( --10E+2 or 100 rows	
		SELECT 1 FROM E1 a, E1 b
	),
   E4(N) AS 
   ( --10E+4 or 10,000 rows max
		SELECT 1 FROM E2 a, E2 b
	)
			 SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as N FROM E4
	;

GO


CREATE FUNCTION [dbo].[fnParseStringToSet]
(
	@str VARCHAR(8000), 
	@sep CHAR(1)
)
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT KILLS PERFORMANCE!
RETURNS TABLE 
AS
/*******************************************************************************\
Function	: fnParseStringToSet

Purpose		: parses a string by a separator and returns a two columns set with
				each element and its position; uses the fnTally for parsing
			  
Parameters	: @str - the string to parse
			  @sep - the separator character

Invoke		:
	
		select * from [dbo].[fnParseStringToSet]('ab cd ef cd ef cd ef cd ef cd ef cd ef cd ef cd ef cd ef cd ef',' ')
		select tsk.ElemId, tsk.Elem as QATaskId,tme.Elem as EndTime from 
			[dbo].[fnParseStringToSet]('10,20,30,40,50',',') tsk
			left join [dbo].[fnParseStringToSet]('17:10,17:11,17:12',',') tme
				on tme.ElemId=tsk.ElemId

Author		: AdrianBT - 2013-03-18		
\*******************************************************************************/
RETURN
	with cteStart(N1) AS 
	(--returns N+1 (starting position of each "element" just once for each delimiter)
		SELECT 0 UNION ALL
		SELECT t.N+1 FROM [dbo].[fnTally]() t WHERE SUBSTRING(@str,t.N,1) = @sep and t.N<=len(@str)
	),
	cteLen(N1,L1) AS
	(--returns start and length (for use in substring)
		 SELECT s.N1,
				ISNULL(NULLIF(CHARINDEX(@sep,@str,s.N1),0)-s.N1,8000)
		   FROM cteStart s
	)
	--so the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
	SELECT 
		ROW_NUMBER() OVER(ORDER BY l.N1)	as ElemID,
		SUBSTRING(@str, l.N1, l.L1)			as Elem
	FROM 
		cteLen l
	;

GO

Open in new window

What is your SQL Server version?
You can try something like this.
create table test(col varchar(100))
insert into test values ('A1 MK1 (8X) SPORTBACK TDI S LINE')

select col,ltrim(SUBSTRING(col, n, CHARINDEX(' ', col + ' ',n) - n)) col_str,
       row_number() over (partition by col order by n) str_pos
  from test t1 
 cross join (SELECT number FROM master..spt_values WHERE type = 'P' and number > 0)  AS Numbers(n)
 WHERE SUBSTRING(' ' + col, n, 1) = ' '
   AND n < LEN(col) + 1
/*
  	col	col_str	str_pos
1	A1 MK1 (8X) SPORTBACK TDI S LINE	A1	1
2	A1 MK1 (8X) SPORTBACK TDI S LINE	MK1	2
3	A1 MK1 (8X) SPORTBACK TDI S LINE	(8X)	3
4	A1 MK1 (8X) SPORTBACK TDI S LINE	SPORTBACK	4
5	A1 MK1 (8X) SPORTBACK TDI S LINE	TDI	5
6	A1 MK1 (8X) SPORTBACK TDI S LINE	S	6
7	A1 MK1 (8X) SPORTBACK TDI S LINE	LINE	7

*/

Open in new window

Ive got somewhere and getting the output with
      select CONCAT(ISNULL(RTRIM(LTRIM(SMMT.[Model Range])),''), CHAR(32)
   , ISNULL(RTRIM(LTRIM(SMMT.[RANGE SERIES])),''), CHAR(32), ISNULL(RTRIM(LTRIM(SMMT.[VARIANT])),''))
   ,ltrim(SUBSTRING(CONCAT(ISNULL(RTRIM(LTRIM(SMMT.[Model Range])),''), CHAR(32), ISNULL(RTRIM(LTRIM(SMMT.[RANGE SERIES])),''), CHAR(32)
   , ISNULL(RTRIM(LTRIM(SMMT.[VARIANT])),'')), n, CHARINDEX(' ', CONCAT(ISNULL(RTRIM(LTRIM(SMMT.[Model Range])),''), CHAR(32)
   , ISNULL(RTRIM(LTRIM(SMMT.[RANGE SERIES])),''), CHAR(32), ISNULL(RTRIM(LTRIM(SMMT.[VARIANT])),'')) + ' ',n) - n)) col_str,
       row_number() over (partition by SMMT.[MVRIS CODE] order by n) str_pos, smmt.[MVRIS CODE]
FROM ClientData.[dbo].SMMT SMMT

 cross join (SELECT number FROM master..spt_values WHERE type = 'P' and number > 0)  AS Numbers(n)
 WHERE SUBSTRING(' ' + CONCAT(ISNULL(RTRIM(LTRIM(SMMT.[Model Range])),''), CHAR(32), ISNULL(RTRIM(LTRIM(SMMT.[RANGE SERIES])),''), CHAR(32)
 , ISNULL(RTRIM(LTRIM(SMMT.[VARIANT])),'')), n, 1) = ' '
   AND n < LEN(CONCAT(ISNULL(RTRIM(LTRIM(SMMT.[Model Range])),''), CHAR(32), ISNULL(RTRIM(LTRIM(SMMT.[RANGE SERIES])),''), CHAR(32)
   , ISNULL(RTRIM(LTRIM(SMMT.[VARIANT])),''))) + 1 AND SMMT.[MVRIS CODE]='1B1AD'

Open in new window


Ive added a spreadsheet of the output as I guess its highlighted a typical dupe and spaces can it dedupe at the same time?
ee.xlsx
Have you tried my functions ans query I gave? What you posted above seems way off from you original query. Doing manipulations on the fly is not recommended. That query is a nightmare to maintain.