Link to home
Start Free TrialLog in
Avatar of bfuchs
bfuchsFlag for United States of America

asked on

how to eliminate duplicates in a string variable in t-sql?

Hi Experts,

Lets say I have a variable called MySTR and has the following value 'A very long string, testing the string'
since the word string is duplicate I would like to eliminate it, and just get the following 'A very long string, testing the '
Avatar of Pawan Kumar
Pawan Kumar
Flag of India image

Try....

DECLARE @str AS VARCHAR(1000) = 'A very long string, testing the string'

DECLARE @Xml XML
DECLARE @Removed_Duplicate VARCHAR(1000)

SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
;WITH CTE AS 
(
	SELECT
	ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
	a.value('.', 'VARCHAR(1000)') col
	FROM @Xml.nodes('P') pn(a) 
)

SELECT @Removed_Duplicate =
(
	SELECT STUFF((SELECT '' + ' ' + '' + c.col FROM CTE c
	LEFT JOIN CTE c1 
		ON c.rnk+1=c1.rnk
	WHERE (c.col<>c1.col OR c.col is NULL)
	FOR XML PATH('') ),1,1,'')
)
 
SELECT @Removed_Duplicate RemovedDuplicateWords

Open in new window


Ouptut

RemovedDuplicateWords
A very long string, testing the
Avatar of Ryan Chong
@Pawan,

your solution seems got issue if we have this?

DECLARE @str AS VARCHAR(1000) = 'A very long stringx, testing the string'

Open in new window


@bfuchs,
do you want to remove a repeated word or is that possible to have multiple words that repeated, like this? do you want to check the upper/lowercase?
A very A long stringx, d D testing the string

Open in new window

Avatar of bfuchs

ASKER

@Pawan,,

This looks like too much code and too complicate for me to support in the long run, wondering if there can be something simpler?

@Ryan,
I want to assign the non duplicate words to another variable, case does not matter.

ps. nothing should be repeated, the second A should also be removed.

Thanks,
Ben
Hi,

Will change to some simple logic.

@Ryan - Thanks Bro.
I like Pawan's suggestion, and it can be written like this:
DECLARE @str AS VARCHAR(1000) = 'A very A long stringx, d D testing the string'

DECLARE @Xml XML
DECLARE @Removed_Duplicate VARCHAR(1000)

SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
;WITH CTE AS 
(
	SELECT
	ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
	ROW_NUMBER() OVER(partition by a.value('.', 'VARCHAR(1000)') ORDER BY (SELECT 1)) idx,
	a.value('.', 'VARCHAR(1000)') col
	FROM @Xml.nodes('P') pn(a) 
)

select @Removed_Duplicate = stuff(
(
select ' ' + col from CTE where idx = 1 order by rnk for xml path('')
),1,1,'')

print @str
print @Removed_Duplicate

Open in new window


the SplitString and ReassembleString functions as mentioned in Daniel's link should have worked too which has similar concept
Hi Ryan - the code you posted is giving error << where idx = 1>>

@Author - This is the best way to handle this.. We can create a function to manage the code going forward..

Here is the complete code..

--

DECLARE @str AS VARCHAR(1000) = 'A very long stringx, testing the string'

DECLARE @Xml XML
DECLARE @Removed_Duplicate VARCHAR(1000)=''

SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
;WITH CTE AS 
(
	SELECT
	ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
	a.value('.', 'VARCHAR(1000)') col
	FROM @Xml.nodes('P') pn(a) 
)
,CTE1 AS
(
	SELECT * FROM
	(
		SELECT rnk, col , ROW_NUMBER() OVER (PARTITION BY CASE WHEN CHARINDEX(',',col,1) > 0 THEN REPLACE(col,',','') ELSE col END ORDER BY rnk) cols FROM CTE
	)k WHERE cols = 1
)
select @Removed_Duplicate = STUFF((SELECT ' ' + col from CTE1 ORDER BY rnk FOR XML PATH('')),1,1,'')

SELECT @str OriginalString ,  @Removed_Duplicate RemovedDuplicateString

--

Open in new window


Run - 1

OriginalString                                              RemovedDuplicateString
A very long stringx, testing the string      A very long stringx, testing the string

Run - 2

OriginalString                                               RemovedDuplicateString
A very long string, testing the string       A very long string, testing the


Function

CREATE FUNCTION fn_RemoveDuplicate
(
	@str VARCHAR(1000)
)
RETURNS VARCHAR(1000)

AS 

BEGIN

	DECLARE @Xml XML
	DECLARE @Removed_Duplicate VARCHAR(1000)=''

	SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
	;WITH CTE AS 
	(
		SELECT
		ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
		a.value('.', 'VARCHAR(1000)') col
		FROM @Xml.nodes('P') pn(a) 
	)
	,CTE1 AS
	(
		SELECT * FROM
		(
			SELECT rnk, col , ROW_NUMBER() OVER (PARTITION BY CASE WHEN CHARINDEX(',',col,1) > 0 THEN REPLACE(col,',','') ELSE col END ORDER BY rnk) cols FROM CTE
		)k WHERE cols = 1
	)
	select @Removed_Duplicate = STUFF((SELECT ' ' + col from CTE1 ORDER BY rnk FOR XML PATH('')),1,1,'')

	RETURN @Removed_Duplicate

END

Open in new window


Usage -

SELECT dbo.fn_RemoveDuplicate('A very long stringx, testing the string')

Open in new window


Output
--------
A very long stringx, testing the string


Hope it helps !!
@Pawan,
the code you posted is giving error << where idx = 1>>
weird as it's working fine at my end. cheers
User generated image
LOL  :)

Great !!
This looks like too much code and too complicate for me to support in the long run, wondering if there can be something simpler?
Can you provide the code where you want to apply the solution?
Avatar of bfuchs

ASKER

Hi Experts,

Just arrived at work..will test your suggestions & keep you posted.

Thanks,
Ben
Avatar of bfuchs

ASKER

Hi Experts,

One thing I forgot to mention is that we have SQL 2005 in place..

and therefore none of those suggestions works as posted.

Here are the errors I got.

@Daniel,
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 11
Must declare the scalar variable "@UpperLimit".

@Ryan,
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 6
Must declare the scalar variable "@str".
Msg 137, Level 15, State 2, Line 22
Must declare the scalar variable "@str".

@Pawan,
Msg 139, Level 15, State 1, Procedure fn_RemoveDuplicate, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 1, Procedure fn_RemoveDuplicate, Line 31
Must declare the scalar variable "@Removed_Duplicate".
Msg 137, Level 15, State 2, Procedure fn_RemoveDuplicate, Line 33
Must declare the scalar variable "@Removed_Duplicate".

Thanks,
Ben
Hi Ben,
Ohh 2005, We have rewrite it.

Wait will try this out.

Regards,
Pawan
quick fix and try:
DECLARE @Xml XML
DECLARE @str AS VARCHAR(1000)
DECLARE @Removed_Duplicate VARCHAR(1000)

set @str = 'A very A long stringx, d D testing the string'

SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
;WITH CTE AS 
(
	SELECT
	ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
	ROW_NUMBER() OVER(partition by a.value('.', 'VARCHAR(1000)') ORDER BY (SELECT 1)) idx,
	a.value('.', 'VARCHAR(1000)') col
	FROM @Xml.nodes('P') pn(a) 
)

select @Removed_Duplicate = stuff(
(
select ' ' + col from CTE where idx = 1 order by rnk for xml path('')
),1,1,'')

print @str
print @Removed_Duplicate

Open in new window

Hi Ryan,
It is giving incorrect output.

Input - A very A long string. d D testing the string
Output - A d long string string. testing the very

Regards,
Pawan
@Pawan,

it's weird (again?), i got this instead:

A very A long stringx, d D testing the string
A very long stringx, d testing the string
LOL :)
Avatar of bfuchs

ASKER

@Rayn,
I get the word string twice see below

i-A very A long string, d D testing the string
O-A very long string, d testing the string

Thanks,
Ben
I get the word string twice see below

i-A very A long string, d D testing the string
O-A very long string, d testing the string
since your input is "string," (with a comma), it's different from "string"
Avatar of bfuchs

ASKER

Actually I see you right.

Would it be possible for you to convert this to a function (the way Pawan did above), so I dont need to write this whole thing in my stored procedure?

Thanks,
Ben
Hi Author,

Can you get this kind of string - string with a . Like Pawan. and a Pawan

A very A long Pawan. d D testing the Pawan

or

A very A long string,string, d D testing the string

In both the above cases , all above solutions will not work. We have came with something else.
FUNCTION TO REMOVE Duplicate substring from a string...

/* FUNCTION TO REMOVE Duplicate substring from a string */


---

CREATE FUNCTION RemoveDupicateFromString
(
	 @Inputs VARCHAR(1000)
	,@Ignore VARCHAR(100) --Pipe Seperated
)
RETURNS VARCHAR(1000)
AS 
BEGIN

DECLARE @ProductTotals TABLE
(
  ProductID int, 
  Revenue money
)
	DECLARE @V AS VARCHAR(1000)= ''
	DECLARE @IgnoreChrs VARCHAR(MAX) = @Ignore
	DECLARE @finalString AS VARCHAR(1000) = ''
	DECLARE @InputString AS VARCHAR(MAX) = RTRIM(LTRIM(@Inputs))

	DECLARE @Pawan TABLE ( Value VARCHAR(1) )
	DECLARE @TempBills TABLE (strs VARCHAR(1000))

	DECLARE @x1 AS XML=''
	SET @x1 = CAST('<A>'+ REPLACE(@IgnoreChrs,'|','</A><A>')+ '</A>' AS XML)

	INSERT INTO @Pawan
	SELECT t.value('.', 'VARCHAR(1)') Value FROM @x1.nodes('/A')  AS x(t)

	INSERT INTO @TempBills
	SELECT @InputString strs	

	;WITH CTE1 AS 
	(		
		SELECT SplitedValue , ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rnk FROM
		(
			SELECT CAST(('<X>'+replace(@InputString,' ' ,'</X><X>')+'</X>') as xml) xmlcol FROM @TempBills e 
		) s
		OUTER APPLY
		(
			SELECT ProjectData.D.value('.', 'VARCHAR(MAX)') as SplitedValue
			FROM s.xmlcol.nodes('X') as ProjectData(D)
		)r
	)	
	SELECT @finalString = STUFF((SELECT ' ' + SplitedValue
	FROM
	(
		SELECT SplitedValue , rnk FROM 
		(
			SELECT * , ROW_NUMBER() OVER (PARTITION BY col ORDER BY rnk) duprnk FROM
			( 
				SELECT * , RIGHT(SplitedValue,1) r , CASE WHEN Value IS NOT NULL THEN SUBSTRING(SplitedValue,0,LEN(SplitedValue)) ELSE SplitedValue END col FROM CTE1 
				LEFT JOIN @Pawan p ON p.Value = RIGHT(SplitedValue,1)
			)p
		)m WHERE m.duprnk = 1
	)y ORDER BY rnk FOR XML PATH('')),1,1,'')	
	
	RETURN @finalString

END

Open in new window



USAGE  



Run1

DECLARE @str AS VARCHAR(1000) = 'A very A long stringx, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') AfterRemovingDuplicates

Output
-----
InputString                                                               AfterRemovingDuplicates
A very A long stringx, d D testing the string      A very long stringx, d testing the string

Run2

DECLARE @str AS VARCHAR(1000) = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') AfterRemovingDuplicates

Output
-----
InputString                                                               AfterRemovingDuplicates
A very A long string, d D testing the string      A very long string, d testing the

Run3

DECLARE @str AS VARCHAR(1000) = 'A very A long string. d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') AfterRemovingDuplicates

Output
-----
InputString                                                               AfterRemovingDuplicates
A very A long string. d D testing the string      A very long string. d testing the


Run4

DECLARE @str AS VARCHAR(1000) = 'Pawan Pawan'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') AfterRemovingDuplicates

Output
-----
InputString                                                                        AfterRemovingDuplicates
Pawan Pawan                                                            Pawan


Hope it helps !!
One thing I forgot to mention is that we have SQL 2005 in place.
This means no CTE and no ROW_NUMBER
Unfortunally Microsoft never released a SPLIT function until MSSQL 2016 so you'll need to search in the internet for some already made split functions. Here you have some so suit yourself.
Ok, I made a testing using the first function in the article I've linked above:
CREATE FUNCTION dbo.Split
(
	@RowData nvarchar(2000),
	@SplitOn nvarchar(5)
)  
RETURNS @RtnValue table 
(
	Id int identity(1,1),
	Data nvarchar(100)
) 
AS  
BEGIN 
	Declare @Cnt int
	Set @Cnt = 1

	While (Charindex(@SplitOn,@RowData)>0)
	Begin
		Insert Into @RtnValue (data)
		Select 
			Data = ltrim(rtrim(Substring(@RowData,1,Charindex(@SplitOn,@RowData)-1)))

		Set @RowData = Substring(@RowData,Charindex(@SplitOn,@RowData)+1,len(@RowData))
		Set @Cnt = @Cnt + 1
	End
	
	Insert Into @RtnValue (data)
	Select Data = ltrim(rtrim(@RowData))

	Return
END

Open in new window

With that function I tested against your example:
DECLARE @LongString VARCHAR(MAX) = 'A very long string, testing the string'

SELECT Data Word, Min(Id) Pos
INTO #Words
FROM split(@LongString, ' ')
GROUP BY Data

SELECT LTRIM(STUFF((
		SELECT ' ' + Word 
		FROM #Words
		ORDER BY Pos
		FOR XML PATH('')),1,1,''))
	
DROP TABLE #Words

Open in new window

It only works if you don't have the comma next to 'string,'. So you'll need to get rid the comma before calling the split function. Something like this:
DECLARE @LongString VARCHAR(MAX) = 'A very long string, testing the string'

SELECT Data Word, Min(Id) Pos
INTO #Words
FROM split(REPLACE(@LongString,',',''), ' ')
GROUP BY Data

SELECT LTRIM(STUFF((
		SELECT ' ' + Word 
		FROM #Words
		ORDER BY Pos
		FOR XML PATH('')),1,1,''))
	
DROP TABLE #Words

Open in new window

The trick here is to use the REPLACE function to replace all those unwanted characters. This example was for comma, but you can use a similar solution for other characters like ;:-" or whatever you can remember.
Avatar of bfuchs

ASKER

Hi experts,

On the way to the office..will test those & get you posted..

Thanks,
Ben
Avatar of bfuchs

ASKER

@Pawan,

you forgot about the 2005 issue..?

this is what I get when trying to create your function

Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 26
Must declare the scalar variable "@IgnoreChrs".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 29
Must declare the scalar variable "@x1".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 32
Must declare the scalar variable "@InputString".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 38
Must declare the scalar variable "@InputString".
Msg 102, Level 15, State 1, Procedure RemoveDupicateFromString, Line 44
Incorrect syntax near 'r'.
Msg 137, Level 15, State 1, Procedure RemoveDupicateFromString, Line 46
Must declare the scalar variable "@finalString".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 59
Must declare the scalar variable "@finalString".


@Vitor

This means no CTE and no ROW_NUMBER
I do have those functioning in my environment.

Re your solution, would it be possible to have all incorporated into one function (similar to pawan's example), so the only thing I have to do is use one line of code to call the eliminate dup function?

Thanks,
Ben



@Vitor,
ASKER CERTIFIED SOLUTION
Avatar of Pawan Kumar
Pawan Kumar
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bfuchs

ASKER

Hi Pawan,

but when trying to test it I also need a way in 2005..currently getting below

DECLARE @str AS VARCHAR(1000) = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,')

Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 2
Must declare the scalar variable "@str".

Thanks,
Ben
Try this Sir,

DECLARE @str AS VARCHAR(1000) 
SET @str = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') 

Open in new window

Avatar of bfuchs

ASKER

Pawan you got it!

Thanks to all participants.