[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 151
  • Last Modified:

how to eliminate duplicates in a string variable in t-sql?

Hi Experts,

Lets say I have a variable called MySTR and has the following value 'A very long string, testing the string'
since the word string is duplicate I would like to eliminate it, and just get the following 'A very long string, testing the '
0
bfuchs
Asked:
bfuchs
  • 11
  • 9
  • 6
  • +2
1 Solution
 
Pawan KumarDatabase ExpertCommented:
Try....

DECLARE @str AS VARCHAR(1000) = 'A very long string, testing the string'

DECLARE @Xml XML
DECLARE @Removed_Duplicate VARCHAR(1000)

SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
;WITH CTE AS 
(
	SELECT
	ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
	a.value('.', 'VARCHAR(1000)') col
	FROM @Xml.nodes('P') pn(a) 
)

SELECT @Removed_Duplicate =
(
	SELECT STUFF((SELECT '' + ' ' + '' + c.col FROM CTE c
	LEFT JOIN CTE c1 
		ON c.rnk+1=c1.rnk
	WHERE (c.col<>c1.col OR c.col is NULL)
	FOR XML PATH('') ),1,1,'')
)
 
SELECT @Removed_Duplicate RemovedDuplicateWords

Open in new window


Ouptut

RemovedDuplicateWords
A very long string, testing the
0
 
Ryan ChongCommented:
@Pawan,

your solution seems got issue if we have this?

DECLARE @str AS VARCHAR(1000) = 'A very long stringx, testing the string'

Open in new window


@bfuchs,
do you want to remove a repeated word or is that possible to have multiple words that repeated, like this? do you want to check the upper/lowercase?
A very A long stringx, d D testing the string

Open in new window

0
 
bfuchsAuthor Commented:
@Pawan,,

This looks like too much code and too complicate for me to support in the long run, wondering if there can be something simpler?

@Ryan,
I want to assign the non duplicate words to another variable, case does not matter.

ps. nothing should be repeated, the second A should also be removed.

Thanks,
Ben
0
What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

 
Pawan KumarDatabase ExpertCommented:
Hi,

Will change to some simple logic.

@Ryan - Thanks Bro.
0
 
Daniel JonesData Research AnalystCommented:
0
 
Ryan ChongCommented:
I like Pawan's suggestion, and it can be written like this:
DECLARE @str AS VARCHAR(1000) = 'A very A long stringx, d D testing the string'

DECLARE @Xml XML
DECLARE @Removed_Duplicate VARCHAR(1000)

SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
;WITH CTE AS 
(
	SELECT
	ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
	ROW_NUMBER() OVER(partition by a.value('.', 'VARCHAR(1000)') ORDER BY (SELECT 1)) idx,
	a.value('.', 'VARCHAR(1000)') col
	FROM @Xml.nodes('P') pn(a) 
)

select @Removed_Duplicate = stuff(
(
select ' ' + col from CTE where idx = 1 order by rnk for xml path('')
),1,1,'')

print @str
print @Removed_Duplicate

Open in new window


the SplitString and ReassembleString functions as mentioned in Daniel's link should have worked too which has similar concept
0
 
Pawan KumarDatabase ExpertCommented:
Hi Ryan - the code you posted is giving error << where idx = 1>>

@Author - This is the best way to handle this.. We can create a function to manage the code going forward..

Here is the complete code..

--

DECLARE @str AS VARCHAR(1000) = 'A very long stringx, testing the string'

DECLARE @Xml XML
DECLARE @Removed_Duplicate VARCHAR(1000)=''

SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
;WITH CTE AS 
(
	SELECT
	ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
	a.value('.', 'VARCHAR(1000)') col
	FROM @Xml.nodes('P') pn(a) 
)
,CTE1 AS
(
	SELECT * FROM
	(
		SELECT rnk, col , ROW_NUMBER() OVER (PARTITION BY CASE WHEN CHARINDEX(',',col,1) > 0 THEN REPLACE(col,',','') ELSE col END ORDER BY rnk) cols FROM CTE
	)k WHERE cols = 1
)
select @Removed_Duplicate = STUFF((SELECT ' ' + col from CTE1 ORDER BY rnk FOR XML PATH('')),1,1,'')

SELECT @str OriginalString ,  @Removed_Duplicate RemovedDuplicateString

--

Open in new window


Run - 1

OriginalString                                              RemovedDuplicateString
A very long stringx, testing the string      A very long stringx, testing the string

Run - 2

OriginalString                                               RemovedDuplicateString
A very long string, testing the string       A very long string, testing the


Function

CREATE FUNCTION fn_RemoveDuplicate
(
	@str VARCHAR(1000)
)
RETURNS VARCHAR(1000)

AS 

BEGIN

	DECLARE @Xml XML
	DECLARE @Removed_Duplicate VARCHAR(1000)=''

	SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
	;WITH CTE AS 
	(
		SELECT
		ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
		a.value('.', 'VARCHAR(1000)') col
		FROM @Xml.nodes('P') pn(a) 
	)
	,CTE1 AS
	(
		SELECT * FROM
		(
			SELECT rnk, col , ROW_NUMBER() OVER (PARTITION BY CASE WHEN CHARINDEX(',',col,1) > 0 THEN REPLACE(col,',','') ELSE col END ORDER BY rnk) cols FROM CTE
		)k WHERE cols = 1
	)
	select @Removed_Duplicate = STUFF((SELECT ' ' + col from CTE1 ORDER BY rnk FOR XML PATH('')),1,1,'')

	RETURN @Removed_Duplicate

END

Open in new window


Usage -

SELECT dbo.fn_RemoveDuplicate('A very long stringx, testing the string')

Open in new window


Output
--------
A very long stringx, testing the string


Hope it helps !!
0
 
Ryan ChongCommented:
@Pawan,
the code you posted is giving error << where idx = 1>>
weird as it's working fine at my end. cheers
SnapShot.png
0
 
Pawan KumarDatabase ExpertCommented:
LOL  :)

Great !!
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
This looks like too much code and too complicate for me to support in the long run, wondering if there can be something simpler?
Can you provide the code where you want to apply the solution?
0
 
bfuchsAuthor Commented:
Hi Experts,

Just arrived at work..will test your suggestions & keep you posted.

Thanks,
Ben
0
 
bfuchsAuthor Commented:
Hi Experts,

One thing I forgot to mention is that we have SQL 2005 in place..

and therefore none of those suggestions works as posted.

Here are the errors I got.

@Daniel,
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 11
Must declare the scalar variable "@UpperLimit".

@Ryan,
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 6
Must declare the scalar variable "@str".
Msg 137, Level 15, State 2, Line 22
Must declare the scalar variable "@str".

@Pawan,
Msg 139, Level 15, State 1, Procedure fn_RemoveDuplicate, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 1, Procedure fn_RemoveDuplicate, Line 31
Must declare the scalar variable "@Removed_Duplicate".
Msg 137, Level 15, State 2, Procedure fn_RemoveDuplicate, Line 33
Must declare the scalar variable "@Removed_Duplicate".

Thanks,
Ben
0
 
Pawan KumarDatabase ExpertCommented:
Hi Ben,
Ohh 2005, We have rewrite it.

Wait will try this out.

Regards,
Pawan
0
 
Ryan ChongCommented:
quick fix and try:
DECLARE @Xml XML
DECLARE @str AS VARCHAR(1000)
DECLARE @Removed_Duplicate VARCHAR(1000)

set @str = 'A very A long stringx, d D testing the string'

SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
 
;WITH CTE AS 
(
	SELECT
	ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
	ROW_NUMBER() OVER(partition by a.value('.', 'VARCHAR(1000)') ORDER BY (SELECT 1)) idx,
	a.value('.', 'VARCHAR(1000)') col
	FROM @Xml.nodes('P') pn(a) 
)

select @Removed_Duplicate = stuff(
(
select ' ' + col from CTE where idx = 1 order by rnk for xml path('')
),1,1,'')

print @str
print @Removed_Duplicate

Open in new window

0
 
Pawan KumarDatabase ExpertCommented:
Hi Ryan,
It is giving incorrect output.

Input - A very A long string. d D testing the string
Output - A d long string string. testing the very

Regards,
Pawan
0
 
Ryan ChongCommented:
@Pawan,

it's weird (again?), i got this instead:

A very A long stringx, d D testing the string
A very long stringx, d testing the string
0
 
Pawan KumarDatabase ExpertCommented:
LOL :)
0
 
bfuchsAuthor Commented:
@Rayn,
I get the word string twice see below

i-A very A long string, d D testing the string
O-A very long string, d testing the string

Thanks,
Ben
0
 
Ryan ChongCommented:
I get the word string twice see below

i-A very A long string, d D testing the string
O-A very long string, d testing the string
since your input is "string," (with a comma), it's different from "string"
0
 
bfuchsAuthor Commented:
Actually I see you right.

Would it be possible for you to convert this to a function (the way Pawan did above), so I dont need to write this whole thing in my stored procedure?

Thanks,
Ben
0
 
Pawan KumarDatabase ExpertCommented:
Hi Author,

Can you get this kind of string - string with a . Like Pawan. and a Pawan

A very A long Pawan. d D testing the Pawan

or

A very A long string,string, d D testing the string

In both the above cases , all above solutions will not work. We have came with something else.
0
 
Pawan KumarDatabase ExpertCommented:
FUNCTION TO REMOVE Duplicate substring from a string...

/* FUNCTION TO REMOVE Duplicate substring from a string */


---

CREATE FUNCTION RemoveDupicateFromString
(
	 @Inputs VARCHAR(1000)
	,@Ignore VARCHAR(100) --Pipe Seperated
)
RETURNS VARCHAR(1000)
AS 
BEGIN

DECLARE @ProductTotals TABLE
(
  ProductID int, 
  Revenue money
)
	DECLARE @V AS VARCHAR(1000)= ''
	DECLARE @IgnoreChrs VARCHAR(MAX) = @Ignore
	DECLARE @finalString AS VARCHAR(1000) = ''
	DECLARE @InputString AS VARCHAR(MAX) = RTRIM(LTRIM(@Inputs))

	DECLARE @Pawan TABLE ( Value VARCHAR(1) )
	DECLARE @TempBills TABLE (strs VARCHAR(1000))

	DECLARE @x1 AS XML=''
	SET @x1 = CAST('<A>'+ REPLACE(@IgnoreChrs,'|','</A><A>')+ '</A>' AS XML)

	INSERT INTO @Pawan
	SELECT t.value('.', 'VARCHAR(1)') Value FROM @x1.nodes('/A')  AS x(t)

	INSERT INTO @TempBills
	SELECT @InputString strs	

	;WITH CTE1 AS 
	(		
		SELECT SplitedValue , ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rnk FROM
		(
			SELECT CAST(('<X>'+replace(@InputString,' ' ,'</X><X>')+'</X>') as xml) xmlcol FROM @TempBills e 
		) s
		OUTER APPLY
		(
			SELECT ProjectData.D.value('.', 'VARCHAR(MAX)') as SplitedValue
			FROM s.xmlcol.nodes('X') as ProjectData(D)
		)r
	)	
	SELECT @finalString = STUFF((SELECT ' ' + SplitedValue
	FROM
	(
		SELECT SplitedValue , rnk FROM 
		(
			SELECT * , ROW_NUMBER() OVER (PARTITION BY col ORDER BY rnk) duprnk FROM
			( 
				SELECT * , RIGHT(SplitedValue,1) r , CASE WHEN Value IS NOT NULL THEN SUBSTRING(SplitedValue,0,LEN(SplitedValue)) ELSE SplitedValue END col FROM CTE1 
				LEFT JOIN @Pawan p ON p.Value = RIGHT(SplitedValue,1)
			)p
		)m WHERE m.duprnk = 1
	)y ORDER BY rnk FOR XML PATH('')),1,1,'')	
	
	RETURN @finalString

END

Open in new window



USAGE  



Run1

DECLARE @str AS VARCHAR(1000) = 'A very A long stringx, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') AfterRemovingDuplicates

Output
-----
InputString                                                               AfterRemovingDuplicates
A very A long stringx, d D testing the string      A very long stringx, d testing the string

Run2

DECLARE @str AS VARCHAR(1000) = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') AfterRemovingDuplicates

Output
-----
InputString                                                               AfterRemovingDuplicates
A very A long string, d D testing the string      A very long string, d testing the

Run3

DECLARE @str AS VARCHAR(1000) = 'A very A long string. d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') AfterRemovingDuplicates

Output
-----
InputString                                                               AfterRemovingDuplicates
A very A long string. d D testing the string      A very long string. d testing the


Run4

DECLARE @str AS VARCHAR(1000) = 'Pawan Pawan'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') AfterRemovingDuplicates

Output
-----
InputString                                                                        AfterRemovingDuplicates
Pawan Pawan                                                            Pawan


Hope it helps !!
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
One thing I forgot to mention is that we have SQL 2005 in place.
This means no CTE and no ROW_NUMBER
Unfortunally Microsoft never released a SPLIT function until MSSQL 2016 so you'll need to search in the internet for some already made split functions. Here you have some so suit yourself.
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
Ok, I made a testing using the first function in the article I've linked above:
CREATE FUNCTION dbo.Split
(
	@RowData nvarchar(2000),
	@SplitOn nvarchar(5)
)  
RETURNS @RtnValue table 
(
	Id int identity(1,1),
	Data nvarchar(100)
) 
AS  
BEGIN 
	Declare @Cnt int
	Set @Cnt = 1

	While (Charindex(@SplitOn,@RowData)>0)
	Begin
		Insert Into @RtnValue (data)
		Select 
			Data = ltrim(rtrim(Substring(@RowData,1,Charindex(@SplitOn,@RowData)-1)))

		Set @RowData = Substring(@RowData,Charindex(@SplitOn,@RowData)+1,len(@RowData))
		Set @Cnt = @Cnt + 1
	End
	
	Insert Into @RtnValue (data)
	Select Data = ltrim(rtrim(@RowData))

	Return
END

Open in new window

With that function I tested against your example:
DECLARE @LongString VARCHAR(MAX) = 'A very long string, testing the string'

SELECT Data Word, Min(Id) Pos
INTO #Words
FROM split(@LongString, ' ')
GROUP BY Data

SELECT LTRIM(STUFF((
		SELECT ' ' + Word 
		FROM #Words
		ORDER BY Pos
		FOR XML PATH('')),1,1,''))
	
DROP TABLE #Words

Open in new window

It only works if you don't have the comma next to 'string,'. So you'll need to get rid the comma before calling the split function. Something like this:
DECLARE @LongString VARCHAR(MAX) = 'A very long string, testing the string'

SELECT Data Word, Min(Id) Pos
INTO #Words
FROM split(REPLACE(@LongString,',',''), ' ')
GROUP BY Data

SELECT LTRIM(STUFF((
		SELECT ' ' + Word 
		FROM #Words
		ORDER BY Pos
		FOR XML PATH('')),1,1,''))
	
DROP TABLE #Words

Open in new window

The trick here is to use the REPLACE function to replace all those unwanted characters. This example was for comma, but you can use a similar solution for other characters like ;:-" or whatever you can remember.
0
 
bfuchsAuthor Commented:
Hi experts,

On the way to the office..will test those & get you posted..

Thanks,
Ben
0
 
bfuchsAuthor Commented:
@Pawan,

you forgot about the 2005 issue..?

this is what I get when trying to create your function

Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 26
Must declare the scalar variable "@IgnoreChrs".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 29
Must declare the scalar variable "@x1".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 32
Must declare the scalar variable "@InputString".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 38
Must declare the scalar variable "@InputString".
Msg 102, Level 15, State 1, Procedure RemoveDupicateFromString, Line 44
Incorrect syntax near 'r'.
Msg 137, Level 15, State 1, Procedure RemoveDupicateFromString, Line 46
Must declare the scalar variable "@finalString".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 59
Must declare the scalar variable "@finalString".


@Vitor

This means no CTE and no ROW_NUMBER
I do have those functioning in my environment.

Re your solution, would it be possible to have all incorporated into one function (similar to pawan's example), so the only thing I have to do is use one line of code to call the eliminate dup function?

Thanks,
Ben



@Vitor,
0
 
Pawan KumarDatabase ExpertCommented:
Hi Ben,

Ohh yeah I fixed it. Could you please check below. I am online for now time now, so if you face any issue, I shall fix it right away.

---

CREATE FUNCTION RemoveDupicateFromString
(
	 @Inputs VARCHAR(1000)
	,@Ignore VARCHAR(100) --Pipe Seperated
)
RETURNS VARCHAR(1000)
AS 
BEGIN

DECLARE @ProductTotals TABLE
(
  ProductID int, 
  Revenue money
)
	DECLARE @V AS VARCHAR(1000)
	SET @V = ''
	DECLARE @IgnoreChrs VARCHAR(MAX) 
	
	SET @IgnoreChrs = @Ignore
	DECLARE @finalString AS VARCHAR(1000) 
	
	SET @finalString = ''
	DECLARE @InputString AS VARCHAR(MAX) 
	
	SET @InputString = RTRIM(LTRIM(@Inputs))

	DECLARE @Pawan TABLE ( Value VARCHAR(1) )
	DECLARE @TempBills TABLE (strs VARCHAR(1000))

	DECLARE @x1 AS XML
	SET @x1 = CAST('<A>'+ REPLACE(@IgnoreChrs,'|','</A><A>')+ '</A>' AS XML)

	INSERT INTO @Pawan
	SELECT t.value('.', 'VARCHAR(1)') Value FROM @x1.nodes('/A')  AS x(t)

	INSERT INTO @TempBills
	SELECT @InputString strs	

	;WITH CTE1 AS 
	(		
		SELECT SplitedValue , ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rnk FROM
		(
			SELECT CAST(('<X>'+replace(@InputString,' ' ,'</X><X>')+'</X>') as xml) xmlcol FROM @TempBills e 
		) s
		OUTER APPLY
		(
			SELECT ProjectData.D.value('.', 'VARCHAR(MAX)') as SplitedValue
			FROM s.xmlcol.nodes('X') as ProjectData(D)
		)r
	)	
	SELECT @finalString = STUFF((SELECT ' ' + SplitedValue
	FROM
	(
		SELECT SplitedValue , rnk FROM 
		(
			SELECT * , ROW_NUMBER() OVER (PARTITION BY col ORDER BY rnk) duprnk FROM
			( 
				SELECT * , RIGHT(SplitedValue,1) r , CASE WHEN Value IS NOT NULL THEN SUBSTRING(SplitedValue,0,LEN(SplitedValue)) ELSE SplitedValue END col FROM CTE1 
				LEFT JOIN @Pawan p ON p.Value = RIGHT(SplitedValue,1)
			)p
		)m WHERE m.duprnk = 1
	)y ORDER BY rnk FOR XML PATH('')),1,1,'')	
	
	RETURN @finalString

END

Open in new window



Usage

DECLARE @str AS VARCHAR(1000) = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') AfterRemovingDuplicates


Hope it helps!!
0
 
bfuchsAuthor Commented:
Hi Pawan,

but when trying to test it I also need a way in 2005..currently getting below

DECLARE @str AS VARCHAR(1000) = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,')

Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 2
Must declare the scalar variable "@str".

Thanks,
Ben
0
 
Pawan KumarDatabase ExpertCommented:
Try this Sir,

DECLARE @str AS VARCHAR(1000) 
SET @str = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,') 

Open in new window

0
 
bfuchsAuthor Commented:
Pawan you got it!

Thanks to all participants.
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 11
  • 9
  • 6
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now