bfuchs
asked on
how to eliminate duplicates in a string variable in t-sql?
Hi Experts,
Lets say I have a variable called MySTR and has the following value 'A very long string, testing the string'
since the word string is duplicate I would like to eliminate it, and just get the following 'A very long string, testing the '
Lets say I have a variable called MySTR and has the following value 'A very long string, testing the string'
since the word string is duplicate I would like to eliminate it, and just get the following 'A very long string, testing the '
@Pawan,
your solution seems got issue if we have this?
@bfuchs,
do you want to remove a repeated word or is that possible to have multiple words that repeated, like this? do you want to check the upper/lowercase?
your solution seems got issue if we have this?
DECLARE @str AS VARCHAR(1000) = 'A very long stringx, testing the string'
@bfuchs,
do you want to remove a repeated word or is that possible to have multiple words that repeated, like this? do you want to check the upper/lowercase?
A very A long stringx, d D testing the string
ASKER
@Pawan,,
This looks like too much code and too complicate for me to support in the long run, wondering if there can be something simpler?
@Ryan,
I want to assign the non duplicate words to another variable, case does not matter.
ps. nothing should be repeated, the second A should also be removed.
Thanks,
Ben
This looks like too much code and too complicate for me to support in the long run, wondering if there can be something simpler?
@Ryan,
I want to assign the non duplicate words to another variable, case does not matter.
ps. nothing should be repeated, the second A should also be removed.
Thanks,
Ben
Hi,
Will change to some simple logic.
@Ryan - Thanks Bro.
Will change to some simple logic.
@Ryan - Thanks Bro.
Aaron Bertrand has briefly explained this here: https://www.mssqltips.com/ sqlservert ip/4140/re moving-dup licates-fr om-strings -in-sql-se rver/
I like Pawan's suggestion, and it can be written like this:
the SplitString and ReassembleString functions as mentioned in Daniel's link should have worked too which has similar concept
DECLARE @str AS VARCHAR(1000) = 'A very A long stringx, d D testing the string'
DECLARE @Xml XML
DECLARE @Removed_Duplicate VARCHAR(1000)
SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
ROW_NUMBER() OVER(partition by a.value('.', 'VARCHAR(1000)') ORDER BY (SELECT 1)) idx,
a.value('.', 'VARCHAR(1000)') col
FROM @Xml.nodes('P') pn(a)
)
select @Removed_Duplicate = stuff(
(
select ' ' + col from CTE where idx = 1 order by rnk for xml path('')
),1,1,'')
print @str
print @Removed_Duplicate
the SplitString and ReassembleString functions as mentioned in Daniel's link should have worked too which has similar concept
Hi Ryan - the code you posted is giving error << where idx = 1>>
@Author - This is the best way to handle this.. We can create a function to manage the code going forward..
Here is the complete code..
Run - 1
OriginalString RemovedDuplicateString
A very long stringx, testing the string A very long stringx, testing the string
Run - 2
OriginalString RemovedDuplicateString
A very long string, testing the string A very long string, testing the
Function
Usage -
Output
--------
A very long stringx, testing the string
Hope it helps !!
@Author - This is the best way to handle this.. We can create a function to manage the code going forward..
Here is the complete code..
--
DECLARE @str AS VARCHAR(1000) = 'A very long stringx, testing the string'
DECLARE @Xml XML
DECLARE @Removed_Duplicate VARCHAR(1000)=''
SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
a.value('.', 'VARCHAR(1000)') col
FROM @Xml.nodes('P') pn(a)
)
,CTE1 AS
(
SELECT * FROM
(
SELECT rnk, col , ROW_NUMBER() OVER (PARTITION BY CASE WHEN CHARINDEX(',',col,1) > 0 THEN REPLACE(col,',','') ELSE col END ORDER BY rnk) cols FROM CTE
)k WHERE cols = 1
)
select @Removed_Duplicate = STUFF((SELECT ' ' + col from CTE1 ORDER BY rnk FOR XML PATH('')),1,1,'')
SELECT @str OriginalString , @Removed_Duplicate RemovedDuplicateString
--
Run - 1
OriginalString RemovedDuplicateString
A very long stringx, testing the string A very long stringx, testing the string
Run - 2
OriginalString RemovedDuplicateString
A very long string, testing the string A very long string, testing the
Function
CREATE FUNCTION fn_RemoveDuplicate
(
@str VARCHAR(1000)
)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @Xml XML
DECLARE @Removed_Duplicate VARCHAR(1000)=''
SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
a.value('.', 'VARCHAR(1000)') col
FROM @Xml.nodes('P') pn(a)
)
,CTE1 AS
(
SELECT * FROM
(
SELECT rnk, col , ROW_NUMBER() OVER (PARTITION BY CASE WHEN CHARINDEX(',',col,1) > 0 THEN REPLACE(col,',','') ELSE col END ORDER BY rnk) cols FROM CTE
)k WHERE cols = 1
)
select @Removed_Duplicate = STUFF((SELECT ' ' + col from CTE1 ORDER BY rnk FOR XML PATH('')),1,1,'')
RETURN @Removed_Duplicate
END
Usage -
SELECT dbo.fn_RemoveDuplicate('A very long stringx, testing the string')
Output
--------
A very long stringx, testing the string
Hope it helps !!
LOL :)
Great !!
Great !!
This looks like too much code and too complicate for me to support in the long run, wondering if there can be something simpler?Can you provide the code where you want to apply the solution?
ASKER
Hi Experts,
Just arrived at work..will test your suggestions & keep you posted.
Thanks,
Ben
Just arrived at work..will test your suggestions & keep you posted.
Thanks,
Ben
ASKER
Hi Experts,
One thing I forgot to mention is that we have SQL 2005 in place..
and therefore none of those suggestions works as posted.
Here are the errors I got.
@Daniel,
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 11
Must declare the scalar variable "@UpperLimit".
@Ryan,
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 6
Must declare the scalar variable "@str".
Msg 137, Level 15, State 2, Line 22
Must declare the scalar variable "@str".
@Pawan,
Msg 139, Level 15, State 1, Procedure fn_RemoveDuplicate, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 1, Procedure fn_RemoveDuplicate, Line 31
Must declare the scalar variable "@Removed_Duplicate".
Msg 137, Level 15, State 2, Procedure fn_RemoveDuplicate, Line 33
Must declare the scalar variable "@Removed_Duplicate".
Thanks,
Ben
One thing I forgot to mention is that we have SQL 2005 in place..
and therefore none of those suggestions works as posted.
Here are the errors I got.
@Daniel,
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 11
Must declare the scalar variable "@UpperLimit".
@Ryan,
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 6
Must declare the scalar variable "@str".
Msg 137, Level 15, State 2, Line 22
Must declare the scalar variable "@str".
@Pawan,
Msg 139, Level 15, State 1, Procedure fn_RemoveDuplicate, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 1, Procedure fn_RemoveDuplicate, Line 31
Must declare the scalar variable "@Removed_Duplicate".
Msg 137, Level 15, State 2, Procedure fn_RemoveDuplicate, Line 33
Must declare the scalar variable "@Removed_Duplicate".
Thanks,
Ben
Hi Ben,
Ohh 2005, We have rewrite it.
Wait will try this out.
Regards,
Pawan
Ohh 2005, We have rewrite it.
Wait will try this out.
Regards,
Pawan
quick fix and try:
DECLARE @Xml XML
DECLARE @str AS VARCHAR(1000)
DECLARE @Removed_Duplicate VARCHAR(1000)
set @str = 'A very A long stringx, d D testing the string'
SET @Xml = CAST(('<P>'+REPLACE(@str,' ','</P><P>')+'</P>') AS XML)
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rnk,
ROW_NUMBER() OVER(partition by a.value('.', 'VARCHAR(1000)') ORDER BY (SELECT 1)) idx,
a.value('.', 'VARCHAR(1000)') col
FROM @Xml.nodes('P') pn(a)
)
select @Removed_Duplicate = stuff(
(
select ' ' + col from CTE where idx = 1 order by rnk for xml path('')
),1,1,'')
print @str
print @Removed_Duplicate
Hi Ryan,
It is giving incorrect output.
Input - A very A long string. d D testing the string
Output - A d long string string. testing the very
Regards,
Pawan
It is giving incorrect output.
Input - A very A long string. d D testing the string
Output - A d long string string. testing the very
Regards,
Pawan
@Pawan,
it's weird (again?), i got this instead:
A very A long stringx, d D testing the string
A very long stringx, d testing the string
it's weird (again?), i got this instead:
A very A long stringx, d D testing the string
A very long stringx, d testing the string
LOL :)
ASKER
@Rayn,
I get the word string twice see below
i-A very A long string, d D testing the string
O-A very long string, d testing the string
Thanks,
Ben
I get the word string twice see below
i-A very A long string, d D testing the string
O-A very long string, d testing the string
Thanks,
Ben
I get the word string twice see belowsince your input is "string," (with a comma), it's different from "string"
i-A very A long string, d D testing the string
O-A very long string, d testing the string
ASKER
Actually I see you right.
Would it be possible for you to convert this to a function (the way Pawan did above), so I dont need to write this whole thing in my stored procedure?
Thanks,
Ben
Would it be possible for you to convert this to a function (the way Pawan did above), so I dont need to write this whole thing in my stored procedure?
Thanks,
Ben
Hi Author,
Can you get this kind of string - string with a . Like Pawan. and a Pawan
A very A long Pawan. d D testing the Pawan
or
A very A long string,string, d D testing the string
In both the above cases , all above solutions will not work. We have came with something else.
Can you get this kind of string - string with a . Like Pawan. and a Pawan
A very A long Pawan. d D testing the Pawan
or
A very A long string,string, d D testing the string
In both the above cases , all above solutions will not work. We have came with something else.
FUNCTION TO REMOVE Duplicate substring from a string...
/* FUNCTION TO REMOVE Duplicate substring from a string */
USAGE
Run1
DECLARE @str AS VARCHAR(1000) = 'A very A long stringx, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromStri ng(@str,'. |;|,') AfterRemovingDuplicates
Output
-----
InputString AfterRemovingDuplicates
A very A long stringx, d D testing the string A very long stringx, d testing the string
Run2
DECLARE @str AS VARCHAR(1000) = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromStri ng(@str,'. |;|,') AfterRemovingDuplicates
Output
-----
InputString AfterRemovingDuplicates
A very A long string, d D testing the string A very long string, d testing the
Run3
DECLARE @str AS VARCHAR(1000) = 'A very A long string. d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromStri ng(@str,'. |;|,') AfterRemovingDuplicates
Output
-----
InputString AfterRemovingDuplicates
A very A long string. d D testing the string A very long string. d testing the
Run4
DECLARE @str AS VARCHAR(1000) = 'Pawan Pawan'
SELECT @str InputString , dbo.RemoveDupicateFromStri ng(@str,'. |;|,') AfterRemovingDuplicates
Output
-----
InputString AfterRemovingDuplicates
Pawan Pawan Pawan
Hope it helps !!
/* FUNCTION TO REMOVE Duplicate substring from a string */
---
CREATE FUNCTION RemoveDupicateFromString
(
@Inputs VARCHAR(1000)
,@Ignore VARCHAR(100) --Pipe Seperated
)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @ProductTotals TABLE
(
ProductID int,
Revenue money
)
DECLARE @V AS VARCHAR(1000)= ''
DECLARE @IgnoreChrs VARCHAR(MAX) = @Ignore
DECLARE @finalString AS VARCHAR(1000) = ''
DECLARE @InputString AS VARCHAR(MAX) = RTRIM(LTRIM(@Inputs))
DECLARE @Pawan TABLE ( Value VARCHAR(1) )
DECLARE @TempBills TABLE (strs VARCHAR(1000))
DECLARE @x1 AS XML=''
SET @x1 = CAST('<A>'+ REPLACE(@IgnoreChrs,'|','</A><A>')+ '</A>' AS XML)
INSERT INTO @Pawan
SELECT t.value('.', 'VARCHAR(1)') Value FROM @x1.nodes('/A') AS x(t)
INSERT INTO @TempBills
SELECT @InputString strs
;WITH CTE1 AS
(
SELECT SplitedValue , ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rnk FROM
(
SELECT CAST(('<X>'+replace(@InputString,' ' ,'</X><X>')+'</X>') as xml) xmlcol FROM @TempBills e
) s
OUTER APPLY
(
SELECT ProjectData.D.value('.', 'VARCHAR(MAX)') as SplitedValue
FROM s.xmlcol.nodes('X') as ProjectData(D)
)r
)
SELECT @finalString = STUFF((SELECT ' ' + SplitedValue
FROM
(
SELECT SplitedValue , rnk FROM
(
SELECT * , ROW_NUMBER() OVER (PARTITION BY col ORDER BY rnk) duprnk FROM
(
SELECT * , RIGHT(SplitedValue,1) r , CASE WHEN Value IS NOT NULL THEN SUBSTRING(SplitedValue,0,LEN(SplitedValue)) ELSE SplitedValue END col FROM CTE1
LEFT JOIN @Pawan p ON p.Value = RIGHT(SplitedValue,1)
)p
)m WHERE m.duprnk = 1
)y ORDER BY rnk FOR XML PATH('')),1,1,'')
RETURN @finalString
END
USAGE
Run1
DECLARE @str AS VARCHAR(1000) = 'A very A long stringx, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromStri
Output
-----
InputString AfterRemovingDuplicates
A very A long stringx, d D testing the string A very long stringx, d testing the string
Run2
DECLARE @str AS VARCHAR(1000) = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromStri
Output
-----
InputString AfterRemovingDuplicates
A very A long string, d D testing the string A very long string, d testing the
Run3
DECLARE @str AS VARCHAR(1000) = 'A very A long string. d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromStri
Output
-----
InputString AfterRemovingDuplicates
A very A long string. d D testing the string A very long string. d testing the
Run4
DECLARE @str AS VARCHAR(1000) = 'Pawan Pawan'
SELECT @str InputString , dbo.RemoveDupicateFromStri
Output
-----
InputString AfterRemovingDuplicates
Pawan Pawan Pawan
Hope it helps !!
One thing I forgot to mention is that we have SQL 2005 in place.This means no CTE and no ROW_NUMBER
Unfortunally Microsoft never released a SPLIT function until MSSQL 2016 so you'll need to search in the internet for some already made split functions. Here you have some so suit yourself.
Ok, I made a testing using the first function in the article I've linked above:
CREATE FUNCTION dbo.Split
(
@RowData nvarchar(2000),
@SplitOn nvarchar(5)
)
RETURNS @RtnValue table
(
Id int identity(1,1),
Data nvarchar(100)
)
AS
BEGIN
Declare @Cnt int
Set @Cnt = 1
While (Charindex(@SplitOn,@RowData)>0)
Begin
Insert Into @RtnValue (data)
Select
Data = ltrim(rtrim(Substring(@RowData,1,Charindex(@SplitOn,@RowData)-1)))
Set @RowData = Substring(@RowData,Charindex(@SplitOn,@RowData)+1,len(@RowData))
Set @Cnt = @Cnt + 1
End
Insert Into @RtnValue (data)
Select Data = ltrim(rtrim(@RowData))
Return
END
With that function I tested against your example:DECLARE @LongString VARCHAR(MAX) = 'A very long string, testing the string'
SELECT Data Word, Min(Id) Pos
INTO #Words
FROM split(@LongString, ' ')
GROUP BY Data
SELECT LTRIM(STUFF((
SELECT ' ' + Word
FROM #Words
ORDER BY Pos
FOR XML PATH('')),1,1,''))
DROP TABLE #Words
It only works if you don't have the comma next to 'string,'. So you'll need to get rid the comma before calling the split function. Something like this:DECLARE @LongString VARCHAR(MAX) = 'A very long string, testing the string'
SELECT Data Word, Min(Id) Pos
INTO #Words
FROM split(REPLACE(@LongString,',',''), ' ')
GROUP BY Data
SELECT LTRIM(STUFF((
SELECT ' ' + Word
FROM #Words
ORDER BY Pos
FOR XML PATH('')),1,1,''))
DROP TABLE #Words
The trick here is to use the REPLACE function to replace all those unwanted characters. This example was for comma, but you can use a similar solution for other characters like ;:-" or whatever you can remember.
ASKER
Hi experts,
On the way to the office..will test those & get you posted..
Thanks,
Ben
On the way to the office..will test those & get you posted..
Thanks,
Ben
ASKER
@Pawan,
you forgot about the 2005 issue..?
this is what I get when trying to create your function
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 26
Must declare the scalar variable "@IgnoreChrs".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 29
Must declare the scalar variable "@x1".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 32
Must declare the scalar variable "@InputString".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 38
Must declare the scalar variable "@InputString".
Msg 102, Level 15, State 1, Procedure RemoveDupicateFromString, Line 44
Incorrect syntax near 'r'.
Msg 137, Level 15, State 1, Procedure RemoveDupicateFromString, Line 46
Must declare the scalar variable "@finalString".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 59
Must declare the scalar variable "@finalString".
@Vitor
Re your solution, would it be possible to have all incorporated into one function (similar to pawan's example), so the only thing I have to do is use one line of code to call the eliminate dup function?
Thanks,
Ben
@Vitor,
you forgot about the 2005 issue..?
this is what I get when trying to create your function
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 139, Level 15, State 1, Procedure RemoveDupicateFromString, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 26
Must declare the scalar variable "@IgnoreChrs".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 29
Must declare the scalar variable "@x1".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 32
Must declare the scalar variable "@InputString".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 38
Must declare the scalar variable "@InputString".
Msg 102, Level 15, State 1, Procedure RemoveDupicateFromString, Line 44
Incorrect syntax near 'r'.
Msg 137, Level 15, State 1, Procedure RemoveDupicateFromString, Line 46
Must declare the scalar variable "@finalString".
Msg 137, Level 15, State 2, Procedure RemoveDupicateFromString, Line 59
Must declare the scalar variable "@finalString".
@Vitor
This means no CTE and no ROW_NUMBERI do have those functioning in my environment.
Re your solution, would it be possible to have all incorporated into one function (similar to pawan's example), so the only thing I have to do is use one line of code to call the eliminate dup function?
Thanks,
Ben
@Vitor,
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi Pawan,
but when trying to test it I also need a way in 2005..currently getting below
DECLARE @str AS VARCHAR(1000) = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromStri ng(@str,'. |;|,')
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 2
Must declare the scalar variable "@str".
Thanks,
Ben
but when trying to test it I also need a way in 2005..currently getting below
DECLARE @str AS VARCHAR(1000) = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromStri
Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 2
Must declare the scalar variable "@str".
Thanks,
Ben
Try this Sir,
DECLARE @str AS VARCHAR(1000)
SET @str = 'A very A long string, d D testing the string'
SELECT @str InputString , dbo.RemoveDupicateFromString(@str,'.|;|,')
ASKER
Pawan you got it!
Thanks to all participants.
Thanks to all participants.
Open in new window
Ouptut
RemovedDuplicateWords
A very long string, testing the