John Carney
asked on
Macro that will remove duplicate strings in a cell using RegEx
This macro removes duplicate strings that occur consecutively. But how do I write it so that it removes all duplicate alphanumerics? In other words in this text it will remove the non-consecutive second occurrence of Text2 as well as both gratuitous occurrences of 33.
"Text1 33 33 Text2 44 Text3 55 Text2 33 "
Thanks,
John
(Dave, if you're reading this, I'm sure you will recognize your macro from quite a while ago!)
"Text1 33 33 Text2 44 Text3 55 Text2 33 "
Thanks,
John
(Dave, if you're reading this, I'm sure you will recognize your macro from quite a while ago!)
Sub LoseDupes()
Dim RegEx As VBScript_RegExp_55.RegExp
Dim Myrange As Range, C As Range
Set RegEx = New VBScript_RegExp_55.RegExp
Set Myrange = ActiveSheet.Range("B3:B6")
With RegEx
.MultiLine = False
.Global = True
.IgnoreCase = True
.Pattern = "\b(\w+)\b(\s+\1\b)+"
End With
For Each C In Myrange
C.Offset(0, 1) = RegEx.Replace(C, "$1")
Next
Set Myrange = Nothing
Set RegEx = Nothing
End Sub
Note that I changed both the pattern, and the replacement.
ASKER
Thanks Terry, I had a feeling it would take both changes. The only problem now is that there will often be a comma between the two instances. Sometimes with words and other punctuation as well. I tried a bunch of variations but couldn't get it. How do I wtrite that?
Also, and this is probably much more complicated and might require a separate question, is it possible to to remove duplicates when:
1) one is part of a larger word: as in 7ABCDEF, 7A
2) the number and the letter are in the same "word" but not adjacent: as in 7ABCDEF, 7C
Oh, and it will almost always be alphanumeric duplicates that I want to remove: 7A, 33D, etc.
Please don't spend much time on this. Just let me know if it's possible and then I'll post another question.
BTW, I'm still using a pattern you gave me last year and it still works great!
Thanks,
John
Also, and this is probably much more complicated and might require a separate question, is it possible to to remove duplicates when:
1) one is part of a larger word: as in 7ABCDEF, 7A
2) the number and the letter are in the same "word" but not adjacent: as in 7ABCDEF, 7C
Oh, and it will almost always be alphanumeric duplicates that I want to remove: 7A, 33D, etc.
Please don't spend much time on this. Just let me know if it's possible and then I'll post another question.
BTW, I'm still using a pattern you gave me last year and it still works great!
Thanks,
John
ASKER
Just to clarify there may or may not be a comma.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks, Terry this is great. Here's a link to my followup question: https://www.experts-exchange.com/questions/27247528/Removing-duplicate-strings-when-one-of-the-strings-is-a-whole-word-and-the-other-is-embedded-in-a-longer-string.html
Incidentally, I notice that all three of these work:
.Pattern = "\b(\w+)\b(.*?)\b([\s,]+\1 \b)+"
.Pattern = "\b(\w+)\b(.*?)\b([\s,]?\1 \b)+"
.Pattern = "\b(\w+)\b(.*?)\b([\s,*]\1 \b)+""
Are they really interchangeable? Or are there hidden deficiencies in the second and third?
And if I'm not pushing my luck (and please tell me if I am, I will gladly put this stuff in separate questions)), what is the function of the \1\? I tried other numbers and they either don't work at all or do things like remove all spaces.
And finally, what is the mechanism of "$1$2" in RegEx.Replace(C, "$1$2"). In trying to understand its function, I tried other numbers and that didn't tell me anything helpful.
Thanks,
John
Incidentally, I notice that all three of these work:
.Pattern = "\b(\w+)\b(.*?)\b([\s,]+\1
.Pattern = "\b(\w+)\b(.*?)\b([\s,]?\1
.Pattern = "\b(\w+)\b(.*?)\b([\s,*]\1
Are they really interchangeable? Or are there hidden deficiencies in the second and third?
And if I'm not pushing my luck (and please tell me if I am, I will gladly put this stuff in separate questions)), what is the function of the \1\? I tried other numbers and they either don't work at all or do things like remove all spaces.
And finally, what is the mechanism of "$1$2" in RegEx.Replace(C, "$1$2"). In trying to understand its function, I tried other numbers and that didn't tell me anything helpful.
Thanks,
John
.Pattern = "\b(\w+)\b(.*?)\b([\s,]?\1 \b)+"
Doesn't require a leading space or comma before the 2nd occurrence, so it will match:
A7B #A7B 123
And give result:
A7B # 123
.Pattern = "\b(\w+)\b(.*?)\b([\s,*]\1 \b)+""
Will allow a leading * instead of a space or comma, but won't allow more than one of those characters (the + is removed), and still needs a word boundary before it, so it will match:
A7B ABC*A7B 123 or A7B ABC,A7B 123
And give result:
A7B ABC 123
But
A7B ABC, A7B 123
won't match unless you add the + back in. You might have meant to put the * outside the square brackets?
\1 is a backreference to the group captured by the round brackets, (\w+) so it detects a repeat of the word
$1 and $2 insert the first and second captured groups as a replacement, (\w+) and (.*?)
Doesn't require a leading space or comma before the 2nd occurrence, so it will match:
A7B #A7B 123
And give result:
A7B # 123
.Pattern = "\b(\w+)\b(.*?)\b([\s,*]\1
Will allow a leading * instead of a space or comma, but won't allow more than one of those characters (the + is removed), and still needs a word boundary before it, so it will match:
A7B ABC*A7B 123 or A7B ABC,A7B 123
And give result:
A7B ABC 123
But
A7B ABC, A7B 123
won't match unless you add the + back in. You might have meant to put the * outside the square brackets?
\1 is a backreference to the group captured by the round brackets, (\w+) so it detects a repeat of the word
$1 and $2 insert the first and second captured groups as a replacement, (\w+) and (.*?)
Open in new window