We help IT Professionals succeed at work.
Troubleshooting Question

# Display Duplicate Data with RGB Colors

on
48 Views
I have a large amount of data, >800,000 rows, with duplicates in the combined data from both column C and Column D. If the same data is in both Col C and Col D, the row is considered a duplicate.
Need to visually display the duplicates for research purposes. Only the data in C & D need to be colored.
Given the large number of duplicates, need to use RGB colors to expand the color options...below seems to provide good color scheme results.  Trying to balance speed and accuracy...How do I best do this with a large data set?

```            iR = WorksheetFunction.RandBetween(51, 255)
iG = WorksheetFunction.RandBetween(102, 255)
iB = WorksheetFunction.RandBetween(51, 255)
DuplicateCells.Interior.Color = RGB(iR, iG, iB)  'This is just an example```
Comment
Watch Question

## View Solution Only

Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
How many of the same duplicate might you have?

Are you looking to have unique colors?

Commented:
I have had over 50,000 duplicates...Given this large number, trying to have all unique colors seems impractical, but yes, that was the initial intent.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I agree, and even if you could you wouldn't be visually able to tell them apart. What I was asking however was if you had any idea of the maximum number of the same duplicate.

Commented:
I have had up to 50 duplicates of the same data.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
That's a problem because if, say, the average number of the same duplicate were 25, you'd need 32,000 colors.

How about sorting the data on columns C and D and placing some unique number in an unused column, so if column F were unused it might look like this.

Commented:
Oh, I'm sorry...all of the same duplicate will be displayed in one color. I know there will be the overlapping colors, due to the large number of duplicates, but put in random order, they will be separated.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I'm not sure what you are saying. If, say, rows 5 and 10 are duplicates could there be non-duplicates in-between?

Commented:
Yes, there could be.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
OK then if rows 5 and 50,000 were duplicates and were given the same one of my estimate 32,000 needed colors you'll never recognize them as duplicates.

Commented:
I just tried it, and unless I perform a A-Z or Z-A filter, they will be separated. Which leaves doing the sort either at the beginning or at the end. At the beginning, leaves all the duplicates together....
CERTIFIED EXPERT

Commented:
try read my article: Handling Duplicate Rows In Excel

and look into sections:

2. Using Conditional Formatting
II ) With Conditional Formatting - Method 2

6. Move the Duplicate or Unique Rows to Another Sheet

see if that's useful for you?

Commented:
Yes, I will look at it.  Conditional formatting though is way to slow for really large datasets...could take hours.  Looking forward to reading it.
CERTIFIED EXPERT

Commented:

Commented:
The data is too dirty for the database...have already tried this.  Would like a visual - in place - solution.  But, will look at the article and see if something can be adapted from these insights.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
In the attached workbook press Ctrl+Shift+D or run the ColorDupes macro you'll find in Module1. You'll need a Reference to Microsoft Scripting Runtime.
29190135.xlsm

Commented:
Martin...I like what I see so far!  Let me go and use it on the 800K.

Commented:
Martin...I did run into one issue.  After it ran for 20 minutes, on 600K, it was stopped on a:
`Run-time error, Autofilter method of Range class failed.`
This is where the code reads,
```For lngEntry = 1 To colDupes.Count
With rng
strParts = Split(colDupes(lngEntry),"|")
.Autofilter    -> This is where the Error occurred. ```

The colDupes.count = 14666, the strKey = 9999972292 | 0000000018 , which is at the end of the list at 604645.

Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Try this for me. Copy the last hundred rows or so into a new sheet and run the code. If you still get the error can you show me columns C and D of those hundred rows?

Commented:
I did the last 1,000 rows and came out with 75 duplicate C & D numbers, or 150 cells. Everything ran smoothly.

Commented:
When I looked at the data, I saw row 62745, then 543249.  Everything in between was 'hidden.'
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Add this line just prior to the .AutoFilter line
``````ActiveSheet.AutoFilterMode = False
``````

Or better - attach a workbook that has the data with everything but columns C and D deleted or cleared. Hopefully that will eliminate any problems with you posting sensitive data.

Commented:
I will add the autofilter.  Cannot do the C & D data...got in trouble with risk compliance about an hour ago for attempting to do so.  But, is there something out of the ordinary I can look for?

Commented:
I just did the last 60K and it ran okay. 864 duplicate C & D cells, or 432 duplicates.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Did you do the whole thing with the new line of code?

Commented:
Not yet...was just in process..

Commented:
It's still running.  If I'm seeing the bottom and side scroll bars flashing, then we're still good...correct?

Commented:
After about 20 minutes, the scroll bars stopped, and I quit the code. The error actually occurs on the line below the autofilter, or strParts (0). In moving up one, the Autofilter also fails.  The error for this reads:
```   Runtime Error: -2147417848 (80010108)
Method Autofilter of Object Range failed.```
strParts (0) = "0263163882"  I'm going to search for this.
strKey is the same as before

Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
What scrollbars are you referring to? Is your data in a table?

Do a Find & Select->Find->Find All for 0263163882 in column C. How many hits do you get?

Commented:
Yes, there are six occurrences. I grab those six and about thirty more and ran the code.  All six were paired in duplicates.  3 duplicate C & D numbers for 6 total cells.  It ran fine though.  I really like the code.

Commented:
Could there be just too many?
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Try it one more time with this code. I added lines 37 and 53.
``````Sub ColorDupes()
Dim R As Integer
Dim G As Integer
Dim B As Integer
Dim colDupes As New Collection
Dim lngEntry As Long
Dim lngLastRow As Long
Dim lngRow As Long
Dim rng As Range
Dim strParts() As String
' Requires a reference to Microsoft Scripting Runtime
Dim dicDupes As Dictionary
Dim strKey As String

Set rng = Range("C1:D" & Cells(Rows.Count, "C").End(xlUp).Row)
Set dicDupes = CreateObject("Scripting.Dictionary")

With Application
.EnableEvents = False
.ScreenUpdating = False
.Calculation = xlCalculationManual
End With

For lngRow = 1 To Cells(Rows.Count, "C").End(xlUp).Row
On Error Resume Next
'    colDupes.Add Cells(lngRow, "C") & "|" & Cells(lngRow, "D"), CStr(Cells(lngRow, "C") & "|" & Cells(lngRow, "D"))
strKey = Cells(lngRow, "C") & "|" & Cells(lngRow, "D")
With dicDupes
If Err.Number <> 0 Then
End If
End With
On Error GoTo 0
Next

On Error Resume Next
For lngEntry = 1 To colDupes.Count
With rng
strParts = Split(colDupes(lngEntry), "|")
ActiveSheet.AutoFilterMode = False
.AutoFilter
.AutoFilter field:=1, Criteria1:=strParts(0)
.AutoFilter field:=2, Criteria1:=strParts(1)
R = WorksheetFunction.RandBetween(51, 255)
G = WorksheetFunction.RandBetween(102, 255)
B = WorksheetFunction.RandBetween(51, 255)
' This assumes the data has a one-row heading. If there is no heading then
' remove .Offset(1, 0).Resize(.Rows.Count - 1, 1)
.Offset(1, 0).Resize(.Rows.Count - 1, 2).SpecialCells(xlCellTypeVisible).Interior.Color = RGB(R, G, B)
End With
Next
On Error GoTo 0

rng.AutoFilter
With Application
.Calculation = xlCalculationAutomatic
.EnableEvents = True
.ScreenUpdating = True
End With
End Sub
``````
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Could there be just too many?
The variables involved with the dictionary and collection are Long which can contain more than you have. I'm not aware of size limits on dictionaries or collections but I guess my answer has to be I don't know.

Commented:
Got it...

Commented:
It took about 12-14 minutes to run 100K, and it seemed to be okay. Maybe I stopped it to soon. If there is an error which generates a pop-up, will the vba editor still show "Running?"

Commented:
For 800K, it's about 105 minutes.  If we combined columns C & D in a helper column, and had the code only look at the contents in that one cell to determine duplicates, would this reduce the time at all?  I really like what we have now, but may need to reduce the time.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I assume so.

Are there any empty cells in columns C or D?

Commented:
For these first 100K, no, but there will be some blank cells in column D going forward.

Commented:
Martin...I  can accept what you have provided, as it is what I asked you to do...and really like it. It's the best I've seen doing it the way you have.  I can then submit another request for assistance in making it a little faster (combine C & D)

Commented:
If that is what is needed...I'm not sure.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Give me a few minutes.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018
Commented:
Unlock this solution and get a sample of our free trial.
(No credit card required)

Commented:
Appreciate it Martin...I want to read your article as well.  Let me put this in and see how it goes.

Commented:
Martin...I may not have fully understood all of the code. In returning to the data, I had Col G with prefilled data, so I moved the Helper Column to Col K. Made all the changes from G to K, and changed the -4 to -8. But, I'm missing something else, because the cell coloring is all off.  What other change should I have made?
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
I believe that was my fault. When I posted my most recent code the name of the sub was ColorDupe rather than ColorDupes. If you make that change in your code it will probably work. I'ver attached my workbook that works with column K as the helper column.
29190135a.xlsm

Commented:
Got it...let me go test..

Commented:
Martin, I'm having a tough time running it with the workbook you provided.  It runs all the way through, errrors out on rng.Autofilter...but nothing occurred in cols C & D and nothing is added to K...no Fred | Murtz or Ricky | Ricardo...  I looked at my references and I have MS Scripting Runtime activated.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
The workbook I posted already has data (via a formula) in K. I've attached a video of me running it. The red circle flickers at the moment when I press Ctrl+Shift+D.https://www.experts-exchange.com/questions/29190135/Display-Duplicate-Data-with-RGB-Colors.html#
2020-07-31_15-27-20--1-.mp4

Commented:
Do I need to put a formula in K
`=C2 & "|" & D2`
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Yes, sorry I wasn't clearer.

Commented:
That's fine...I'm glad it was something simple as this...Yes, it works on a 10K workbook I have...I believe we are good.  This is exactly what I was looking for.  I believe you have an excellent 2 variable / criteria duplicate finder in a multi-color format...even if one of the options is a helper..  Thanks Martin.  I'm looking forward to trying it at work.
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Commented:
Can I ask you how it is that you have 800,000 rows? I ask because if it's done incrementally, it might be faster to check for and possibly delete the duplicates before they are loaded into the workbook.

In any case you’re welcome and I’m glad I was able to help.

If you expand the “Full Biography" section of my profile you’ll find links to some articles I’ve written that may interest you.

Marty - Microsoft MVP 2009 to 2017
Experts Exchange Most Valuable Expert (MVE) 2015, 2017
Experts Exchange Top Expert Visual Basic Classic 2012 to 2019
Experts Exchange Top Expert VBA 2018, 2019
Experts Exchange Distinguished Expert in Excel 2018

Commented:
Martin, the 800,000+ are from a Mainframe download to a text file which I then import into Excel. At the moment, it is not 100% certain that the duplicates from C & D do signify duplicates. This is the reason for coloring those appearing to be duplicates.  If indeed they are duplicates, I can use the Data->remove duplicates function on the menu. What you wrote today helps to bridge the gap between identification and deletion.
Unlock the solution to this question.