x
• Status: Solved
• Priority: Medium
• Security: Public
• Views: 312

# Excel to find identical words

I have a excel sheet with a column with such data

www.site.com\somename\this-is-the-page
www.site.com\somename\this-is-the-page-thats-is-identical
www.site.com\somename\this-is-the-page-could-be-partial
www.site.com\somename\some-page-like-this

First 3 URL's have 4 identical words next to each other
4th = 0
5th and 6th has 5 words identical

Can i have help with a excel macro to find identical url's
0
mtthompsons
• 17
• 14
1 Solution

Older than dirtCommented:
1) Please further define "identical". In other words if there is even just one word the same, is it then "identical"?

2) Do the words that may make it identical only come from the portion of the url after the last backslash?

3) In your example, all the identical urls follow each other. Is that a requirement? In other words assuming that the data was as follows, would 1 and 3 still be identical?

www.site.com\somename\this-is-the-page
http://www.experts-exchange.com/Software/Office_Productivity
www.site.com\somename\this-is-the-page-thats-is-identical

4) Do you want to know how many identical words there are?

5) How do want to identify the identical urls?
0

Author Commented:
1. Sorry it should be 3+ words that are adjacent to each other

2. Yes, we need to check only after last slash between the hyphens -

3. Yes 1 and 3 would be identical in your example

4. As of now we will target 3 and if i have a easy way to change 3 to 4 or 2 would be great so i can switch accordingly to get my results

5. If i can get the results in column "B" as Identical so i can sort column "A" and "B" and have the results next to each other
0

Older than dirtCommented:
Thanks. Another question. In your example all the "words" are separated by dashes. Is that how it is in real life, or are they separated by spaces, and/or separated by other characters, and/or might there be punctuation marks?
0

Author Commented:
99% would be - dashes
Some cases would have some other special characters in such cases if code cannot handle rather can skip those url's
0

Older than dirtCommented:
<deleted>
0

Older than dirtCommented:
Here's a workbook with a partial solution. The code is shown below. Run the FindIdentical macro. When you do you'll see what I find are the numbers of matching words in each URL. I know that that's not your desired result but I'm not sure what you want me to put in column B instead. Should all the value of all the cells from B1 to B6 with the exception of B4 be "Identical"? Do you want to have the code do the sorting? If so should it be sorted on just column B or on columns A and B? Also please look at line 79 where you would add whatever characters should be treated as separators in addition to the dashes.

``````Option Explicit

Sub FindIdentical()
Dim lngLastRow As Long
Dim lngRow1 As Long
Dim lngRow2 As Long
Dim strWords1() As String
Dim strWords2() As String
Dim intPos As Integer
Dim intMinMatch As Integer
Dim intMatching As Integer
Dim lngIndex1 As Long
Dim lngIndex2 As Long
Dim colWords1 As Collection
Dim colWords2 As Collection

Range("B1").EntireColumn.ClearContents

intMinMatch = InputBox("Please enter the number of words that have to match in order to make an URL identical", "How many words?", 3)
lngLastRow = Range("A65536").End(xlUp).Row

For lngRow1 = 1 To lngLastRow
' Break up the words in the last part of the URL into a collection
' of unique words
intPos = InStrRev(Cells(lngRow1, 1).Value, "\")
strWords1 = Split(ConvertSeparators(Mid(Cells(lngRow1, 1).Value, intPos + 1)), "-")
Set colWords1 = New Collection
On Error Resume Next
For lngIndex = 0 To UBound(strWords1)
If Err.Number = 457 Then
' The word is already in the collection
Err.Clear
End If
Next
On Error GoTo 0
' Compare the value in lngRow1 with all the rows below it
For lngRow2 = lngRow1 + 1 To lngLastRow
intMatching = 0
intPos = InStrRev(Cells(lngRow2, 1).Value, "\")
strWords2 = Split(ConvertSeparators(Mid(Cells(lngRow2, 1).Value, intPos + 1)), "-")
Set colWords2 = New Collection
On Error Resume Next
For lngIndex = 0 To UBound(strWords2)
If Err.Number = 457 Then
Err.Clear
End If
Next
' Compare the words. (Colletions start at index 1)
For lngIndex1 = 1 To colWords1.Count
For lngIndex2 = 1 To colWords2.Count
If lngIndex1 <= colWords2.Count Then
If colWords1(lngIndex1) = colWords2(lngIndex2) Then
intMatching = intMatching + 1
Exit For
End If
End If
Next
Next
If intMatching >= intMinMatch Then
If Cells(lngRow1, 2).Value = "" Then
Cells(lngRow1, 2).Value = "This URL has " & intMatching & " words that match words in " & Cells(lngRow2, 1).Value
Else
Cells(lngRow1, 2).Value = Cells(lngRow1, 2).Value & vbCrLf & " AND " & intMatching & " words that match words in " & Cells(lngRow2, 1).Value
End If
End If
Next
Next
End Sub

Private Function ConvertSeparators(strWords As String) As String
' This function changes all listed separators to dashes

Dim varSeps As Variant
Dim lngIndex As Long
' the # and [ as examples of how you'd add them
varSeps = Array("#", "[")

For lngIndex = 0 To UBound(varSeps)
strWords = Replace(strWords, varSeps(lngIndex), "-")
Next

ConvertSeparators = strWords
End Function
``````
Q-28274528.xlsm
0

Author Commented:
Thanks
As in the attachment i get an error
The column "B" looks fine but will test it with actual data once we get it running and confirm back
Error.JPG
0

Author Commented:
I get this error with the sample data as well that you attached

on line
For lngIndex = 0 To UBound(strWords1)

I will have 50K rows of data just if that needs some change in the code
0

Author Commented:
One more part is the actual data slashes would be as

http://www.Sample.com/path/This-is-a-sample-page

In the example i gave the slashes as \

if that's something embedded in the code
Sorry for that
0

Older than dirtCommented:
I've attached an updated workbook.
Q-28274528.xlsm
0

Author Commented:
Sorry for the delay i placed 40K URL's which took close to a day to complete and did not want to kill the results by breaking the macro from running
Works great but have an issue with the report

Column "B" can we have numbers if 2 are duplicate to pair those 2 lets place number (1)

Then if next one has a duplicate with 4 URL's then lets place (2) 4 times

So when i sort all 1,2,3,4's are next to each other and i can easy check the column "A" URL's
0

Older than dirtCommented:
Replace your FindIdentical sub with the code below and tell me if it is faster. It still shows the same things in column B because I'm not sure what you want there.

You say
Column "B" can we have numbers if 2 are duplicate to pair those 2 lets place number (1)

Then if next one has a duplicate with 4 URL's then lets place (2) 4 times
Should I use 3 for the next duplicate, and then 4 and then 5, etc?

Also let's say rows 1, 100, 105 and 106 are duplicates. I assume that those rows all get a 1. But what if I find that row 2 is a duplicate of rows, 6, 105 and 120. I assume then that rows 2, 6
and 120 get a 2, but what about row 105? Does it stay a 1 or do I make it a 2?

``````Sub FindIdentical()
Dim lngLastRow As Long
Dim lngRow1 As Long
Dim lngRow2 As Long
Dim strWords1() As String
Dim strWords2() As String
Dim intPos As Integer
Dim intMinMatch As Integer
Dim intMatching As Integer
Dim lngIndex1 As Long
Dim lngIndex2 As Long
Dim colWords1 As Collection
Dim colWords2 As Collection
Dim varData As Variant
Dim lngMarker As Long

Range("B1").EntireColumn.ClearContents

intMinMatch = InputBox("Please enter the number of words that have to match in order to make an URL identical", "How many words?", 3)
lngLastRow = Range("A65536").End(xlUp).Row
varData = Range("A1:B" & lngLastRow).Value

For lngRow1 = 1 To lngLastRow
' Break up the words in the last part of the URL into a collection
' of unique words
intPos = InStrRev(varData(lngRow1, 1), "/")
strWords1 = Split(ConvertSeparators(Mid(varData(lngRow1, 1), intPos + 1)), "-")
Set colWords1 = New Collection
On Error Resume Next
For lngIndex1 = 0 To UBound(strWords1)
Debug.Print UBound(strWords1)
If Err.Number = 457 Then
' The word is already in the collection
Err.Clear
End If
Next
On Error GoTo 0
' Compare the value in lngRow1 with all the rows below it
For lngRow2 = lngRow1 + 1 To lngLastRow
intMatching = 0
intPos = InStrRev(varData(lngRow2, 1), "/")
strWords2 = Split(ConvertSeparators(Mid(varData(lngRow2, 1), intPos + 1)), "-")
Set colWords2 = New Collection
On Error Resume Next
For lngIndex2 = 0 To UBound(strWords2)
If Err.Number = 457 Then
Err.Clear
End If
Next
' Compare the words. (Colletions start at index 1)
For lngIndex1 = 1 To colWords1.Count
For lngIndex2 = 1 To colWords2.Count
If lngIndex1 <= colWords2.Count Then
If colWords1(lngIndex1) = colWords2(lngIndex2) Then
intMatching = intMatching + 1
Exit For
End If
End If
Next
Next
If intMatching >= intMinMatch Then
If varData(lngRow1, 2) = "" Then
varData(lngRow1, 2) = "This URL has " & intMatching & " words that match words in " & varData(lngRow2, 1)
Else
varData(lngRow1, 2) = varData(lngRow1, 2) & vbCrLf & " AND " & intMatching & " words that match words in " & varData(lngRow2, 1)
End If
End If
Next
Next

Range("A1:B" & lngLastRow).Value = varData
End Sub
``````
0

Author Commented:
Yes you are right i just want it as you mentioned and if its multiple times then we can have 1,6,8 and so on

So first i can work on those duplicate URL's and then have the sort work perfectly
0

Older than dirtCommented:
I'm sorry but I don't know what that means and you didn't answer both my questions so let me ask in a different way

1) Does the first set of duplicates get a 1 and the second set a 2 and the third set a 3 and the fourth set a 4, etc?

2) If row 1 matches row 100, both get a 1. What happens if later row 30 also matches row 100? Does row 100 stay a 1 or does it get anther number?
0

Older than dirtCommented:
I also have another idea to make it faster but before I try it can you tell me what the largest number of words there might be in one URL?
0

Author Commented:
Max 15 words and not more
0

Older than dirtCommented:
0

Author Commented:
Sorry missed that part

1) Does the first set of duplicates get a 1 and the second set a 2 and the third set a 3 and the fourth set a 4, etc?

Yes say row 1,5,10 are same then they get 1
Say 2,3,4 are same they get 2
Say 3,7,8,9 are same then it gets 3

As 3 is mentioned in 2 as well we get 2,3 in column "B"

2) If row 1 matches row 100, both get a 1. What happens if later row 30 also matches row 100? Does row 100 stay a 1 or does it get anther number?

Yes so we have 2 numbers as mentioned in point 1

Please let me know if you have any questions or any other easier ways to get this done

Goal : Identify URL's that are very closely similar to others
0

Older than dirtCommented:
Thanks.
0

Older than dirtCommented:
What I have now is still too slow. I will continue to work on it and will keep you posted on any progress I make.
0

Author Commented:
Thanks
0

Older than dirtCommented:
I'm sorry but this is the best I can do (at least for now but I'll continue to think about it) but it's still very slow.

What this code does is
1) Transfers all the data from the sheet into a variant array
2) Finds all the unique words and puts then in columns 3 to 17 of the array
3) Compares the words in each row with the words in all the rows below it and puts the match count in column 2 of the array
4) Transfers columns 1 and 2 of the array back to the sheet.

Steps 1 and 4 are very fast and step 2 is reasonably fast, but step 3 is very slow since with 40,000 rows an average of 20,000 comparisons are done for each row and that's 800,000,000 comparisons!

``````Sub FindIdentical()
Dim lngLastRow As Long
Dim lngRow1 As Long
Dim lngRow2 As Long
Dim strWords1() As String
Dim intPos As Integer
Dim intMinMatch As Integer
Dim intMatching As Integer
Dim lngIndex1 As Long
Dim lngIndex2 As Long
Dim colWords1 As Collection
Dim colWords2 As Collection
Dim varData As Variant
Dim lngMarker As Long

intMinMatch = InputBox("Please enter the number of words that have to match in order to make an URL identical", "How many words?", 3)
lngLastRow = Range("A65536").End(xlUp).Row
' Transfer all the data to an array. The data is only
' in columns A and B, but Q is used to leave room
' for 15 additional columns for the words
varData = Range("A1:Q" & lngLastRow).Value
Debug.Print Now

For lngRow1 = 1 To lngLastRow
' Split up all the words
intPos = InStrRev(varData(lngRow1, 1), "/")
strWords1 = Split(ConvertSeparators(Mid(varData(lngRow1, 1), intPos + 1)), "-")
Set colWords1 = New Collection
On Error Resume Next
For lngIndex1 = 0 To UBound(strWords1)
If Err.Number = 457 Then
' The word is already in the collection
Err.Clear
End If
Next

intPos = 3
For lngIndex1 = 1 To colWords1.Count
varData(lngRow1, intPos) = colWords1(lngIndex1)
intPos = intPos + 1
Next
On Error GoTo 0
Next

For lngRow1 = 1 To lngLastRow
Debug.Print lngRow1 & " - " & Now
' Compare the words in lngRow1 with the words in all the rows below it
For lngRow2 = lngRow1 + 1 To lngLastRow
intMatching = 0
For lngIndex1 = 3 To 17
If varData(lngRow1, lngIndex1) = "" Then
Exit For
Else
If varData(lngRow1, lngIndex1) = varData(lngRow2, 3) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 4) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 5) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 6) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 7) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 8) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 9) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 10) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 11) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 12) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 13) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 14) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 15) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 16) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 17) Then
intMatching = intMatching + 1
End If
End If
Next

If intMatching >= intMinMatch Then
If varData(lngRow1, 2) = "" Then
varData(lngRow1, 2) = lngRow1 & ", " & lngRow2
Else
varData(lngRow1, 2) = varData(lngRow1, 2) & ", " & lngRow2
End If
End If
Next
Next

Range("A1:B" & lngLastRow).Value = varData

Debug.Print Now
End Sub
``````
0

Author Commented:
Thanks but i have one logical issue

I got this as results for one
33, 20133, 25805
20133, 25805

25805 had no message in column "B" now the problem is i need to browse manually to these rows to find its duplicate

Can we color all cells that has a duplicate or mention the numbers on the other one as well

Else too much of manual work would be needed

Sorry to keep asking to improve
I can live with the time taken to execute
0

Older than dirtCommented:
EDIT:  See the next post first.

Here's a modification that I believe fixes the problem. Try it on a thousand or so rows subset of your data and tell me if it works.

``````Sub FindIdentical()
Dim lngLastRow As Long
Dim lngRow1 As Long
Dim lngRow2 As Long
Dim strWords1() As String
Dim intPos As Integer
Dim intMinMatch As Integer
Dim intMatching As Integer
Dim lngIndex1 As Long
Dim lngIndex2 As Long
Dim colWords1 As Collection
Dim colWords2 As Collection
Dim varData As Variant
Dim lngLastMatch As Long

intMinMatch = InputBox("Please enter the number of words that have to match in order to make an URL identical", "How many words?", 3)
lngLastRow = Range("A65536").End(xlUp).Row
Columns("B:B").ClearContents

' Transfer all the data to an array. The data is only
' in columns A and B, but Q is used to leave room
' for 15 additional columns for the words
varData = Range("A1:Q" & lngLastRow).Value
'Debug.Print Now
For lngRow1 = 1 To lngLastRow
' Split up all the words
intPos = InStrRev(varData(lngRow1, 1), "/")
strWords1 = Split(ConvertSeparators(Mid(varData(lngRow1, 1), intPos + 1)), "-")
Set colWords1 = New Collection
On Error Resume Next
For lngIndex1 = 0 To UBound(strWords1)
If Err.Number = 457 Then
' The word is already in the collection
Err.Clear
End If
Next

intPos = 3
For lngIndex1 = 1 To colWords1.Count
varData(lngRow1, intPos) = colWords1(lngIndex1)
intPos = intPos + 1
Next
On Error GoTo 0
Next

For lngRow1 = 1 To lngLastRow
'Debug.Print lngRow1 & " - " & Now
' Compare the words in lngRow1 with the words in all the rows below it
For lngRow2 = lngRow1 + 1 To lngLastRow
intMatching = 0
For lngIndex1 = 3 To 17
If varData(lngRow1, lngIndex1) = "" Then
Exit For
Else
If varData(lngRow1, lngIndex1) = varData(lngRow2, 3) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 4) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 5) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 6) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 7) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 8) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 9) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 10) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 11) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 12) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 13) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 14) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 15) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 16) Or _
varData(lngRow1, lngIndex1) = varData(lngRow2, 17) Then
intMatching = intMatching + 1
End If
End If
Next

If intMatching >= intMinMatch Then
lngLastMatch = lngRow2
If varData(lngRow1, 2) = "" Then
varData(lngRow1, 2) = lngRow1 & ", " & lngRow2
Else
varData(lngRow1, 2) = varData(lngRow1, 2) & ", " & lngRow2
End If
End If
Next
varData(lngLastMatch, 2) = "'" & lngLastMatch
Next

Range("A1:B" & lngLastRow).Value = varData

'Debug.Print Now
End Sub
``````
0

Older than dirtCommented:
I made a change to the above that in a short test (letting it run through the first 50 rows of a 40,000 row sheet) seems to be about 20 percent faster. Unless there's still a problem I think this is the best I can do.

``````Sub FindIdentical()
Dim lngLastRow As Long
Dim lngRow1 As Long
Dim lngRow2 As Long
Dim strWords1() As String
Dim intPos As Integer
Dim intMinMatch As Integer
Dim intMatching As Integer
Dim lngIndex1 As Long
Dim lngIndex2 As Long
Dim colWords1 As Collection
Dim colWords2 As Collection
Dim varData As Variant
Dim lngMarker As Long

intMinMatch = InputBox("Please enter the number of words that have to match in order to make an URL identical", "How many words?", 3)
lngLastRow = Range("A65536").End(xlUp).Row
Columns("B:B").ClearContents

' Transfer all the data to an array. The data is only
' in columns A and B, but Q is used to leave room
' for 15 additional columns for the words
varData = Range("A1:Q" & lngLastRow).Value
'Debug.Print Now

For lngRow1 = 1 To lngLastRow
' Split up all the words
intPos = InStrRev(varData(lngRow1, 1), "/")
strWords1 = Split(ConvertSeparators(Mid(varData(lngRow1, 1), intPos + 1)), "-")
Set colWords1 = New Collection
On Error Resume Next
For lngIndex1 = 0 To UBound(strWords1)
If Err.Number = 457 Then
' The word is already in the collection
Err.Clear
End If
Next

intPos = 3
For lngIndex1 = 1 To colWords1.Count
varData(lngRow1, intPos) = colWords1(lngIndex1)
intPos = intPos + 1
Next
On Error GoTo 0
Next

For lngRow1 = 1 To lngLastRow
'If lngRow1 Mod 10 = 0 Then Debug.Print lngRow1 & " - " & Now
' Compare the words in lngRow1 with the words in all the rows below it
For lngRow2 = lngRow1 + 1 To lngLastRow
intMatching = 0
For lngIndex1 = 3 To 17
If varData(lngRow1, lngIndex1) = "" Then
Exit For
Else
Select Case True
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 3)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 4)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 5)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 6)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 7)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 8)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 9)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 10)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 11)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 12)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 13)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 14)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 15)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 16)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 17)
intMatching = intMatching + 1
End Select
End If
Next
If intMatching >= intMinMatch Then
If varData(lngRow1, 2) = "" Then
varData(lngRow1, 2) = lngRow1 & ", " & lngRow2
Else
varData(lngRow1, 2) = varData(lngRow1, 2) & ", " & lngRow2
End If
End If
Next
Next

Range("A1:B" & lngLastRow).Value = varData

'Debug.Print Now
End Sub
``````
0

Author Commented:
But still dont get the numbers on the other duplicate

8665, 8668

8665 row has the above 2 numbers but 8668 is blank in column "B"
0

Older than dirtCommented:
I'm sorry, I had several versions floating around and I lost the change that corrected the problem in the one I posted. Again try this on a small sample.

``````Sub FindIdentical()
Dim lngLastRow As Long
Dim lngRow1 As Long
Dim lngRow2 As Long
Dim strWords1() As String
Dim intPos As Integer
Dim intMinMatch As Integer
Dim intMatching As Integer
Dim lngIndex1 As Long
Dim lngIndex2 As Long
Dim colWords1 As Collection
Dim colWords2 As Collection
Dim varData As Variant
Dim lngMarker As Long
Dim lngLastMatch As Long

intMinMatch = InputBox("Please enter the number of words that have to match in order to make an URL identical", "How many words?", 3)
lngLastRow = Range("A65536").End(xlUp).Row
Columns("B:B").ClearContents

' Transfer all the data to an array. The data is only
' in columns A and B, but Q is used to leave room
' for 15 additional columns for the words
varData = Range("A1:Q" & lngLastRow).Value
'Debug.Print Now

For lngRow1 = 1 To lngLastRow
' Split up all the words
intPos = InStrRev(varData(lngRow1, 1), "/")
strWords1 = Split(ConvertSeparators(Mid(varData(lngRow1, 1), intPos + 1)), "-")
Set colWords1 = New Collection
On Error Resume Next
For lngIndex1 = 0 To UBound(strWords1)
If Err.Number = 457 Then
' The word is already in the collection
Err.Clear
End If
Next

intPos = 3
For lngIndex1 = 1 To colWords1.Count
varData(lngRow1, intPos) = colWords1(lngIndex1)
intPos = intPos + 1
Next
On Error GoTo 0
Next

For lngRow1 = 1 To lngLastRow
'If lngRow1 Mod 10 = 0 Then Debug.Print lngRow1 & " - " & Now
' Compare the words in lngRow1 with the words in all the rows below it
For lngRow2 = lngRow1 + 1 To lngLastRow
intMatching = 0
For lngIndex1 = 3 To 17
If varData(lngRow1, lngIndex1) = "" Then
Exit For
Else
Select Case True
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 3)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 4)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 5)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 6)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 7)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 8)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 9)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 10)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 11)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 12)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 13)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 14)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 15)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 16)
intMatching = intMatching + 1
Case varData(lngRow1, lngIndex1) = varData(lngRow2, 17)
intMatching = intMatching + 1
End Select
End If
Next
If intMatching >= intMinMatch Then
lngLastMatch = lngRow2
If varData(lngRow1, 2) = "" Then
varData(lngRow1, 2) = lngRow1 & ", " & lngRow2
Else
varData(lngRow1, 2) = varData(lngRow1, 2) & ", " & lngRow2
End If
End If
Next
varData(lngLastMatch, 2) = "'" & lngLastMatch
Next

Range("A1:B" & lngLastRow).Value = varData

'Debug.Print Now
End Sub
``````
0

Author Commented:
I get subscript out of range on line
varData(lngLastMatch, 2) = "'" & lngLastMatch

I have 40K rows of data

I tested with 500 and seems fine and on the bigger loaded file i got this error
0

Older than dirtCommented:
I can't imagine why that is happening. Did you modify the code at all? If not then try this for me. On a temporary basis change line 49 to

For lngRow1 = 39500 To lngLastRow

where 39500 is a number about 500 rows less then your total number of rows. If you still get the error
1) Click the 'Debug' button
2) Assuming that the varData(lngLastMatch, 2) = "'" & lngLastMatch line is now highlighted in yellow, place your cursor over either lngLastMatch in that line and tell me what the value is.

Is that value greater than the number of rows and if so by how many?
0

Author Commented:
lngLastMatch=0 is what i get when highlighted

I never changed the code and now when you asked i changed the row 49 and tried
0

Older than dirtCommented:
That doesn't make sense. Can you attach your project? If it's too big you can go to MediaFire, upload your workbook and then post the resulting URL here.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.