Go Premium for a chance to win a PS4. Enter to Win

x
Solved

# Array - remove duplicates efficiently and make two-dimensional

Posted on 2003-12-12
Medium Priority
2,280 Views
Last Modified: 2008-02-01
Hi and thanks for your time.

My problem:

My project creates single dimension string arrays with up to 10,000 items.

What I need to do, and have not been able to yet with any efficiency, is to turn this into a 2 dimenionsal array that has duplicates removed and has a count next to each string of the number of duplicate items.

For example:

arr(0) eq to "dog"
arr(1) eq to "dog"
arr(2) eq to "cat"
arr(3) eq to "horse"
arr(4) eq to "horse"
arr(5) eq to "horse"
arr(6) eq to "pig"

becomes:

arr(0,0) eq to "dog"    AND arr(0,1) eq to 2
arr(1,0) eq to "cat"    AND arr(0,1) eq to 1
arr(2,0) eq to "horse"  AND arr(0,1) eq to 3
arr(3,0) eq to "pig"    AND arr(0,1) eq to 1

Hope that's clear...
0
Question by:jamesK
7 Comments

LVL 4

Expert Comment

ID: 9929553
I would suggest that you create a second, 2dim array.

Then (pseudocode):
For each element in 1dim array
search 2dim array element 0 for occurance
if found
increment element 1
else
append 1dim element value to 2dim array in element 0
set element 1 =1
end if
next

If you could sort the first array, the process will go a lot faster.
0

LVL 86

Accepted Solution

Mike Tomlinson earned 2000 total points
ID: 9929721
This should be lightning quick.  It uses the collections ability to retrieve based upon key very quickly.  Adding and removing from a collection is very fast since it uses a linked list implementation.  My algorithm does not use a costly Redim Preserve either.

Regards,

Idle_Mind

Option Explicit

Private arr() As String
Private arr2() As Variant

Private Sub Form_Load()
ReDim arr(0 To 6)

arr(0) = "dog"
arr(1) = "dog"
arr(2) = "cat"
arr(3) = "horse"
arr(4) = "horse"
arr(5) = "horse"
arr(6) = "pig"
End Sub

Private Sub Command1_Click()
On Error GoTo noSuchItem

Dim itemName As Variant
Dim itemNames As Collection
Dim nameCount As String
Dim nameCounts As Collection
Dim a As Single

' build no duplicates list and counts
Set itemNames = New Collection
Set nameCounts = New Collection
For a = 0 To UBound(arr)
nameCount = nameCounts.item(arr(a)) ' attempt to get current count
nameCounts.Remove arr(a) ' remove current count

nameCount = CStr(Val(nameCount) + 1) ' increment count
nameCounts.Add nameCount, arr(a) ' put count it back in collection
Next a

' build 2 dimensional array
ReDim arr2(0 To itemNames.count - 1, 0 To 1)
a = 0
For Each itemName In itemNames
arr2(a, 0) = itemName
arr2(a, 1) = Val(nameCounts.item(itemName))
a = a + 1
Next itemName

For a = 0 To UBound(arr2)
Debug.Print arr2(a, 0), arr2(a, 1)
Next a

Exit Sub

noSuchItem: ' no count yet
itemNames.Add arr(a) ' add item to our list
nameCount = "0" ' set count to 0
nameCounts.Add nameCount, arr(a) ' add count to our collection
Resume Next
End Sub
0

LVL 1

Author Comment

ID: 9929964
thanks for the inputs. clothahump, i looked at that kind of methodology first, but even with a binary array search, it took way too long on larger arrays.

Idle, I also looked at using a collection, but not specifically how you are suggesting. I will give your code a go and see how it works.
0

LVL 2

Expert Comment

ID: 9931202
Hello,

what about this approach (it is without Redim Preserve for speedup)

Option Explicit
Dim flatArray(0 To 6) As String
Dim countArray(0 To 6, 0 To 1) As Variant

Sub loadElem()
flatArray(0) = "Dog"
flatArray(1) = "Dog"
flatArray(2) = "Cat"
flatArray(3) = "Horse"
flatArray(4) = "Horse"
flatArray(5) = "Horse"
flatArray(6) = "Pig"

Call countElem
End Sub

Sub countElem()
Dim n As Integer
Dim i As Integer
Dim j As Integer
Dim msg As String
Dim found As Integer
Dim sElement As String

n = 0
For i = 0 To UBound(flatArray)
sElement = flatArray(i)
found = False
For j = 0 To n
If (countArray(j, 0) = sElement) Then
countArray(j, 1) = countArray(j, 1) + 1
found = True
Exit For
End If
Next
If (found = False) Then
countArray(n, 0) = sElement
countArray(n, 1) = 1
n = n + 1
End If
Next
msg = ""
For i = 0 To n
msg = msg & countArray(i, 1) & ": " & countArray(i, 0) & Chr(13)
Next
MsgBox msg
End Sub

0

LVL 86

Expert Comment

ID: 9932416
I "benchmarked" the two algorithms and here are the results in seconds:

Words:      Idle_Mind   PurplePerls
1,000,000       82              110
500,000         39               54
100,000          8                11
50,000          4                  6
10,000          0                  1

Here is the program I used to benchmark:

Option Explicit

Private Const numWords As Single = 1000000

Private arr() As String
Private arr2() As Variant

Private flatArray(0 To numWords) As String
Private countArray(0 To numWords, 0 To 1) As Variant

Private startTime As Date
Private endTime As Date

Private Sub Form_Load()
Dim a As Single
Dim b As Byte
Dim w As String

ReDim arr(0 To numWords)

Debug.Print "Building Array..."
Randomize Timer
For a = 0 To numWords
w = ""
For b = 1 To 2 ' 26^2 = 676 Unique words available
w = w & Chr(Int((122 - 97 + 1) * Rnd + 97))
Next b
arr(a) = w
flatArray(a) = w
Next a
Debug.Print "Done"
End Sub

Private Sub Command1_Click() ' Idle_Mind
On Error GoTo noSuchItem

startTime = Now
Debug.Print "Start: " & startTime

Dim itemName As Variant
Dim itemNames As Collection
Dim nameCount As String
Dim nameCounts As Collection
Dim a As Single

' build no duplicates list and counts
Set itemNames = New Collection
Set nameCounts = New Collection
For a = 0 To UBound(arr)
nameCount = nameCounts.Item(arr(a)) ' attempt to get current count
nameCounts.Remove arr(a) ' remove current count

nameCount = CStr(Val(nameCount) + 1) ' increment count
nameCounts.Add nameCount, arr(a) ' put count it back in collection
Next a

' build 2 dimensional array
ReDim arr2(0 To itemNames.Count - 1, 0 To 1)
a = 0
For Each itemName In itemNames
arr2(a, 0) = itemName
arr2(a, 1) = Val(nameCounts.Item(itemName))
a = a + 1
Next itemName

endTime = Now
Debug.Print "End: " & endTime
Debug.Print "Duration: " & DateDiff("s", startTime, endTime)
Debug.Print "Unique Words: " & UBound(arr2) + 1
Exit Sub

noSuchItem: ' no count yet
itemNames.Add arr(a) ' add item to our list
nameCount = "0" ' set count to 0
nameCounts.Add nameCount, arr(a) ' add count to our collection
Resume Next
End Sub

Private Sub Command2_Click() ' PurplePerls
startTime = Now
Debug.Print "Start: " & startTime

Dim n As Single
Dim i As Single
Dim j As Single
Dim found As Boolean
Dim sElement As String

n = 0
For i = 0 To UBound(flatArray)
sElement = flatArray(i)
found = False
For j = 0 To n
If (countArray(j, 0) = sElement) Then
countArray(j, 1) = countArray(j, 1) + 1
found = True
Exit For
End If
Next
If (found = False) Then
countArray(n, 0) = sElement
countArray(n, 1) = 1
n = n + 1
End If
Next

endTime = Now
Debug.Print "End: " & endTime
Debug.Print "Duration: " & DateDiff("s", startTime, endTime)
Debug.Print "Unique Words: " & n
End Sub
0

LVL 86

Expert Comment

ID: 9932640
Those times were using a max word length of 2.  I increased the word length to 7 and the results are very different:

??? = couldn't wait that long

Words:      Idle_Mind   PurplePerls
100,000         33               ???
50,000          15               ???
10,000          3                 17

The more unique words there are in the collection, the more pronounced the difference in speed will be.

Idle_Mind
0

LVL 1

Author Comment

ID: 9934233
Idle - thanks, your method has worked well for me.

All others, thanks for contributions.
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The debugging module of the VB 6 IDE can be accessed by way of the Debug menu item. That menu item can normally be found in the IDE's main menu line as shown in this picture. Â  There is also a companion Debug Toolbar that looks like the followinâ€¦
This article describes some techniques which will make your VBA or Visual Basic Classic code easier to understand and maintain, whether by you, your replacement, or another Experts-Exchange expert.
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applicâ€¦
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This lâ€¦
###### Suggested Courses
Course of the Month5 days, 20 hours left to enroll

#### 772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.