Array - remove duplicates efficiently and make two-dimensional

Hi and thanks for your time.

My problem:

My project creates single dimension string arrays with up to 10,000 items.

What I need to do, and have not been able to yet with any efficiency, is to turn this into a 2 dimenionsal array that has duplicates removed and has a count next to each string of the number of duplicate items.

For example:

arr(0) eq to "dog"
arr(1) eq to "dog"
arr(2) eq to "cat"
arr(3) eq to "horse"
arr(4) eq to "horse"
arr(5) eq to "horse"
arr(6) eq to "pig"

becomes:

arr(0,0) eq to "dog"    AND arr(0,1) eq to 2
arr(1,0) eq to "cat"    AND arr(0,1) eq to 1
arr(2,0) eq to "horse"  AND arr(0,1) eq to 3
arr(3,0) eq to "pig"    AND arr(0,1) eq to 1

Hope that's clear...
LVL 1
jamesKAsked:
Who is Participating?
 
Mike TomlinsonConnect With a Mentor Middle School Assistant TeacherCommented:
This should be lightning quick.  It uses the collections ability to retrieve based upon key very quickly.  Adding and removing from a collection is very fast since it uses a linked list implementation.  My algorithm does not use a costly Redim Preserve either.

Regards,

Idle_Mind

Option Explicit

Private arr() As String
Private arr2() As Variant

Private Sub Form_Load()
    ReDim arr(0 To 6)
   
    arr(0) = "dog"
    arr(1) = "dog"
    arr(2) = "cat"
    arr(3) = "horse"
    arr(4) = "horse"
    arr(5) = "horse"
    arr(6) = "pig"
End Sub

Private Sub Command1_Click()
    On Error GoTo noSuchItem
   
    Dim itemName As Variant
    Dim itemNames As Collection
    Dim nameCount As String
    Dim nameCounts As Collection
    Dim a As Single
   
    ' build no duplicates list and counts
    Set itemNames = New Collection
    Set nameCounts = New Collection
    For a = 0 To UBound(arr)
        nameCount = nameCounts.item(arr(a)) ' attempt to get current count
        nameCounts.Remove arr(a) ' remove current count
       
        nameCount = CStr(Val(nameCount) + 1) ' increment count
        nameCounts.Add nameCount, arr(a) ' put count it back in collection
    Next a
           
    ' build 2 dimensional array
    ReDim arr2(0 To itemNames.count - 1, 0 To 1)
    a = 0
    For Each itemName In itemNames
        arr2(a, 0) = itemName
        arr2(a, 1) = Val(nameCounts.item(itemName))
        a = a + 1
    Next itemName

    For a = 0 To UBound(arr2)
        Debug.Print arr2(a, 0), arr2(a, 1)
    Next a
   
    Exit Sub
   
noSuchItem: ' no count yet
    itemNames.Add arr(a) ' add item to our list
    nameCount = "0" ' set count to 0
    nameCounts.Add nameCount, arr(a) ' add count to our collection
    Resume Next
End Sub
0
 
ClothahumpCommented:
I would suggest that you create a second, 2dim array.  

Then (pseudocode):
For each element in 1dim array
   search 2dim array element 0 for occurance
   if found
        increment element 1
   else
        append 1dim element value to 2dim array in element 0
        set element 1 =1
   end if
next

If you could sort the first array, the process will go a lot faster.
0
 
jamesKAuthor Commented:
thanks for the inputs. clothahump, i looked at that kind of methodology first, but even with a binary array search, it took way too long on larger arrays.

Idle, I also looked at using a collection, but not specifically how you are suggesting. I will give your code a go and see how it works.
0
Cloud Class® Course: Python 3 Fundamentals

This course will teach participants about installing and configuring Python, syntax, importing, statements, types, strings, booleans, files, lists, tuples, comprehensions, functions, and classes.

 
PurplePerlsCommented:
Hello,

what about this approach (it is without Redim Preserve for speedup)

Option Explicit
Dim flatArray(0 To 6) As String
Dim countArray(0 To 6, 0 To 1) As Variant

Sub loadElem()
  flatArray(0) = "Dog"
  flatArray(1) = "Dog"
  flatArray(2) = "Cat"
  flatArray(3) = "Horse"
  flatArray(4) = "Horse"
  flatArray(5) = "Horse"
  flatArray(6) = "Pig"

  Call countElem
End Sub

Sub countElem()
  Dim n As Integer
  Dim i As Integer
  Dim j As Integer
  Dim msg As String
  Dim found As Integer
  Dim sElement As String
 
  n = 0
  For i = 0 To UBound(flatArray)
    sElement = flatArray(i)
    found = False
    For j = 0 To n
      If (countArray(j, 0) = sElement) Then
         countArray(j, 1) = countArray(j, 1) + 1
         found = True
         Exit For
      End If
    Next
    If (found = False) Then
      countArray(n, 0) = sElement
      countArray(n, 1) = 1
      n = n + 1
    End If
  Next
  msg = ""
  For i = 0 To n
    msg = msg & countArray(i, 1) & ": " & countArray(i, 0) & Chr(13)
  Next
  MsgBox msg
End Sub



0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
I "benchmarked" the two algorithms and here are the results in seconds:

Words:      Idle_Mind   PurplePerls
1,000,000       82              110
 500,000         39               54
 100,000          8                11
  50,000          4                  6
  10,000          0                  1    

Here is the program I used to benchmark:

Option Explicit

Private Const numWords As Single = 1000000

Private arr() As String
Private arr2() As Variant

Private flatArray(0 To numWords) As String
Private countArray(0 To numWords, 0 To 1) As Variant

Private startTime As Date
Private endTime As Date

Private Sub Form_Load()
    Dim a As Single
    Dim b As Byte
    Dim w As String
   
    ReDim arr(0 To numWords)
   
    Debug.Print "Building Array..."
    Randomize Timer
    For a = 0 To numWords
        w = ""
        For b = 1 To 2 ' 26^2 = 676 Unique words available
            w = w & Chr(Int((122 - 97 + 1) * Rnd + 97))
        Next b
        arr(a) = w
        flatArray(a) = w
    Next a
    Debug.Print "Done"
End Sub

Private Sub Command1_Click() ' Idle_Mind
    On Error GoTo noSuchItem
   
    startTime = Now
    Debug.Print "Start: " & startTime
   
    Dim itemName As Variant
    Dim itemNames As Collection
    Dim nameCount As String
    Dim nameCounts As Collection
    Dim a As Single
   
    ' build no duplicates list and counts
    Set itemNames = New Collection
    Set nameCounts = New Collection
    For a = 0 To UBound(arr)
        nameCount = nameCounts.Item(arr(a)) ' attempt to get current count
        nameCounts.Remove arr(a) ' remove current count
       
        nameCount = CStr(Val(nameCount) + 1) ' increment count
        nameCounts.Add nameCount, arr(a) ' put count it back in collection
    Next a
           
    ' build 2 dimensional array
    ReDim arr2(0 To itemNames.Count - 1, 0 To 1)
    a = 0
    For Each itemName In itemNames
        arr2(a, 0) = itemName
        arr2(a, 1) = Val(nameCounts.Item(itemName))
        a = a + 1
    Next itemName

    endTime = Now
    Debug.Print "End: " & endTime
    Debug.Print "Duration: " & DateDiff("s", startTime, endTime)
    Debug.Print "Unique Words: " & UBound(arr2) + 1
    Exit Sub
   
noSuchItem: ' no count yet
    itemNames.Add arr(a) ' add item to our list
    nameCount = "0" ' set count to 0
    nameCounts.Add nameCount, arr(a) ' add count to our collection
    Resume Next
End Sub

Private Sub Command2_Click() ' PurplePerls
    startTime = Now
    Debug.Print "Start: " & startTime
   
    Dim n As Single
    Dim i As Single
    Dim j As Single
    Dim found As Boolean
    Dim sElement As String
 
    n = 0
    For i = 0 To UBound(flatArray)
        sElement = flatArray(i)
        found = False
        For j = 0 To n
            If (countArray(j, 0) = sElement) Then
                countArray(j, 1) = countArray(j, 1) + 1
                found = True
                Exit For
            End If
        Next
        If (found = False) Then
            countArray(n, 0) = sElement
            countArray(n, 1) = 1
            n = n + 1
        End If
    Next
 
    endTime = Now
    Debug.Print "End: " & endTime
    Debug.Print "Duration: " & DateDiff("s", startTime, endTime)
    Debug.Print "Unique Words: " & n
End Sub
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
Those times were using a max word length of 2.  I increased the word length to 7 and the results are very different:

??? = couldn't wait that long

Words:      Idle_Mind   PurplePerls
 100,000         33               ???
  50,000          15               ???
  10,000          3                 17    

The more unique words there are in the collection, the more pronounced the difference in speed will be.

Idle_Mind
0
 
jamesKAuthor Commented:
Idle - thanks, your method has worked well for me.

All others, thanks for contributions.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.