Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Array - remove duplicates efficiently and make two-dimensional

Posted on 2003-12-12
7
Medium Priority
?
2,278 Views
Last Modified: 2008-02-01
Hi and thanks for your time.

My problem:

My project creates single dimension string arrays with up to 10,000 items.

What I need to do, and have not been able to yet with any efficiency, is to turn this into a 2 dimenionsal array that has duplicates removed and has a count next to each string of the number of duplicate items.

For example:

arr(0) eq to "dog"
arr(1) eq to "dog"
arr(2) eq to "cat"
arr(3) eq to "horse"
arr(4) eq to "horse"
arr(5) eq to "horse"
arr(6) eq to "pig"

becomes:

arr(0,0) eq to "dog"    AND arr(0,1) eq to 2
arr(1,0) eq to "cat"    AND arr(0,1) eq to 1
arr(2,0) eq to "horse"  AND arr(0,1) eq to 3
arr(3,0) eq to "pig"    AND arr(0,1) eq to 1

Hope that's clear...
0
Comment
Question by:jamesK
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
7 Comments
 
LVL 4

Expert Comment

by:Clothahump
ID: 9929553
I would suggest that you create a second, 2dim array.  

Then (pseudocode):
For each element in 1dim array
   search 2dim array element 0 for occurance
   if found
        increment element 1
   else
        append 1dim element value to 2dim array in element 0
        set element 1 =1
   end if
next

If you could sort the first array, the process will go a lot faster.
0
 
LVL 86

Accepted Solution

by:
Mike Tomlinson earned 2000 total points
ID: 9929721
This should be lightning quick.  It uses the collections ability to retrieve based upon key very quickly.  Adding and removing from a collection is very fast since it uses a linked list implementation.  My algorithm does not use a costly Redim Preserve either.

Regards,

Idle_Mind

Option Explicit

Private arr() As String
Private arr2() As Variant

Private Sub Form_Load()
    ReDim arr(0 To 6)
   
    arr(0) = "dog"
    arr(1) = "dog"
    arr(2) = "cat"
    arr(3) = "horse"
    arr(4) = "horse"
    arr(5) = "horse"
    arr(6) = "pig"
End Sub

Private Sub Command1_Click()
    On Error GoTo noSuchItem
   
    Dim itemName As Variant
    Dim itemNames As Collection
    Dim nameCount As String
    Dim nameCounts As Collection
    Dim a As Single
   
    ' build no duplicates list and counts
    Set itemNames = New Collection
    Set nameCounts = New Collection
    For a = 0 To UBound(arr)
        nameCount = nameCounts.item(arr(a)) ' attempt to get current count
        nameCounts.Remove arr(a) ' remove current count
       
        nameCount = CStr(Val(nameCount) + 1) ' increment count
        nameCounts.Add nameCount, arr(a) ' put count it back in collection
    Next a
           
    ' build 2 dimensional array
    ReDim arr2(0 To itemNames.count - 1, 0 To 1)
    a = 0
    For Each itemName In itemNames
        arr2(a, 0) = itemName
        arr2(a, 1) = Val(nameCounts.item(itemName))
        a = a + 1
    Next itemName

    For a = 0 To UBound(arr2)
        Debug.Print arr2(a, 0), arr2(a, 1)
    Next a
   
    Exit Sub
   
noSuchItem: ' no count yet
    itemNames.Add arr(a) ' add item to our list
    nameCount = "0" ' set count to 0
    nameCounts.Add nameCount, arr(a) ' add count to our collection
    Resume Next
End Sub
0
 
LVL 1

Author Comment

by:jamesK
ID: 9929964
thanks for the inputs. clothahump, i looked at that kind of methodology first, but even with a binary array search, it took way too long on larger arrays.

Idle, I also looked at using a collection, but not specifically how you are suggesting. I will give your code a go and see how it works.
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 2

Expert Comment

by:PurplePerls
ID: 9931202
Hello,

what about this approach (it is without Redim Preserve for speedup)

Option Explicit
Dim flatArray(0 To 6) As String
Dim countArray(0 To 6, 0 To 1) As Variant

Sub loadElem()
  flatArray(0) = "Dog"
  flatArray(1) = "Dog"
  flatArray(2) = "Cat"
  flatArray(3) = "Horse"
  flatArray(4) = "Horse"
  flatArray(5) = "Horse"
  flatArray(6) = "Pig"

  Call countElem
End Sub

Sub countElem()
  Dim n As Integer
  Dim i As Integer
  Dim j As Integer
  Dim msg As String
  Dim found As Integer
  Dim sElement As String
 
  n = 0
  For i = 0 To UBound(flatArray)
    sElement = flatArray(i)
    found = False
    For j = 0 To n
      If (countArray(j, 0) = sElement) Then
         countArray(j, 1) = countArray(j, 1) + 1
         found = True
         Exit For
      End If
    Next
    If (found = False) Then
      countArray(n, 0) = sElement
      countArray(n, 1) = 1
      n = n + 1
    End If
  Next
  msg = ""
  For i = 0 To n
    msg = msg & countArray(i, 1) & ": " & countArray(i, 0) & Chr(13)
  Next
  MsgBox msg
End Sub



0
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 9932416
I "benchmarked" the two algorithms and here are the results in seconds:

Words:      Idle_Mind   PurplePerls
1,000,000       82              110
 500,000         39               54
 100,000          8                11
  50,000          4                  6
  10,000          0                  1    

Here is the program I used to benchmark:

Option Explicit

Private Const numWords As Single = 1000000

Private arr() As String
Private arr2() As Variant

Private flatArray(0 To numWords) As String
Private countArray(0 To numWords, 0 To 1) As Variant

Private startTime As Date
Private endTime As Date

Private Sub Form_Load()
    Dim a As Single
    Dim b As Byte
    Dim w As String
   
    ReDim arr(0 To numWords)
   
    Debug.Print "Building Array..."
    Randomize Timer
    For a = 0 To numWords
        w = ""
        For b = 1 To 2 ' 26^2 = 676 Unique words available
            w = w & Chr(Int((122 - 97 + 1) * Rnd + 97))
        Next b
        arr(a) = w
        flatArray(a) = w
    Next a
    Debug.Print "Done"
End Sub

Private Sub Command1_Click() ' Idle_Mind
    On Error GoTo noSuchItem
   
    startTime = Now
    Debug.Print "Start: " & startTime
   
    Dim itemName As Variant
    Dim itemNames As Collection
    Dim nameCount As String
    Dim nameCounts As Collection
    Dim a As Single
   
    ' build no duplicates list and counts
    Set itemNames = New Collection
    Set nameCounts = New Collection
    For a = 0 To UBound(arr)
        nameCount = nameCounts.Item(arr(a)) ' attempt to get current count
        nameCounts.Remove arr(a) ' remove current count
       
        nameCount = CStr(Val(nameCount) + 1) ' increment count
        nameCounts.Add nameCount, arr(a) ' put count it back in collection
    Next a
           
    ' build 2 dimensional array
    ReDim arr2(0 To itemNames.Count - 1, 0 To 1)
    a = 0
    For Each itemName In itemNames
        arr2(a, 0) = itemName
        arr2(a, 1) = Val(nameCounts.Item(itemName))
        a = a + 1
    Next itemName

    endTime = Now
    Debug.Print "End: " & endTime
    Debug.Print "Duration: " & DateDiff("s", startTime, endTime)
    Debug.Print "Unique Words: " & UBound(arr2) + 1
    Exit Sub
   
noSuchItem: ' no count yet
    itemNames.Add arr(a) ' add item to our list
    nameCount = "0" ' set count to 0
    nameCounts.Add nameCount, arr(a) ' add count to our collection
    Resume Next
End Sub

Private Sub Command2_Click() ' PurplePerls
    startTime = Now
    Debug.Print "Start: " & startTime
   
    Dim n As Single
    Dim i As Single
    Dim j As Single
    Dim found As Boolean
    Dim sElement As String
 
    n = 0
    For i = 0 To UBound(flatArray)
        sElement = flatArray(i)
        found = False
        For j = 0 To n
            If (countArray(j, 0) = sElement) Then
                countArray(j, 1) = countArray(j, 1) + 1
                found = True
                Exit For
            End If
        Next
        If (found = False) Then
            countArray(n, 0) = sElement
            countArray(n, 1) = 1
            n = n + 1
        End If
    Next
 
    endTime = Now
    Debug.Print "End: " & endTime
    Debug.Print "Duration: " & DateDiff("s", startTime, endTime)
    Debug.Print "Unique Words: " & n
End Sub
0
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 9932640
Those times were using a max word length of 2.  I increased the word length to 7 and the results are very different:

??? = couldn't wait that long

Words:      Idle_Mind   PurplePerls
 100,000         33               ???
  50,000          15               ???
  10,000          3                 17    

The more unique words there are in the collection, the more pronounced the difference in speed will be.

Idle_Mind
0
 
LVL 1

Author Comment

by:jamesK
ID: 9934233
Idle - thanks, your method has worked well for me.

All others, thanks for contributions.
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

There are many ways to remove duplicate entries in an SQL or Access database. Most make you temporarily insert an ID field, make a temp table and copy data back and forth, and/or are slow. Here is an easy way in VB6 using ADO to remove duplicate row…
When trying to find the cause of a problem in VBA or VB6 it's often valuable to know what procedures were executed prior to the error. You can use the Call Stack for that but it is often inadequate because it may show procedures you aren't intereste…
Get people started with the process of using Access VBA to control Excel using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Excel. Using automation, an Access application can laun…
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…
Suggested Courses

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question