  • Status: Solved
  • Priority: Medium
  • Security: Public

Sorting an Array with over 2 million members in Excel VBA

I have VBA code that calculates over 2 million numbers and puts them into an array. I want to sort the values in the array, but as far as I can see there is no sort function in Excel VBA.
I tried QSortInPlace, which can be found at http://www.cpearson.com/excel/SortingArrays.aspx.

But it doesn't seem to work when there are 2 million members in the array.

I guess something could be arranged when filling the array in the first place.

What is the best way to sort huge arrays?
Asked by: awesomejohn19
 
dlmille commented:
The Long data type supports array indexes up to 2,147,483,647. Will that suffice?

Here's a HeapSort algorithm that uses Long indexes (I personally use QuickSort, but wanted something documented with Long array indexes, so here it is; I haven't tested it myself):

http://www.source-code.biz/snippets/vbasic/1.htm
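For comparison, here is a minimal HeapSort sketch with Long indexes, written for this note rather than copied from the linked page. It sorts a 1-D array in place in ascending order; HeapSort and the helper SiftDown are hypothetical names. HeapSort's worst case is O(n log n) and it needs no recursion, so deep call stacks are not a concern.

```vba
' HeapSort sketch: sorts a 1-D array in place, ascending.
' Long indexes avoid the overflow that Integer hits above 32,767.
Sub HeapSort(arr As Variant)
    Dim lo As Long, n As Long, i As Long
    Dim tmp As Variant

    lo = LBound(arr)
    n = UBound(arr) - lo + 1          ' number of elements

    ' Build a max-heap over 0-based positions 0..n-1 (offset by lo).
    For i = n \ 2 - 1 To 0 Step -1
        SiftDown arr, lo, i, n
    Next i

    ' Repeatedly move the largest element to the end and shrink the heap.
    For i = n - 1 To 1 Step -1
        tmp = arr(lo)
        arr(lo) = arr(lo + i)
        arr(lo + i) = tmp
        SiftDown arr, lo, 0, i
    Next i
End Sub

' Restores the heap property for the subtree rooted at 0-based position
' root, looking only at the first heapSize elements.
Private Sub SiftDown(arr As Variant, ByVal lo As Long, ByVal root As Long, ByVal heapSize As Long)
    Dim child As Long
    Dim tmp As Variant

    Do
        child = 2 * root + 1                     ' left child
        If child >= heapSize Then Exit Do
        ' Pick the larger of the two children.
        If child + 1 < heapSize Then
            If arr(lo + child + 1) > arr(lo + child) Then child = child + 1
        End If
        If arr(lo + root) >= arr(lo + child) Then Exit Do
        tmp = arr(lo + root)
        arr(lo + root) = arr(lo + child)
        arr(lo + child) = tmp
        root = child
    Loop
End Sub
```

Usage is simply HeapSort myArray; the bounds are read from the array itself.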

If not, it can be done with a Collection and a Collection sort. I can assist with that, but first I'll wait for your answer to the question above.

Cheers,

Dave

 
dlmille commented:
Actually, a Variant array may be able to hold more (I can't find my reference for that). Let me see if I can load a Variant array with 3 million records and sort it with my QuickSort algorithm...

Dave
 
dlmille commented:
Here's a QuickSort macro I use all the time; I only changed Integer to Variant. It SHOULD work, and right now I'm trying to figure out how to load an array with more than 1 million records without waiting forever.

Give it and the HeapSort a shot, since you're already in a position to test. Note the usage of QSort:

Call QSort(myArray, LBound(myArray), UBound(myArray))

This sorts the Variant array myArray in place. It's easy enough to add a Boolean parameter to choose ascending or descending order, and I can help with that if you like. Right now it sorts in ascending order.
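A minimal sketch of that Boolean option, assuming you want it as an extra optional parameter. QSortDir is a hypothetical name; it follows the same partitioning scheme as the QSort posted in this thread, but uses Long indexes and flips the two comparison loops when Descending is True.

```vba
' QSortDir sketch: in-place QuickSort with an optional Descending flag.
Sub QSortDir(sortArray As Variant, ByVal leftIndex As Long, ByVal rightIndex As Long, _
             Optional ByVal Descending As Boolean = False)
    Dim i As Long, j As Long
    Dim compValue As Variant, tempVar As Variant

    i = leftIndex
    j = rightIndex
    compValue = sortArray((i + j) \ 2)      ' middle element as the pivot value

    Do
        If Descending Then
            ' Larger values belong on the left.
            Do While sortArray(i) > compValue And i < rightIndex
                i = i + 1
            Loop
            Do While compValue > sortArray(j) And j > leftIndex
                j = j - 1
            Loop
        Else
            ' Smaller values belong on the left.
            Do While sortArray(i) < compValue And i < rightIndex
                i = i + 1
            Loop
            Do While compValue < sortArray(j) And j > leftIndex
                j = j - 1
            Loop
        End If
        If i <= j Then
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar
            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    ' Recurse into both partitions, passing the direction along.
    If leftIndex < j Then QSortDir sortArray, leftIndex, j, Descending
    If i < rightIndex Then QSortDir sortArray, i, rightIndex, Descending
End Sub
```

Usage: QSortDir myArray, LBound(myArray), UBound(myArray), True sorts largest first; omitting the last argument sorts ascending.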

Let me know if this works for you:

 
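' Note: Integer indexes overflow above 32,767; the corrected version later in this thread uses Variant indexes.
Sub QSort(sortArray As Variant, ByVal leftIndex As Integer, ByVal rightIndex As Integer)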
    Dim compValue As Variant
    Dim i As Variant
    Dim j As Variant
    Dim tempVar As Variant

    i = leftIndex
    j = rightIndex
    
    compValue = sortArray(Int((i + j) / 2))

    Do
        Do While (sortArray(i) < compValue And i < rightIndex)
            i = i + 1
        Loop
        Do While (compValue < sortArray(j) And j > leftIndex)
            j = j - 1
        Loop
        If i <= j Then
        
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar
            
            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    If leftIndex < j Then QSort sortArray, leftIndex, j
    If i < rightIndex Then QSort sortArray, i, rightIndex
End Sub

 
dlmille commented:
My bad: I had one typo in my Integer-to-Variant conversion.

Sorry for the spam! Here is the corrected QSort; usage is the same as above:
 
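Sub QSort(sortArray As Variant, ByVal leftIndex As Variant, ByVal rightIndex As Variant)
    ' Sorts sortArray(leftIndex To rightIndex) in place, in ascending order.
    ' The indexes are Variant rather than Integer, which overflows above 32,767.
    Dim compValue As Variant
    Dim i As Variant
    Dim j As Variant
    Dim tempVar As Variant

    i = leftIndex
    j = rightIndex

    ' Use the value of the middle element as the pivot.
    compValue = sortArray(Int((i + j) / 2))

    ' Partition: advance i past values below the pivot and j past values
    ' above it, then swap the out-of-place pair.
    Do
        Do While (sortArray(i) < compValue And i < rightIndex)
            i = i + 1
        Loop
        Do While (compValue < sortArray(j) And j > leftIndex)
            j = j - 1
        Loop
        If i <= j Then
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar

            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    ' Recursively sort the left and right partitions.
    If leftIndex < j Then QSort sortArray, leftIndex, j
    If i < rightIndex Then QSort sortArray, i, rightIndex
End Sub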
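On loading a large array without waiting forever: one option is to build the test data entirely in memory instead of writing it to cells first. The helpers below are hypothetical (MakeRandomArray and IsSortedAscending are names chosen for this sketch); they fill a Double array with random values and check the order afterwards.

```vba
' Fills a 1-based Double array with n random values, entirely in memory.
Function MakeRandomArray(ByVal n As Long) As Double()
    Dim data() As Double
    Dim k As Long

    ReDim data(1 To n)
    Randomize                        ' seed VBA's generator from the system timer
    For k = 1 To n
        data(k) = Rnd() * 1000000
    Next k
    MakeRandomArray = data
End Function

' Returns True when every element is >= the one before it.
Function IsSortedAscending(arr As Variant) As Boolean
    Dim k As Long
    For k = LBound(arr) + 1 To UBound(arr)
        If arr(k - 1) > arr(k) Then Exit Function   ' leaves the default False
    Next k
    IsSortedAscending = True
End Function
```

A timing run might look like: myArray = MakeRandomArray(3000000), then t = Timer, QSort myArray, LBound(myArray), UBound(myArray), and finally Debug.Print Timer - t, IsSortedAscending(myArray).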

 
Martin Liss (Retired Programmer) commented:
I think the thing to do is not to sort 2,000,000 records, but rather to create the 'array' sorted as you build it. I put 'array' in single quotes because what I suggest is that you use the VBA Dictionary object instead. It is like a Collection, but faster. Here is a short tutorial.
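A minimal sketch of filling a Dictionary as values are produced, assuming late binding to Scripting.Dictionary (the Microsoft Scripting Runtime, which is Windows-only). BuildValueCounts is a hypothetical name. One caveat: Scripting.Dictionary returns its keys in insertion order, not sorted order, so if sorted output is needed, the .Keys array still has to go through a sort such as the QSort in this thread. What the Dictionary does give you cheaply is de-duplication and counting while you build.

```vba
' Collects each distinct value as a key, with its occurrence count as the item.
Function BuildValueCounts(values As Variant) As Object
    Dim dict As Object
    Dim k As Long

    Set dict = CreateObject("Scripting.Dictionary")
    For k = LBound(values) To UBound(values)
        ' Reading a missing key adds it with an Empty item; Empty + 1 = 1.
        dict(values(k)) = dict(values(k)) + 1
    Next k
    Set BuildValueCounts = dict
End Function
```

After building, keysArray = dict.Keys gives a 0-based Variant array of the distinct values, and dict.Count gives how many there are.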
 
dlmille commented:
There are a few approaches to pick from.

The attached workbook looks at these (credit to Andrewssd3, rorya, and jan24, whose input on how to create a large array let me evaluate an appropriate response).

1.  ADO method: populates the array using ADO, putting all the data in one dimension of a two-dimensional array. Benefits include the ability to extract UNIQUE values from the dataset (3 columns of 1MM rows each) and sorting built into the same step.

2.  BruteForce method: populates the array by assigning ranges to Variant arrays. For 3 columns, this could be done with a union of the 3 ranges, or by assigning the 3 ranges to 3 Variants; the final Variant array is then loaded "brute force", element by element. The sort uses the QuickSort.

3.  Qsort2d method: populates the initial 2-D array with a single range assignment across all 3 columns, giving a 2-D array with 3 columns. A QuickSort for 2-D arrays (courtesy of Andrewssd3) then completes the sort.
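A minimal sketch of the brute-force load in item 2, assuming each column has already been read with .Value from a multi-cell range (which yields a 1-based 2-D Variant array; a single cell would yield a scalar instead). FlattenColumns is a hypothetical name. It sizes the output once, then copies every block column by column into one 1-D array that a 1-D sort such as QSort can take.

```vba
' Copies any number of 2-D arrays into a single 1-based 1-D Variant array.
Function FlattenColumns(ParamArray blocks() As Variant) As Variant
    Dim total As Long, b As Long, r As Long, c As Long, k As Long
    Dim out() As Variant

    ' First pass: count the elements so the output is allocated only once.
    For b = LBound(blocks) To UBound(blocks)
        total = total + (UBound(blocks(b), 1) - LBound(blocks(b), 1) + 1) * _
                        (UBound(blocks(b), 2) - LBound(blocks(b), 2) + 1)
    Next b

    ' Second pass: copy element by element.
    ReDim out(1 To total)
    For b = LBound(blocks) To UBound(blocks)
        For c = LBound(blocks(b), 2) To UBound(blocks(b), 2)
            For r = LBound(blocks(b), 1) To UBound(blocks(b), 1)
                k = k + 1
                out(k) = blocks(b)(r, c)
            Next r
        Next c
    Next b
    FlattenColumns = out
End Function
```

For example (the ranges are placeholders): myArray = FlattenColumns(Range("A1:A1000000").Value, Range("B1:B1000000").Value, Range("C1:C1000000").Value).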

Even if you already have your array loaded, the QuickSort and/or Qsort2D routines might help. I believe I gave you code for both QuickSort and HeapSort.
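To illustrate the 2-D idea, here is a minimal sketch of sorting the rows of a 2-D array (such as the result of Range(...).Value) by one key column, swapping whole rows. It is not Andrewssd3's routine, just an illustration; the name QSort2D and the keyCol parameter are hypothetical.

```vba
' Sorts rows lo..hi of a 2-D array in place, ascending by column keyCol.
Sub QSort2D(arr As Variant, ByVal keyCol As Long, ByVal lo As Long, ByVal hi As Long)
    Dim i As Long, j As Long, c As Long
    Dim pivot As Variant, tmp As Variant

    i = lo
    j = hi
    pivot = arr((lo + hi) \ 2, keyCol)      ' key of the middle row

    Do While i <= j
        Do While arr(i, keyCol) < pivot
            i = i + 1
        Loop
        Do While pivot < arr(j, keyCol)
            j = j - 1
        Loop
        If i <= j Then
            ' Swap every column so each row stays intact.
            For c = LBound(arr, 2) To UBound(arr, 2)
                tmp = arr(i, c)
                arr(i, c) = arr(j, c)
                arr(j, c) = tmp
            Next c
            i = i + 1
            j = j - 1
        End If
    Loop

    If lo < j Then QSort2D arr, keyCol, lo, j
    If i < hi Then QSort2D arr, keyCol, i, hi
End Sub
```

Usage: QSort2D data, 1, LBound(data, 1), UBound(data, 1) sorts all rows of data by its first column.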

QuickSort is one of the fastest sorting methods on average, though it has drawbacks: its worst case is O(n²), and its recursion can run deep on unfavorable input (see http://en.wikipedia.org/wiki/Sorting_algorithm for a table of algorithms and their relative merits).
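One common way to limit the recursion problem (a general technique, not something proposed in this thread) is to recurse only into the smaller partition and loop on the larger one, which keeps the recursion depth near log2(n). The worst-case running time is still O(n²). QSortIter is a hypothetical name for this sketch.

```vba
' QuickSort that recurses into the smaller side and loops on the larger side.
Sub QSortIter(arr As Variant, ByVal lo As Long, ByVal hi As Long)
    Dim i As Long, j As Long
    Dim pivot As Variant, tmp As Variant

    Do While lo < hi
        i = lo
        j = hi
        pivot = arr((lo + hi) \ 2)
        Do While i <= j
            Do While arr(i) < pivot
                i = i + 1
            Loop
            Do While pivot < arr(j)
                j = j - 1
            Loop
            If i <= j Then
                tmp = arr(i): arr(i) = arr(j): arr(j) = tmp
                i = i + 1
                j = j - 1
            End If
        Loop
        ' lo..j and i..hi remain; recurse on the smaller, iterate on the larger.
        If (j - lo) < (hi - i) Then
            If lo < j Then QSortIter arr, lo, j
            lo = i
        Else
            If i < hi Then QSortIter arr, i, hi
            hi = j
        End If
    Loop
End Sub
```

Usage matches the thread's QSort: QSortIter myArray, LBound(myArray), UBound(myArray).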

If you still need to load your array from the workbook or another dataset, consider the 3 approaches above, as each has its merits.
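A rough sketch of the ADO approach in item 1, querying the workbook itself with SQL so that the database engine de-duplicates and sorts. Everything specific here is an assumption: a sheet named "Data" with a header row and columns named ColA, ColB, ColC, the ACE OLE DB provider installed, and a saved .xlsm (ADO reads the saved copy of the file on disk). LoadSortedUniqueViaADO is a hypothetical name; this is not the code in the attached workbook.

```vba
' Returns the distinct values of three columns, sorted, via an ADO query.
Function LoadSortedUniqueViaADO() As Variant
    Dim cn As Object, rs As Object
    Dim sql As String

    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=" & ThisWorkbook.FullName & ";" & _
            "Extended Properties=""Excel 12.0 Macro;HDR=YES"";"

    ' UNION (unlike UNION ALL) drops duplicates across the three columns;
    ' ORDER BY uses the column name from the first SELECT.
    sql = "SELECT [ColA] FROM [Data$] " & _
          "UNION SELECT [ColB] FROM [Data$] " & _
          "UNION SELECT [ColC] FROM [Data$] " & _
          "ORDER BY [ColA]"

    Set rs = cn.Execute(sql)

    ' GetRows returns a 2-D array indexed (field, record), so all values
    ' sit at field index 0 along the second dimension.
    LoadSortedUniqueViaADO = rs.GetRows()

    rs.Close
    cn.Close
End Function
```

With vals = LoadSortedUniqueViaADO(), the smallest value is vals(0, 0) and the largest is vals(0, UBound(vals, 2)).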

The attached workbook contains these approaches, with execution times for each.

Cheers to the collaboration on another thread ("how do you create a 1d array from 3 columns of data?"), which went well beyond that question to assist in this process: http:/Q_27313095.html

Due to other priorities, I have yet to code and compare HeapSort and MergeSort, but I will do so even after this E-E question is closed.
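Until that comparison exists, here is a bottom-up MergeSort sketch for a 1-D array (one standard formulation, written for this note). It is stable and O(n log n) in the worst case, without recursion, at the cost of a temporary buffer the same size as the array. MergeSort is a hypothetical name here.

```vba
' Bottom-up MergeSort: merges runs of width 1, 2, 4, ... until sorted.
Sub MergeSort(arr As Variant)
    Dim lo As Long, hi As Long
    Dim width As Long, first As Long, middle As Long, last As Long
    Dim i As Long, j As Long, k As Long
    Dim buf() As Variant

    lo = LBound(arr)
    hi = UBound(arr)
    If hi <= lo Then Exit Sub
    ReDim buf(lo To hi)

    width = 1
    Do While width <= hi - lo
        first = lo
        ' Only merge when a second run of at least one element exists.
        Do While first <= hi - width
            middle = first + width - 1
            last = first + 2 * width - 1
            If last > hi Then last = hi

            ' Merge arr(first..middle) and arr(middle+1..last) into buf.
            i = first: j = middle + 1: k = first
            Do While i <= middle And j <= last
                If arr(j) < arr(i) Then          ' strict < keeps it stable
                    buf(k) = arr(j): j = j + 1
                Else
                    buf(k) = arr(i): i = i + 1
                End If
                k = k + 1
            Loop
            Do While i <= middle
                buf(k) = arr(i): i = i + 1: k = k + 1
            Loop
            Do While j <= last
                buf(k) = arr(j): j = j + 1: k = k + 1
            Loop

            ' Copy the merged run back into place.
            For k = first To last
                arr(k) = buf(k)
            Next k
            first = first + 2 * width
        Loop
        width = width * 2
    Loop
End Sub
```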

Let us know how your work progresses, and whether any of these options worked for you.

Cheers,

Dave
sortLargeArray-r1.xlsm