Solved

Sorting an Array with over 2 million members - Analyzing on Excel VBA

Posted on 2011-09-16
Last Modified: 2012-05-12
I have VBA code that calculates over 2 million numbers and puts them into an array. I want to sort these values inside the array. As far as I can see, there is no built-in sort function in Excel VBA.
I tried QSortInPlace, which can be found at http://www.cpearson.com/excel/SortingArrays.aspx

But it doesn't seem to work when there are 2 million members in the array.

I guess something could be arranged when filling the array in the first place.

What is the best way to sort huge arrays?
Question by:awesomejohn19
 
LVL 42

Expert Comment

by:dlmille
ID: 36552382
The Long data type will support array indexes up to 2,147,483,647.  Will this suffice?

Here's a heapsort algorithm using Long (I personally use QuickSort, but wanted something documented with Long array indexes, so here it is, untested by me):

http://www.source-code.biz/snippets/vbasic/1.htm

If not, then it can be done with a collection and a collection sort.  I can assist with this, but I'll first await your response to the question above.

Cheers,

Dave

 
LVL 42

Expert Comment

by:dlmille
ID: 36552420
Actually, a Variant array may be able to grow larger (I can't find my reference on that).  Let me see if I can load a Variant array with 3 million records and sort it with my QuickSort algorithm...

Dave
 
LVL 42

Expert Comment

by:dlmille
ID: 36552698
Here's a QuickSort macro I use all the time; I only changed Integer to Variant.  It SHOULD work, and right now I'm trying to figure out how to load an array with > 1MM records without waiting forever.

Give it and the heapsort a shot, as you're already in a position to test.  Note the usage of QSort:

Call QSort(myArray, LBound(myArray), UBound(myArray))

This will rewrite the Variant array myArray in sorted order.  It's easy enough to add a Boolean to the mix to choose ascending or descending order, and I can help with that if you like it.  Right now it's ascending.

Let me know if this works for you:

 
Sub QSort(sortArray As Variant, ByVal leftIndex As Integer, ByVal rightIndex As Integer)
    Dim compValue As Variant
    Dim i As Variant
    Dim j As Variant
    Dim tempVar As Variant

    i = leftIndex
    j = rightIndex
    
    compValue = sortArray(Int((i + j) / 2))

    Do
        Do While (sortArray(i) < compValue And i < rightIndex)
            i = i + 1
        Loop
        Do While (compValue < sortArray(j) And j > leftIndex)
            j = j - 1
        Loop
        If i <= j Then
        
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar
            
            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    If leftIndex < j Then QSort sortArray, leftIndex, j
    If i < rightIndex Then QSort sortArray, i, rightIndex
End Sub
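
The ascending/descending switch mentioned in this comment could be sketched roughly as follows. This is only an illustration, not the macro from the attached workbook: the name QSortDir and the descending parameter are made up here, and the indexes are declared As Long (an assumption that differs from the signature posted above) so that arrays with more than 32,767 elements can be addressed.

```vba
' Hypothetical variant of QSort: Long indexes plus a direction flag.
' QSortDir and "descending" are illustrative names, not from the original post.
Sub QSortDir(sortArray As Variant, ByVal leftIndex As Long, ByVal rightIndex As Long, _
             Optional ByVal descending As Boolean = False)
    Dim compValue As Variant
    Dim tempVar As Variant
    Dim i As Long
    Dim j As Long

    i = leftIndex
    j = rightIndex

    ' Pivot from the middle; computed without i + j so the Long cannot overflow.
    compValue = sortArray(i + (j - i) \ 2)

    Do
        If descending Then
            ' Descending: skip elements already larger than the pivot on the left.
            Do While (sortArray(i) > compValue And i < rightIndex)
                i = i + 1
            Loop
            Do While (compValue > sortArray(j) And j > leftIndex)
                j = j - 1
            Loop
        Else
            ' Ascending: same scan as the original macro.
            Do While (sortArray(i) < compValue And i < rightIndex)
                i = i + 1
            Loop
            Do While (compValue < sortArray(j) And j > leftIndex)
                j = j - 1
            Loop
        End If

        If i <= j Then
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar
            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    ' Recurse into both partitions, keeping the same direction.
    If leftIndex < j Then QSortDir sortArray, leftIndex, j, descending
    If i < rightIndex Then QSortDir sortArray, i, rightIndex, descending
End Sub
```

Usage would be along the lines of: QSortDir myArray, LBound(myArray), UBound(myArray), True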


 
LVL 42

Expert Comment

by:dlmille
ID: 36552700
My bad.  I had one typo in my Integer -> Variant conversion: the index parameters in the Sub signature were still declared As Integer.

Sorry for the SPAM!

The rest of the previous comment stands, including the usage:

Call QSort(myArray, LBound(myArray), UBound(myArray))

Here is the corrected macro:

 
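' Sorts sortArray in place, in ascending order, between leftIndex and rightIndex.
' The indexes are Variants so that they are not limited to the Integer range.
Sub QSort(sortArray As Variant, ByVal leftIndex As Variant, ByVal rightIndex As Variant)
    Dim compValue As Variant
    Dim i As Variant
    Dim j As Variant
    Dim tempVar As Variant

    i = leftIndex
    j = rightIndex

    ' Use the middle element as the pivot
    compValue = sortArray(Int((i + j) / 2))

    Do
        ' Advance i past elements smaller than the pivot
        Do While (sortArray(i) < compValue And i < rightIndex)
            i = i + 1
        Loop
        ' Move j back past elements larger than the pivot
        Do While (compValue < sortArray(j) And j > leftIndex)
            j = j - 1
        Loop
        If i <= j Then
            ' Swap the out-of-place pair and move both indexes inward
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar

            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    ' Sort the two partitions recursively
    If leftIndex < j Then QSort sortArray, leftIndex, j
    If i < rightIndex Then QSort sortArray, i, rightIndex
End Sub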
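
For anyone wanting to try this on an array of the size in the question, a rough test harness might look like the following. It is a sketch under a few assumptions: the QSort above is in the same module, random Doubles stand in for the real calculated values, and TestQSortLarge is just a name picked for the example.

```vba
' Hypothetical test harness: fills an array with 2,000,000 random values,
' sorts it with the QSort macro above, times the sort, and checks the order.
Sub TestQSortLarge()
    Const N As Long = 2000000
    Dim data() As Double
    Dim k As Long
    Dim t As Single

    ' Filling an in-memory array directly is fast; no worksheet access involved.
    ReDim data(1 To N)
    Randomize
    For k = 1 To N
        data(k) = Rnd
    Next k

    ' Timer returns seconds since midnight, so avoid running this across midnight.
    t = Timer
    QSort data, LBound(data), UBound(data)
    Debug.Print "Sorted " & N & " values in " & Format(Timer - t, "0.00") & " s"

    ' Verify that every element is <= its successor.
    For k = LBound(data) To UBound(data) - 1
        If data(k) > data(k + 1) Then
            Debug.Print "Out of order at index " & k
            Exit Sub
        End If
    Next k
    Debug.Print "Array is sorted."
End Sub
```

The results show up in the Immediate window (Ctrl+G in the VBA editor).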

 
LVL 46

Expert Comment

by:Martin Liss
ID: 36561758
I think that the thing to do is to not sort 2,000,000 records but rather to create the 'array' sorted as you build it. I have 'array' in single quotes because what I suggest is that you use the VBA Dictionary object instead. It is like a collection but faster. Here is a short tutorial.
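
One way the Dictionary idea could play out is sketched below. Note the assumptions: Scripting.Dictionary (Windows only, created late-bound here) keeps keys in the order they were added rather than in sorted order, so this sketch uses it to collapse duplicates while the data is being built and then sorts only the unique keys. It calls the corrected QSort posted earlier in this thread, and the sample values and the name SortUniqueWithDictionary are made up for illustration.

```vba
' Hypothetical sketch: count duplicates in a Dictionary while building the data,
' then sort only the unique keys. Assumes the QSort macro from this thread
' is in the same module.
Sub SortUniqueWithDictionary()
    Dim dict As Object
    Dim values As Variant
    Dim keys As Variant
    Dim v As Variant
    Dim k As Long

    ' Stand-in for the values produced by the real calculation.
    values = Array(42, 7, 42, 19, 7, 3)

    Set dict = CreateObject("Scripting.Dictionary")
    For k = LBound(values) To UBound(values)
        v = values(k)
        If dict.Exists(v) Then
            dict(v) = dict(v) + 1      ' seen before: bump the count
        Else
            dict.Add v, 1              ' first occurrence
        End If
    Next k

    ' Keys comes back as a zero-based Variant array in insertion order.
    keys = dict.Keys
    QSort keys, LBound(keys), UBound(keys)

    ' Print each unique value with its count, in sorted order.
    For k = LBound(keys) To UBound(keys)
        Debug.Print keys(k), dict(keys(k))
    Next k
End Sub
```

This only pays off when the data contains many repeated values; with 2 million distinct numbers the Dictionary adds overhead and the sort is just as large.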
 
LVL 42

Accepted Solution

by:
dlmille earned 500 total points
ID: 36564436
There are a few approaches to pick from.

The attached workbook looks at these (credits to Andrewssd3, rorya, jan24 - as I was getting input from them on how to create a large array so I could evaluate an appropriate response).

1.  ADO method - populates the array using ADO, putting all the data in one dimension of a two-dimensional array.  Benefits here include the ability to extract UNIQUE values from the dataset (3 columns of 1MM rows each), as well as sorting incorporated into the process.

2.  BruteForce method - populates the array from range assignments to Variant arrays.  For 3 columns, this could be done with a union of the 3 ranges, or by simply assigning the 3 ranges to 3 Variants; the final Variant array is then loaded "brute force", element by element.  The sort approach used is the QuickSort.

3.  Qsort2d method - populates the initial 2-D array with a single range assignment across all 3 columns, delivering a 2-D array with 3 columns.  Then a QuickSort for 2-D arrays (courtesy of Andrewssd3) is used to complete the sort.
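
As a rough idea of what the range-assignment loading in approaches 2 and 3 involves, here is a minimal sketch. It is not the code from the attached workbook: it assumes the data sits in columns A:C starting at row 1, and LoadColumnsTo1D is a name invented for the example.

```vba
' Hypothetical sketch of the "brute force" load: read 3 columns in one range
' assignment, then copy them element by element into a 1-D Variant array.
' Assumes the data is in columns A:C, starting at row 1, of the given sheet.
Function LoadColumnsTo1D(ws As Worksheet, ByVal rowCount As Long) As Variant
    Dim block As Variant
    Dim result() As Variant
    Dim r As Long
    Dim c As Long
    Dim k As Long

    ' A single assignment pulls the whole range into a 1-based 2-D Variant array,
    ' which is far faster than reading cells one at a time.
    block = ws.Range("A1").Resize(rowCount, 3).Value

    ReDim result(1 To rowCount * 3)
    For c = 1 To 3
        For r = 1 To rowCount
            k = k + 1
            result(k) = block(r, c)
        Next r
    Next c

    LoadColumnsTo1D = result
End Function
```

A call such as myArray = LoadColumnsTo1D(ActiveSheet, 1000000) would then give a 1-D array that can be passed to QSort.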

While you may already have your array loaded, the QuickSort and/or Qsort2D might be routines that help.  I believe I gave you code for both QuickSort and HeapSort.  

QuickSort is one of the fastest sorting methods on average, though its worst-case behavior is considerably slower, so it can have issues on unfavorable input (see http://en.wikipedia.org/wiki/Sorting_algorithm for a table of algorithms and their relative merits).  
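
For comparison, a HeapSort with Long indexes could look roughly like the sketch below. It is not the routine from the link given earlier in the thread and has not been timed against the QuickSort; the names HeapSortLong and SiftDown are invented here. HeapSort is not recursive and has the same order of running time in the worst case as on average, which is the main reason to consider it.

```vba
' Hypothetical HeapSort sketch: sorts a 1-D array in place, ascending,
' using Long indexes and no recursion.
Sub HeapSortLong(a As Variant)
    Dim lo As Long
    Dim n As Long
    Dim start As Long
    Dim last As Long
    Dim tmp As Variant

    lo = LBound(a)
    n = UBound(a) - lo + 1
    If n < 2 Then Exit Sub

    ' Build a max-heap, working backwards from the last parent node.
    For start = n \ 2 - 1 To 0 Step -1
        SiftDown a, lo, start, n - 1
    Next start

    ' Repeatedly move the largest element to the end and restore the heap.
    For last = n - 1 To 1 Step -1
        tmp = a(lo)
        a(lo) = a(lo + last)
        a(lo + last) = tmp
        SiftDown a, lo, 0, last - 1
    Next last
End Sub

' Pushes the element at heap position "root" down until both of its children
' are no larger than it. Heap positions are zero-based offsets from lo.
Private Sub SiftDown(a As Variant, ByVal lo As Long, ByVal root As Long, ByVal last As Long)
    Dim child As Long
    Dim tmp As Variant

    Do While 2 * root + 1 <= last
        child = 2 * root + 1
        ' Pick the larger of the two children, if a right child exists.
        If child < last Then
            If a(lo + child) < a(lo + child + 1) Then child = child + 1
        End If
        If a(lo + root) >= a(lo + child) Then Exit Sub
        tmp = a(lo + root)
        a(lo + root) = a(lo + child)
        a(lo + child) = tmp
        root = child
    Loop
End Sub
```

Usage is simply HeapSortLong myArray; the bounds are taken from the array itself.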

If you still need to load your array from the workbook or other dataset, then consider the 3 approaches, above, as each has its merit.

Attached, please find these approaches in the workbook, with timestamps on execution.

Cheers to collaboration on another thread, which went well beyond ("how do you create a 1d array from 3 columns of data?") to assist in this process: http:/Q_27313095.html

I have yet (due to other priorities) to code and compare the HeapSort and MergeSort, but will do so even past the termination of this particular E-E post.

Let us know how your work progresses, and whether any of these options worked for you.

Cheers,

Dave
sortLargeArray-r1.xlsm