Solved

Sorting an Array with over 2 million members -  Analyzing on Excel VBA

Posted on 2011-09-16
Last Modified: 2012-05-12
I have VBA code that calculates over 2 million numbers and puts them into an array. I want to sort these values inside the array. As far as I can see, there is no built-in sort function in Excel VBA.
I tried QSortInPlace, which can be found at http://www.cpearson.com/excel/SortingArrays.aspx .

But it doesn't seem to work when there are 2 million members in the array.

I guess something could be arranged when filling the array in the first place.

What is the best way to sort huge arrays?
Question by:awesomejohn19
LVL 41

Expert Comment

by:dlmille
ID: 36552382
The Long data type supports array indexes up to 2,147,483,647.  Will this suffice?
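
To make the index-type point concrete, here is a minimal sketch of filling an array of 2 million values with a Long counter. The routine name, the element type (Double), and the random fill are assumptions for illustration; the original poster's own calculation is not shown in the thread.

```vba
' Sketch only: FillLargeArray and the random test data are hypothetical.
Sub FillLargeArray()
    Const N As Long = 2000000
    Dim values() As Double
    Dim i As Long              ' an Integer counter would overflow past 32,767

    ReDim values(1 To N)
    Randomize
    For i = 1 To N
        values(i) = Rnd()      ' stand-in for the real calculation
    Next i

    Debug.Print LBound(values), UBound(values)
End Sub
```

Any sort routine called on such an array needs Long (not Integer) index parameters and loop counters as well.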

Here's a heapsort algorithm that uses Long indexes (I personally use QuickSort, but wanted something documented with Long array indexes, so here it is; untested by me):

http://www.source-code.biz/snippets/vbasic/1.htm

If not, then it can be done with a collection and collection sort.  I can assist with this, but first await your response to the first question, above.
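
For readers who cannot reach the link, the following is a rough sketch of what a heapsort with Long indexes could look like. It is not the code from the linked page; the routine names and the assumption that the data is a one-dimensional array of Double are mine.

```vba
' Sketch only: an in-place, ascending heapsort using Long indexes throughout.
' Assumes a one-dimensional Double array with any lower bound.
Sub HeapSort(a() As Double)
    Dim lo As Long, n As Long, i As Long
    Dim tmp As Double

    lo = LBound(a)
    n = UBound(a) - lo + 1
    If n < 2 Then Exit Sub

    ' Build a max-heap, starting from the last parent node.
    For i = n \ 2 - 1 To 0 Step -1
        SiftDown a, lo, i, n
    Next i

    ' Move the current maximum to the end, then restore the heap on the rest.
    For i = n - 1 To 1 Step -1
        tmp = a(lo): a(lo) = a(lo + i): a(lo + i) = tmp
        SiftDown a, lo, 0, i
    Next i
End Sub

' Push the element at heap position "root" down until both children are smaller.
' Heap positions are zero-based offsets from the array's lower bound "lo".
Private Sub SiftDown(a() As Double, ByVal lo As Long, ByVal root As Long, ByVal n As Long)
    Dim child As Long
    Dim tmp As Double

    Do While 2 * root + 1 < n
        child = 2 * root + 1
        If child + 1 < n Then
            If a(lo + child) < a(lo + child + 1) Then child = child + 1
        End If
        If a(lo + root) >= a(lo + child) Then Exit Sub
        tmp = a(lo + root): a(lo + root) = a(lo + child): a(lo + child) = tmp
        root = child
    Loop
End Sub
```

Usage would be simply HeapSort myArray. Heapsort is not recursive, so it avoids any concern about call-stack depth on very large arrays, at the cost of usually being somewhat slower than QuickSort on average.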

Cheers,

Dave

 
LVL 41

Expert Comment

by:dlmille
ID: 36552420
Actually, a Variant array may be larger (I can't find my reference on that).  Let me see if I can load a Variant array with 3 million records and sort it with my QuickSort algorithm...

Dave
 
LVL 41

Expert Comment

by:dlmille
ID: 36552698
Here's a QuickSort macro I use all the time; I only changed Integer to Variant.  It SHOULD work, and right now I'm trying to figure out how to load an array with > 1MM records without waiting forever.

Give it and the heapsort a shot, as you're already in a position to test.  Note the usage of QSort:

Call QSort(myArray, LBound(myArray), UBound(myArray))

This leaves the Variant array myArray in sorted order (the sort is done in place).  It's easy enough to add a Boolean to the mix to choose ascending or descending, and I can help with that if you like.  Right now it's ascending.

Let me know if this works for you:

 
Sub QSort(sortArray As Variant, ByVal leftIndex As Integer, ByVal rightIndex As Integer)
    Dim compValue As Variant
    Dim i As Variant
    Dim j As Variant
    Dim tempVar As Variant

    i = leftIndex
    j = rightIndex
    
    compValue = sortArray(Int((i + j) / 2))

    Do
        Do While (sortArray(i) < compValue And i < rightIndex)
            i = i + 1
        Loop
        Do While (compValue < sortArray(j) And j > leftIndex)
            j = j - 1
        Loop
        If i <= j Then
        
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar
            
            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    If leftIndex < j Then QSort sortArray, leftIndex, j
    If i < rightIndex Then QSort sortArray, i, rightIndex
End Sub

 
LVL 41

Expert Comment

by:dlmille
ID: 36552700
My bad.  I had one typo in my Integer-to-Variant conversion: the leftIndex and rightIndex parameters were still declared As Integer, which overflows past 32,767.

Sorry for the SPAM!

Here is the corrected macro.  The usage is the same as before:

Call QSort(myArray, LBound(myArray), UBound(myArray))

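' In-place ascending QuickSort of sortArray between leftIndex and rightIndex.
' Indexes are Variant so they can hold values beyond the Integer limit of 32,767.
Sub QSort(sortArray As Variant, ByVal leftIndex As Variant, ByVal rightIndex As Variant)
    Dim compValue As Variant
    Dim i As Variant
    Dim j As Variant
    Dim tempVar As Variant

    i = leftIndex
    j = rightIndex
    
    ' Use the middle element as the pivot.
    compValue = sortArray(Int((i + j) / 2))

    Do
        ' Skip elements that are already on the correct side of the pivot.
        Do While (sortArray(i) < compValue And i < rightIndex)
            i = i + 1
        Loop
        Do While (compValue < sortArray(j) And j > leftIndex)
            j = j - 1
        Loop
        If i <= j Then
            ' Swap the out-of-place pair and move both markers inward.
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar
            
            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    ' Sort the two partitions recursively.
    If leftIndex < j Then QSort sortArray, leftIndex, j
    If i < rightIndex Then QSort sortArray, i, rightIndex
End Sub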
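
The ascending/descending Boolean mentioned in this thread could be added roughly as follows. This is a sketch, not the poster's own version: the names QSortDir and ComesBefore and the Optional parameter are assumptions, and the indexes are declared As Long, which is one alternative to Variant for large arrays.

```vba
' Sketch only: QuickSort with Long indexes and an optional direction flag.
Sub QSortDir(sortArray As Variant, ByVal leftIndex As Long, ByVal rightIndex As Long, _
             Optional ByVal ascending As Boolean = True)
    Dim i As Long, j As Long
    Dim compValue As Variant, tempVar As Variant

    i = leftIndex
    j = rightIndex
    ' Middle element as pivot; written this way so the sum cannot overflow.
    compValue = sortArray(leftIndex + (rightIndex - leftIndex) \ 2)

    Do
        Do While (ComesBefore(sortArray(i), compValue, ascending) And i < rightIndex)
            i = i + 1
        Loop
        Do While (ComesBefore(compValue, sortArray(j), ascending) And j > leftIndex)
            j = j - 1
        Loop
        If i <= j Then
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar
            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    If leftIndex < j Then QSortDir sortArray, leftIndex, j, ascending
    If i < rightIndex Then QSortDir sortArray, i, rightIndex, ascending
End Sub

' True when a belongs before b in the requested order.
Private Function ComesBefore(ByVal a As Variant, ByVal b As Variant, _
                             ByVal ascending As Boolean) As Boolean
    If ascending Then
        ComesBefore = (a < b)
    Else
        ComesBefore = (a > b)
    End If
End Function
```

A descending sort would then be called as QSortDir myArray, LBound(myArray), UBound(myArray), False. The helper function adds a call per comparison, so for millions of elements it may be worth inlining the two comparisons instead.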

 
LVL 46

Expert Comment

by:Martin Liss
ID: 36561758
I think that the thing to do is to not sort 2,000,000 records but rather to create the 'array' sorted as you build it. I have 'array' in single quotes because what I suggest is that you use the VBA Dictionary object instead. It is like a collection but faster. Here is a short tutorial.
 
LVL 41

Accepted Solution

by:
dlmille earned 500 total points
ID: 36564436
There are a few approaches to pick from.

The attached workbook looks at these (credits to Andrewssd3, rorya, jan24 - as I was getting input from them on how to create a large array so I could evaluate an appropriate response).

1.  ADO method - populates the array using ADO, which puts all the data in one dimension of a two-dimensional array.  Benefits here include the ability to extract UNIQUE values from the dataset (3 columns of 1MM rows each), as well as sorting incorporated in the process.

2.  BruteForce method - populates the array via range assignments to variant arrays.  For 3 columns, this could be done with a union of the 3 ranges, or by just assigning the 3 ranges to 3 variants; the final variant array is then loaded "brute force", element by element.  The sort approach used is the QuickSort.

3.  Qsort2d method - populates the initial 2-D array with a range assignment across all 3 columns, delivering a 2-D array with 3 columns.  Then a QuickSort for 2-D arrays (courtesy of Andrewssd3) is used to complete the sort.
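
As an illustration of the range-assignment idea behind methods 2 and 3, here is a minimal sketch that reads a block of cells in one assignment and then copies it element by element into a 1-D array. It is not the code from the attached workbook; the names Flatten2D and LoadThreeColumns, the sheet name "Sheet1", and the range address are assumptions, and the cells are assumed to hold numbers only.

```vba
' Sketch only: copy a 2-D array (rows, columns) into a 1-D Double array,
' column by column. Assumes every element is numeric (Empty becomes 0).
Function Flatten2D(data As Variant) As Double()
    Dim result() As Double
    Dim r As Long, c As Long, k As Long
    Dim nRows As Long, nCols As Long

    nRows = UBound(data, 1) - LBound(data, 1) + 1
    nCols = UBound(data, 2) - LBound(data, 2) + 1
    ReDim result(1 To nRows * nCols)

    For c = LBound(data, 2) To UBound(data, 2)
        For r = LBound(data, 1) To UBound(data, 1)
            k = k + 1
            result(k) = data(r, c)
        Next r
    Next c

    Flatten2D = result
End Function

' Hypothetical usage: sheet name and range address are placeholders.
Sub LoadThreeColumns()
    Dim values() As Double
    ' A single .Value read is far faster than reading cell by cell.
    values = Flatten2D(Worksheets("Sheet1").Range("A1:C1000000").Value)
    Debug.Print UBound(values)
End Sub
```

The resulting 1-D array can then be passed to whichever sort routine is being tested.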

While you may already have your array loaded, the QuickSort and/or Qsort2D might be routines that help.  I believe I gave you code for both QuickSort and HeapSort.  

QuickSort is one of the fastest (on average) sorting methods, though its worst-case complexity is O(n^2), so it can have issues on unlucky inputs (see http://en.wikipedia.org/wiki/Sorting_algorithm for a table of algorithms and their relative merits).  

If you still need to load your array from the workbook or other dataset, then consider the 3 approaches, above, as each has its merit.

Attached, please find these approaches in the workbook, with timestamps on execution.
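
For anyone without the attachment, timing a sort and checking its result could be sketched like this. It is not the workbook's code: TimeQSort and IsSortedAscending are hypothetical names, the data is random, and the sketch assumes the QSort routine posted earlier in this thread is in the same project.

```vba
' Sketch only: returns True when arr is in non-decreasing order.
Function IsSortedAscending(arr As Variant) As Boolean
    Dim i As Long
    For i = LBound(arr) To UBound(arr) - 1
        If arr(i) > arr(i + 1) Then Exit Function   ' leaves the result False
    Next i
    IsSortedAscending = True
End Function

' Sketch only: time QSort (from this thread) on 2 million random values.
Sub TimeQSort()
    Const N As Long = 2000000
    Dim values() As Double
    Dim i As Long
    Dim started As Single

    ReDim values(1 To N)
    Randomize
    For i = 1 To N
        values(i) = Rnd()
    Next i

    started = Timer            ' seconds since midnight
    QSort values, LBound(values), UBound(values)
    Debug.Print "Sorted " & N & " values in " & Format(Timer - started, "0.00") & " s"
    Debug.Print "Sorted correctly: " & IsSortedAscending(values)
End Sub
```

Timer resets at midnight, so a run that spans midnight would report a negative duration.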

Cheers to collaboration on another thread, which went well beyond ("how do you create a 1d array from 3 columns of data?") to assist in this process: http:/Q_27313095.html

I have yet (due to other priorities) to code and compare the HeapSort and MergeSort, but will do so even after this particular E-E post is closed.

Let us know how your work progresses, and whether any of these options worked for you.

Cheers,

Dave
sortLargeArray-r1.xlsm