Solved

Sorting an Array with over 2 million members -  Analyzing on Excel VBA

Posted on 2011-09-16
Last Modified: 2012-05-12
I have VBA code that calculates over 2 million numbers and puts them into an array. I want to sort the values inside the array, but as far as I can see there is no built-in sort function in Excel VBA.
I tried Qsortinplace, which can be found at http://www.cpearson.com/excel/SortingArrays.aspx .

But it doesn't seem to work when there are 2 million members in the array.

I guess something can be arranged when filling the array in the first place.

What is the best way to sort the huge arrays?
Question by:awesomejohn19
6 Comments
 
Expert Comment

by:dlmille
The Long data type supports array indexes up to 2,147,483,647.  Will this suffice?

Here's a heapsort algorithm using long (I personally use QuickSort, but wanted something documented with long array indexes, so here it is - untested by me):

http://www.source-code.biz/snippets/vbasic/1.htm

If not, then it can be done with a collection and collection sort.  I can assist with this, but first await your response to the first question, above.
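
For reference, a minimal heapsort sketch with Long indexes might look like the following. This is not the code from the link above; the names HeapSortDoubles and SiftDown are illustrative, and it assumes a 1-D array of Doubles that has already been dimensioned. It is untested against a 2-million-element array.

```vba
Option Explicit

' Sketch only: sorts a 1-D array of Doubles in ascending order, in place.
' HeapSortDoubles and SiftDown are hypothetical names, not from the linked page.
Sub HeapSortDoubles(ByRef a() As Double)
    Dim lo As Long, n As Long, i As Long
    Dim tmp As Double

    lo = LBound(a)
    n = UBound(a) - lo + 1
    If n < 2 Then Exit Sub

    ' Build a max-heap over offsets 0..n-1 (offset k lives at a(lo + k))
    For i = n \ 2 - 1 To 0 Step -1
        SiftDown a, lo, i, n
    Next i

    ' Repeatedly move the largest value to the end and shrink the heap
    For i = n - 1 To 1 Step -1
        tmp = a(lo)
        a(lo) = a(lo + i)
        a(lo + i) = tmp
        SiftDown a, lo, 0, i
    Next i
End Sub

' Restores the max-heap property below offset root; the heap holds heapSize items.
Private Sub SiftDown(ByRef a() As Double, ByVal lo As Long, _
                     ByVal root As Long, ByVal heapSize As Long)
    Dim child As Long
    Dim tmp As Double

    Do While 2 * root + 1 < heapSize
        child = 2 * root + 1
        ' Pick the larger child. The test is nested because VBA's And
        ' does not short-circuit, so a combined test could read past the heap.
        If child + 1 < heapSize Then
            If a(lo + child) < a(lo + child + 1) Then child = child + 1
        End If
        If a(lo + root) >= a(lo + child) Then Exit Sub
        tmp = a(lo + root)
        a(lo + root) = a(lo + child)
        a(lo + child) = tmp
        root = child
    Loop
End Sub
```

It would be called as HeapSortDoubles myArray, where myArray is declared As Double(). Because the sketch uses loops rather than recursion, stack depth does not grow with the size of the array.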

Cheers,

Dave

Expert Comment

by:dlmille
Actually, a variant array may be able to grow larger (I can't find my reference on that).  Let me see if I can load a variant array with 3 million records and sort it with my quicksort algorithm...

Dave
Expert Comment

by:dlmille
Here's a QuickSort macro I use all the time; I only changed Integer to Variant.  It SHOULD work, and right now I'm trying to figure out how to load an array with > 1MM records without waiting forever.

Give it and the heapsort a shot, as you're already in a position to test.  Note the usage of QSort...

Call QSort(myArray, LBound(myArray), UBound(myArray))

This leaves the variant array myArray sorted in place.  It's easy enough to add a Boolean to the mix to choose ascending or descending order, and I can help with that if you like it.  Right now it's ascending.

Let me know if this works for you:

 
' Note: the index parameters are declared As Integer here, which overflows above 32,767.
Sub QSort(sortArray As Variant, ByVal leftIndex As Integer, ByVal rightIndex As Integer)
    Dim compValue As Variant
    Dim i As Variant
    Dim j As Variant
    Dim tempVar As Variant

    i = leftIndex
    j = rightIndex
    
    compValue = sortArray(Int((i + j) / 2))

    Do
        Do While (sortArray(i) < compValue And i < rightIndex)
            i = i + 1
        Loop
        Do While (compValue < sortArray(j) And j > leftIndex)
            j = j - 1
        Loop
        If i <= j Then
        
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar
            
            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    If leftIndex < j Then QSort sortArray, leftIndex, j
    If i < rightIndex Then QSort sortArray, i, rightIndex
End Sub

Expert Comment

by:dlmille
My bad.  I had one typo in my Integer-to-Variant conversion: the index parameters were still declared As Integer.

Sorry for the SPAM!

 
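' In-place ascending QuickSort of a 1-D array.
' The index arguments are Variants so they can hold values above the Integer limit of 32,767.
Sub QSort(sortArray As Variant, ByVal leftIndex As Variant, ByVal rightIndex As Variant)
    Dim compValue As Variant   ' pivot value
    Dim i As Variant           ' scans upward from the left edge
    Dim j As Variant           ' scans downward from the right edge
    Dim tempVar As Variant     ' swap buffer

    i = leftIndex
    j = rightIndex

    ' Use the middle element of the current range as the pivot
    compValue = sortArray(Int((i + j) / 2))

    Do
        ' Skip elements that are already on the correct side of the pivot
        Do While (sortArray(i) < compValue And i < rightIndex)
            i = i + 1
        Loop
        Do While (compValue < sortArray(j) And j > leftIndex)
            j = j - 1
        Loop
        If i <= j Then
            ' Swap the out-of-place pair and move both scans inward
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar

            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    ' Recurse into each partition that still holds more than one element
    If leftIndex < j Then QSort sortArray, leftIndex, j
    If i < rightIndex Then QSort sortArray, i, rightIndex
End Sub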
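
The ascending/descending switch mentioned earlier could be sketched roughly as follows. The names QSortDir and Precedes are made up for this illustration, and the indexes are declared As Long here as an assumption (Long covers well over 2 million elements); treat it as a starting point rather than tested code.

```vba
' Sketch only: QuickSort with an optional direction flag.
' QSortDir and Precedes are hypothetical names for this example.
Sub QSortDir(sortArray As Variant, ByVal leftIndex As Long, ByVal rightIndex As Long, _
             Optional ByVal ascending As Boolean = True)
    Dim compValue As Variant, tempVar As Variant
    Dim i As Long, j As Long

    i = leftIndex
    j = rightIndex
    ' Middle element as pivot; written this way so the sum cannot overflow
    compValue = sortArray(leftIndex + (rightIndex - leftIndex) \ 2)

    Do
        Do While Precedes(sortArray(i), compValue, ascending) And i < rightIndex
            i = i + 1
        Loop
        Do While Precedes(compValue, sortArray(j), ascending) And j > leftIndex
            j = j - 1
        Loop
        If i <= j Then
            tempVar = sortArray(i)
            sortArray(i) = sortArray(j)
            sortArray(j) = tempVar
            i = i + 1
            j = j - 1
        End If
    Loop While i <= j

    If leftIndex < j Then QSortDir sortArray, leftIndex, j, ascending
    If i < rightIndex Then QSortDir sortArray, i, rightIndex, ascending
End Sub

' True when a must come strictly before b in the requested order.
Private Function Precedes(ByVal a As Variant, ByVal b As Variant, _
                          ByVal ascending As Boolean) As Boolean
    If ascending Then
        Precedes = (a < b)
    Else
        Precedes = (a > b)
    End If
End Function
```

A descending sort would then be QSortDir myArray, LBound(myArray), UBound(myArray), False. The extra function call per comparison costs time on millions of elements, so two separate routines may be preferable if speed matters.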

Expert Comment

by:Martin Liss
I think that the thing to do is to not sort 2,000,000 records but rather to create the 'array' sorted as you build it. I have 'array' in single quotes because what I suggest is that you use the VBA Dictionary object instead. It is like a collection but faster. Here is a short tutorial.
Accepted Solution

by:
dlmille earned 500 total points
There are a few approaches to pick from.

The attached workbook looks at these (credits to Andrewssd3, rorya, jan24 - as I was getting input from them on how to create a large array so I could evaluate an appropriate response).

1.  ADO method - populates the array using ADO, putting all the data in one dimension of a two-dimensional array.  Benefits here include the ability to extract UNIQUE values from the dataset (3 columns of 1MM rows each), as well as sorting as part of the same process.

2.  BruteForce method - populates the array with range assignments to a variant array.  The 3 columns can be read through a union of the 3 ranges, or by assigning the 3 ranges to 3 variants; the final variant array is then loaded "brute force", element by element.  The sort approach used is the QuickSort.

3.  Qsort2d method - populates the initial 2-D array with a single range assignment across all 3 columns, delivering a 2-D array with 3 columns.  A QuickSort for 2-D arrays (courtesy of Andrewssd3) is then used to complete the sort.
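
As a rough illustration of the BruteForce idea (not the code in the attached workbook), a range can be read into a 2-D variant array in one assignment and then copied element by element into a 1-D array. The function name RangeTo1D and the sheet name in the usage note are assumptions for this example.

```vba
' Sketch only: flattens a multi-cell range into a 1-based, 1-D Variant array.
' RangeTo1D is a hypothetical helper. src must span more than one cell,
' because a single-cell .Value is a scalar rather than an array.
Function RangeTo1D(ByVal src As Range) As Variant
    Dim cells2D As Variant
    Dim flat() As Variant
    Dim r As Long, c As Long, k As Long

    ' One range assignment gives a 1-based 2-D array (rows, columns)
    cells2D = src.Value
    ReDim flat(1 To UBound(cells2D, 1) * UBound(cells2D, 2))

    ' "Brute force" copy, column by column, element by element
    For c = 1 To UBound(cells2D, 2)
        For r = 1 To UBound(cells2D, 1)
            k = k + 1
            flat(k) = cells2D(r, c)
        Next r
    Next c

    RangeTo1D = flat
End Function
```

For example, myArray = RangeTo1D(Worksheets("Sheet1").Range("A1:C1000000")) would yield a 3,000,000-element array that can be passed to QSort. Reading the range in one assignment is what keeps the load fast; looping over individual cells on the sheet is far slower.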

While you may already have your array loaded, the QuickSort and/or Qsort2D might be routines that help.  I believe I gave you code for both QuickSort and HeapSort.  

QuickSort is one of the fastest (on average) sorting methods, though its worst case is quadratic and its recursion can run deep on unfavorable data (see http://en.wikipedia.org/wiki/Sorting_algorithm for a table of algorithms and their relative merits).  

If you still need to load your array from the workbook or other dataset, then consider the 3 approaches, above, as each has its merit.

Attached, please find these approaches in the workbook, with timestamps on execution.
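
A timing check along these lines can be sketched as below. This is not the workbook's code: the name TimeSort and the random test data are assumptions, and it expects the corrected QSort from earlier in this thread to be in the same module.

```vba
' Sketch only: fills an array with random values, times the sort, and checks the order.
' TimeSort is a hypothetical name; QSort is the corrected routine posted earlier.
Sub TimeSort()
    Const N As Long = 2000000
    Dim values As Variant
    Dim i As Long
    Dim t0 As Single

    ReDim values(1 To N)
    Randomize
    For i = 1 To N
        values(i) = Rnd
    Next i

    ' Timer returns seconds since midnight, so avoid running this across midnight
    t0 = Timer
    QSort values, LBound(values), UBound(values)
    Debug.Print "Sorted " & N & " values in " & Format(Timer - t0, "0.00") & " s"

    ' Verify ascending order
    For i = 2 To N
        If values(i - 1) > values(i) Then
            Debug.Print "Out of order at index " & i
            Exit Sub
        End If
    Next i
    Debug.Print "Order verified"
End Sub
```

The results appear in the Immediate window of the VBA editor. Swapping in another routine at the QSort line gives a like-for-like comparison on the same data size.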

Cheers to collaboration on another thread, which went well beyond ("how do you create a 1d array from 3 columns of data?") to assist in this process: http:/Q_27313095.html

I have yet (due to other priorities) to code and compare the HeapSort and MergeSort, but will do so even past the termination of this particular E-E post.

Let us know how your work progresses, and whether any of these options worked for you.

Cheers,

Dave
sortLargeArray-r1.xlsm