Solved

Array Challenge?

Posted on 2011-09-16
24
474 Views
Last Modified: 2012-05-12
I'm working on another solution and need to create an array of 3MM values, stored in column A, B, and C in Excel 2007.

I can assign the range by setting three range variables to the three columns, then

rTotal = Union(rFirst,rSecond,rThird)

I'd like to convert this one 2-D array into a 1-D array, without having to iterate through every element.

Any clever ideas on how to load 3 MM records from 3 ranges, or how to convert a 2-D to a 1-D more expeditiously?

Thanks,

Dave
0
Comment
Question by:dlmille
  • 10
  • 5
  • 4
  • +2
24 Comments
 
LVL 17

Expert Comment

by:andrewssd3
ID: 36553608
Dave
I'm probably going to look stupid here, but what do you mean by 3MM values?
0
 
LVL 2

Accepted Solution

by:
jan24 earned 200 total points
ID: 36553611
Others may correct me, but I would have thought that this isn't possible, simply because Excel does not let you have access to the underlying way that arrays are stored in memory.  You could potentially write something in a language like C that does it, and compile it to a DLL which you refer to from Excel.  The approach would be to work out the total number of elements, create an array of that size, and then add the appropriate number of bytes on to the array point to be able to access the middle and third sections.

Having said that, there will always be iteration going on at some level, even if it is not within your code.  If you are finding it is slow, it might be because you are iterating a statement like myarray(i)=myrange.Cells(x,y) many times.   You would find it much faster to iterate entirely within VB without involving Excel, i.e. to do:
    myarray1 = ActiveSheet.Range("A1:A10").Value
    myarray2 = ActiveSheet.Range("B1:B10").Value
    myarray3 = ActiveSheet.Range("C1:C10").Value
   
    ReDim finalarray(1 To UBound(myarray1) + UBound(myarray2) + UBound(myarray3), 1 To 1)
    
    j = 1
    For i = 1 To UBound(myarray1)
       finalarray(j, 1) = myarray1(i, 1)
       j = j + 1
    Next i
    For i = 1 To UBound(myarray2)
       finalarray(j, 1) = myarray2(i, 1)
       j = j + 1
    Next i
    For i = 1 To UBound(myarray3)
       finalarray(j, 1) = myarray3(i, 1)
       j = j + 1
    Next i
    ActiveSheet.Range("D1:D30").Value = finalarray

Open in new window

0
 
LVL 17

Expert Comment

by:andrewssd3
ID: 36553633
Sorry - seen the other question now so you obviously mean millions.  If its a case of sorting all 3 million recs, I don't see how you could get them into one array without having to visit each member.  One possibility would be to sort each of three arrays individually, then merge them - but again you have to sort then look at each item to merge, which would prob be slow in VBA. I'll consult some books for algorithms that might help...
0
 
LVL 41

Author Comment

by:dlmille
ID: 36554881
3MM values = 3 million values - column A, column B and column C full of values.

Jan - I tried that and fell asleep loading the first million.

Its too bad application.transpose only works on 65536 records for the variant assignment.

Dave
0
 
LVL 85

Assisted Solution

by:Rory Archibald
Rory Archibald earned 100 total points
ID: 36559429
Have you tried using a recordset based on a union query and then using GetRows?
0
 
LVL 41

Author Comment

by:dlmille
ID: 36561423
I'm not sure I've done anything like that.  Do you have a simple example?  I'll give it a shot.

Tks,

Dave
0
 
LVL 2

Expert Comment

by:jan24
ID: 36562268
Recordset is definitely worth a try.  Good idea, rorya.  I'd be interested to know for future reference how it comes out in a speed test versus the approach I suggested.
0
 
LVL 17

Assisted Solution

by:andrewssd3
andrewssd3 earned 200 total points
ID: 36563151
Here's an example of the recordset approach, assuming you have three columns of data, with headings Col1, Col2 and Col3 on a sheet called Sheet1:

Public Sub LoadWithADODB()

    Dim cn As ADODB.Connection, rs As ADODB.Recordset, strSQL As String
    Dim a As Variant
    
    Set cn = New ADODB.Connection
    
    ' **** You need the commented connection string for Excel 2003 or earlier
    'cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & ThisWorkbook.FullName & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & ThisWorkbook.FullName & ";Extended Properties=""Excel 12.0;HDR=Yes"";"
    
    Set rs = New ADODB.Recordset
    
    strSQL = " SELECT Col1 As Cons FROM [Sheet1$] " & _
                    " UNION ALL SELECT Col2 As Cons FROM [Sheet1$] " & _
                    " UNION ALL SELECT Col3 As Cons FROM [Sheet1$] " & _
                ""
                '"  ORDER BY Cons"
    Debug.Print "Opening recordset"
    rs.Open strSQL, cn
    
    If Not rs.EOF Then
    Debug.Print "getting rows"
        a = rs.GetRows
    End If
    
    
    rs.Close
    Set rs = Nothing
    cn.Close
    Set cn = Nothing

End Sub

Open in new window


If you have Excel 2003 or earlier, you'll need to swap the connections strings as an the comment.

As for speed, it's slower than I expected, but I think still quicker than visiting each cell.  It loads a 2 dimensional array into 'a', in the transposed format you get with GetRows, so for 3 cols of  1 million rows each, you get an array dimensioned (0 To 0, 0 To 2999999), which looks odd, but it's relatively straightforward to change code that expects a single dimension to deal with this.

If you use the commented ORDER BY clause, you can also get it sorted directly, which may help too.
0
 
LVL 41

Author Comment

by:dlmille
ID: 36563335
Now, that's a plan!  Thanks rorya and andrewssd3.  I'll take a look at this...

Dave
0
 
LVL 85

Expert Comment

by:Rory Archibald
ID: 36563533
Additionally, if you remove the ALL, you get distinct records too. I ran some tests with 3 complete columns, and it was about 30minutes to get the unique array set back (On a laptop about 7 years old). Might be better if you could use an actual database for the source.
0
 
LVL 85

Expert Comment

by:Rory Archibald
ID: 36563537
Oh, one other thing - best to avoid using ADO on open workbooks due to memory leaks. For a one-off query it should be OK.
0
 
LVL 41

Author Comment

by:dlmille
ID: 36563611
I hear you.

I tried the largeArray assignment, but then was having problems getting the upper bound of the array dimensions.  When I "watched" it, results weren't as anticipated.  So, I tried to just output the result in 3 additional columns, but no joy - only 628 records are outputting and then EOF...

For some strange reason, I'm not seeing all the records in the dataset.  Try running the macro button and see what I mean.

Dave
sortLargeArray-r1.xlsm
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 2

Expert Comment

by:jan24
ID: 36563614
I've just tried both approaches.  My simple one takes 0.95 sec on 3 x 1,000,000 rows.  When I tried andrewssd's approach it takes 45 sec for the same number of rows.   This is Windows 7, Excel 2007, 2.67GHz i5 with 4GB RAM.

What do you get, dlmille?
0
 
LVL 41

Author Comment

by:dlmille
ID: 36563733
A couple points:

1.  Andrewssd's approach is probably the slowest, and still leaves me with a 2d array.  The same can be accomplished faster with the Union.  However, Andrewssd's approach has a SORT option - a major timesavor over sorting an array.  I'd need to time the array sort from your solution, but my money's on ADO right now.
2.  Your approach leaves me with a 1d array - pretty much brute-forcing it in - you use 3 variants versus my 1, but very similar approach.  I never really tried brute forcing the array in, as I thought it would take too much time, lol :)

time to have an array "loaded" (LENOVO THinkPad DUO T9400 2.53 Ghz w/ 4 GB ram, 64 Bit Vista Enterprise)

Andrewssd's time (sorted) = 7 secs
Jan's time (unsorted) = 1 secs

Let me see if I can generate some output with both, leveraging a sort algorithm or two on the 1-D array...  Will touch back, later.

Dave
0
 
LVL 41

Author Comment

by:dlmille
ID: 36563783
Jan - my original code did try to brute force, but I tried to redim on every element.  I think mine was even more brute force!  redim'ing once for the entire operation is the way to go.  Good tip, that.

Dave
0
 
LVL 17

Assisted Solution

by:andrewssd3
andrewssd3 earned 200 total points
ID: 36563786
Yes - my results are similar on quite a fast laptop.  I hasten to say the recordset approach is not my chosen one - I just gave you and example of how to do it!  My experience of using adodb with Excel as the data source is that it generally works quite well for simple selects, but is quirky and buggy for more complex work.

I'm also interested in the question about sorting large volumes of data in Excel.  I saw the question you're looking at, which creates a 2 million row array and wants it sorted.  I'd be interested to know what will be done with the 2 million rows once they ARE sorted, as I assume he'll have to do something like what you have done to split the array up again and dump it onto a worksheet in chunks of 1 million rows.  In this case it may be better to have the array 2-d in the first place to allow it to be more easily put back into excel with one assignment statement.  If that's an option, I was looking at adapting the quicksort algorithm to sort a 2-d array where the data was organised column-wise in columns of say 1M rows each.  This is an effort at that, which I think is OK but has had minimal testing:

Sub ss()
    Dim a As Variant
    Debug.Print "Getting data"
    a = [a1:c1000000].Value
    Debug.Print "Sorting...."
    Call QSort2D(a, 1, 3000000)
    
    Debug.Print "Replacing"
    [e1:g1000000].Value = a
    
End Sub

Sub QSort2D(sortArray As Variant, ByVal leftIndex As Variant, ByVal rightIndex As Variant)
    Dim compValue As Variant
    Dim i As Long, j As Long
    Dim i1 As Long, i2 As Long
    Dim j1 As Long, j2 As Long
    Dim tempVar As Variant
    Dim upper As Long
    Dim comp1 As Long, comp2 As Long

    upper = UBound(sortArray, 1)
    
    i = leftIndex
    j = rightIndex
    
    i2 = ((i - 1) \ upper) + 1
    i1 = ((i - 1) Mod upper) + 1
    j2 = ((j - 1) \ upper) + 1
    j1 = ((j - 1) Mod upper) + 1
    comp2 = ((((i + j) \ 2) - 1) \ upper) + 1
    comp1 = ((((i + j) \ 2) - 1) Mod upper) + 1
    
    compValue = sortArray(comp1, comp2)

    Do
        Do While (sortArray(i1, i2) < compValue And i < rightIndex)
            i = i + 1
            i2 = ((i - 1) \ upper) + 1
            i1 = ((i - 1) Mod upper) + 1
        Loop
        Do While (compValue < sortArray(j1, j2) And j > leftIndex)
            j = j - 1
            j2 = ((j - 1) \ upper) + 1
            j1 = ((j - 1) Mod upper) + 1
        Loop
        If i <= j Then
        
            tempVar = sortArray(i1, i2)
            sortArray(i1, i2) = sortArray(j1, j2)
            sortArray(j1, j2) = tempVar
            
            i = i + 1
            j = j - 1
            i2 = ((i - 1) \ upper) + 1
            i1 = ((i - 1) Mod upper) + 1
            j2 = ((j - 1) \ upper) + 1
            j1 = ((j - 1) Mod upper) + 1
        End If
    Loop While i <= j

    If leftIndex < j Then QSort2D sortArray, leftIndex, j
    If i < rightIndex Then QSort2D sortArray, i, rightIndex
End Sub

Open in new window


It is exactly the same, but converts the i, j and comp values into 2-dimensional references ( so that for example i=1,000,001 converts to row 1, col 2 of the array, etc
0
 
LVL 2

Expert Comment

by:jan24
ID: 36563824
Bizarre.  I wonder why andrewssd's approach took 7s sorted for you and 45s unsorted for me on similar machines.  I tried it with sorting and it increases to 52s.

I was using Micsoft ActiveX Data Objects 6.0 Library.  Is that the same as what you are using, Dave?

Did you use andrewssd's exact code, or did you modify it in any respect?
0
 
LVL 41

Author Comment

by:dlmille
ID: 36563962
Andrewssd - I'd suggest a mergesort.  I'm working up 3 sorts now for testing - QuickSort, HeapSort, and MergeSort - the last may lend itself to our needs, if not the OP's :)  My demonstration will have output in the worksheet, so full cycle, lol

Jan - I posted my code - timed it to completion of a = rs.getrows

I used the MS ADO 6.0 library.

My times were off - sorry about that.  I had only 65K rows - needed to save and bring back as XL 2007.

Andrewssd's load time was:  97 secs and through sorted dump back in the worksheet:  (still can't get that to work).

Jan's load time was: 3.39 secs and through unsorted dump back in the worksheet:  10.7 Mins

Here's my worksheet.  What am I doing wrong with Andrewssd's code?  I'm getting EOF after only 2500 records

Cheers,

Dave

Cheers,

Dave
sortLargeArray-r1.xlsm
0
 
LVL 41

Author Comment

by:dlmille
ID: 36564433
Here's my latest - with timings.  I have as yet to do HeapSort and MergeSort, but will share this (and this link) back in the other question.

The "winner" - loading, sorting, and unloading (putting results back in the spreadsheet) appears to be:

from time of start (Seconds) - random numbers in 3 columns (not unique numbers, so results may vary, of course!)

ADO Method LOAD:  126, SORT 126, OUTPUT, 744 <- intriguing, especially if unique is needed
BruteForce/QuickSort:  Load 3, SORT 44, OUTPUT 671 <- fastest with single array
Assign+Qsort2D: Load 3, SORT 80, OUTPUT 82 <- blazing with 2-d array

Surprisingly, the BruteForce approach was faster than the ADO method (V8 moment for me - I just KNEW that that would take too long!  and perhaps so, as opposed to variant assignments from ranges potentially being an option).

At any rate, I give rorya credit for the idea of using ADO (and benefits of UNIQUE), Andrewssd3 credit for giving me some ADO code to work with, plus the 2-D array with Qsort 2D is Kick-A!.  Finally, Jan delivered exactly what I asked for - a 1d array.

Andrewssd3 asked a good question - so, what are you going to do with it, once you have it?  A good question.  Let's see how this plays out in the other thread.  I've credited you in this thread for the assistance provided, thus far.

At this point, I'd award rorya 100 points, Andrewssd3 200, and Jan24 200 points.  Great collaboration.

Sound OK?

I just "wish" you could map an array structure on top of a 2-d structure with direct memory mapping... E.g.,

myArray (single dimension) = A2D_Array(1) where A2D_Array is 2-d and you just have to lock one dimension down for the assignment.

And appending arrays would also be nice, now that I'm "wishing" :)

Dave

sortLargeArray-r1.xlsm
0
 
LVL 17

Expert Comment

by:andrewssd3
ID: 36565582
I've had a go at a hybrid sort of the column data, using excel internal sort to each column first, then merging them.  I think think this could be made more elegant, but it works, and is the fastest one I've tried:

Sorting 2.7M rows, in two cols of 1M and one of 0.7M:

QuickSort2D - ave. 40 secs
MergeCols    - ave. 15 secs sort-multi.xlsm

For some reason that I haven't been able to track down, Quicksort consistently runs in about 27 secs first time, but subsequently settles down to the 40 secs.  I don't think the VBA code leaks memory, but maybe it is something to do with internal storage, or Undo buffers.

My test file is attached (data cut down as it was 35MB).  This has the code for the three things I've submitted here.

My Microsoft wish list for arrays would be as you say - appending arrays, splitting arrays and being able to extend the first dimension in a 2D array.  Chances of MS doing this in VBA....
0
 
LVL 85

Expert Comment

by:Rory Archibald
ID: 36565644
0.000000000001 rounded down to the nearest integer...
0
 
LVL 41

Author Closing Comment

by:dlmille
ID: 36590646
I chose my solution as it compiles everyone's ideas into one, distributing points as I best could.
0
 
LVL 10

Expert Comment

by:broro183
ID: 36736902
hi everyone,

I know it's case closed on this question but it stirred memories of something I glanced at a while ago - at the time most of it went straight over my head!

There is some benefit to having existing solutions and as there is a wish list, and as, *chuckle* Rorya suggests, the chances of MS incorporating the items into vba aare slim to nil. So, to save re-inventing the wheel, here are a couple of references that give some ready made code for...

- concatenating arrays,
- converting 1d to 2d or vice versa
- heaps of other stuff listed at the top of the page or the stated in attached file.

1) A page on Chip Pearson's site which may be of interest... although I think everyone in this thread can probably already code similar functions. Hopefully it is useful for people who come across this thread while searching.

http://www.cpearson.com/excel/vbaarrays.htm
For example it has code to "ConcatenateArrays".
After glancing at the code again, it seems to be built defensively or more for robustness than for pure speed. So either it wouldn't be useful, or it would need to be stripped down by removing some of the testing - provided of course that the inputs are controlled somewhere else.

2)
Alan Beban created an "array functions.xls" file a while ago ("copyright 2000") &  the only link that I can find with a quick google is to a blog file (which I didn't open). I've uploaded the version from my computer which I downloaded 3-4 years ago. These functions are also built defensively.

Rob ArrayFunctions.xls
0
 
LVL 41

Author Comment

by:dlmille
ID: 36984147
Thanks, Rob - handy :)
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Drop Down List with Unique/Distinct Values (enhancing the Combo-Box with a few steps and a little code) David miller (dlmille) Intro Have you ever created a data validation list from a database field or spreadsheet column (e.g., Zip Codes or Co…
Introduction This Article is a follow-up to my Mappit! Addin Article (http://www.experts-exchange.com/A_2613.html), it was inspired by an email posting I made to EUSPRIG (http://www.eusprig.org/index.htm), I will briefly cover: 1) An overvie…
The viewer will learn how to use the =DISCRINV command to create a discrete random variable, use this command to model a set of probabilities and outcomes in a Monte Carlo simulation, and learn how to find the standard deviation of a set of probabil…
This Micro Tutorial will demonstrate how to create pivot charts out of a data set. I also added a drop-down menu which allows to choose from different categories in the data set and the chart will automatically update.

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now