• Status: Solved
• Priority: Medium
• Security: Public
• Views: 508

# Array Challenge?

I'm working on another solution and need to create an array of 3MM values, stored in column A, B, and C in Excel 2007.

I can assign the range by setting three range variables to the three columns, then

rTotal = Union(rFirst,rSecond,rThird)

I'd like to convert this one 2-D array into a 1-D array, without having to iterate through every element.

Any clever ideas on how to load 3 MM records from 3 ranges, or how to convert a 2-D to a 1-D more expeditiously?

Thanks,

Dave
0
dlmille
• 10
• 5
• 4
• +2
4 Solutions

Commented:
Dave
I'm probably going to look stupid here, but what do you mean by 3MM values?
0

Commented:
Others may correct me, but I would have thought that this isn't possible, simply because Excel does not let you have access to the underlying way that arrays are stored in memory.  You could potentially write something in a language like C that does it, and compile it to a DLL which you refer to from Excel.  The approach would be to work out the total number of elements, create an array of that size, and then add the appropriate number of bytes on to the array point to be able to access the middle and third sections.

Having said that, there will always be iteration going on at some level, even if it is not within your code.  If you are finding it is slow, it might be because you are iterating a statement like myarray(i)=myrange.Cells(x,y) many times.   You would find it much faster to iterate entirely within VB without involving Excel, i.e. to do:
``````    myarray1 = ActiveSheet.Range("A1:A10").Value
myarray2 = ActiveSheet.Range("B1:B10").Value
myarray3 = ActiveSheet.Range("C1:C10").Value

ReDim finalarray(1 To UBound(myarray1) + UBound(myarray2) + UBound(myarray3), 1 To 1)

j = 1
For i = 1 To UBound(myarray1)
finalarray(j, 1) = myarray1(i, 1)
j = j + 1
Next i
For i = 1 To UBound(myarray2)
finalarray(j, 1) = myarray2(i, 1)
j = j + 1
Next i
For i = 1 To UBound(myarray3)
finalarray(j, 1) = myarray3(i, 1)
j = j + 1
Next i
ActiveSheet.Range("D1:D30").Value = finalarray
``````
0

Commented:
Sorry - seen the other question now so you obviously mean millions.  If its a case of sorting all 3 million recs, I don't see how you could get them into one array without having to visit each member.  One possibility would be to sort each of three arrays individually, then merge them - but again you have to sort then look at each item to merge, which would prob be slow in VBA. I'll consult some books for algorithms that might help...
0

Author Commented:
3MM values = 3 million values - column A, column B and column C full of values.

Jan - I tried that and fell asleep loading the first million.

Its too bad application.transpose only works on 65536 records for the variant assignment.

Dave
0

Commented:
Have you tried using a recordset based on a union query and then using GetRows?
0

Author Commented:
I'm not sure I've done anything like that.  Do you have a simple example?  I'll give it a shot.

Tks,

Dave
0

Commented:
Recordset is definitely worth a try.  Good idea, rorya.  I'd be interested to know for future reference how it comes out in a speed test versus the approach I suggested.
0

Commented:
Here's an example of the recordset approach, assuming you have three columns of data, with headings Col1, Col2 and Col3 on a sheet called Sheet1:

``````Public Sub LoadWithADODB()

Dim a As Variant

' **** You need the commented connection string for Excel 2003 or earlier
'cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & ThisWorkbook.FullName & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & ThisWorkbook.FullName & ";Extended Properties=""Excel 12.0;HDR=Yes"";"

strSQL = " SELECT Col1 As Cons FROM [Sheet1\$] " & _
" UNION ALL SELECT Col2 As Cons FROM [Sheet1\$] " & _
" UNION ALL SELECT Col3 As Cons FROM [Sheet1\$] " & _
""
'"  ORDER BY Cons"
Debug.Print "Opening recordset"
rs.Open strSQL, cn

If Not rs.EOF Then
Debug.Print "getting rows"
a = rs.GetRows
End If

rs.Close
Set rs = Nothing
cn.Close
Set cn = Nothing

End Sub
``````

If you have Excel 2003 or earlier, you'll need to swap the connections strings as an the comment.

As for speed, it's slower than I expected, but I think still quicker than visiting each cell.  It loads a 2 dimensional array into 'a', in the transposed format you get with GetRows, so for 3 cols of  1 million rows each, you get an array dimensioned (0 To 0, 0 To 2999999), which looks odd, but it's relatively straightforward to change code that expects a single dimension to deal with this.

If you use the commented ORDER BY clause, you can also get it sorted directly, which may help too.
0

Author Commented:
Now, that's a plan!  Thanks rorya and andrewssd3.  I'll take a look at this...

Dave
0

Commented:
Additionally, if you remove the ALL, you get distinct records too. I ran some tests with 3 complete columns, and it was about 30minutes to get the unique array set back (On a laptop about 7 years old). Might be better if you could use an actual database for the source.
0

Commented:
Oh, one other thing - best to avoid using ADO on open workbooks due to memory leaks. For a one-off query it should be OK.
0

Author Commented:
I hear you.

I tried the largeArray assignment, but then was having problems getting the upper bound of the array dimensions.  When I "watched" it, results weren't as anticipated.  So, I tried to just output the result in 3 additional columns, but no joy - only 628 records are outputting and then EOF...

For some strange reason, I'm not seeing all the records in the dataset.  Try running the macro button and see what I mean.

Dave
sortLargeArray-r1.xlsm
0

Commented:
I've just tried both approaches.  My simple one takes 0.95 sec on 3 x 1,000,000 rows.  When I tried andrewssd's approach it takes 45 sec for the same number of rows.   This is Windows 7, Excel 2007, 2.67GHz i5 with 4GB RAM.

What do you get, dlmille?
0

Author Commented:
A couple points:

1.  Andrewssd's approach is probably the slowest, and still leaves me with a 2d array.  The same can be accomplished faster with the Union.  However, Andrewssd's approach has a SORT option - a major timesavor over sorting an array.  I'd need to time the array sort from your solution, but my money's on ADO right now.
2.  Your approach leaves me with a 1d array - pretty much brute-forcing it in - you use 3 variants versus my 1, but very similar approach.  I never really tried brute forcing the array in, as I thought it would take too much time, lol :)

time to have an array "loaded" (LENOVO THinkPad DUO T9400 2.53 Ghz w/ 4 GB ram, 64 Bit Vista Enterprise)

Andrewssd's time (sorted) = 7 secs
Jan's time (unsorted) = 1 secs

Let me see if I can generate some output with both, leveraging a sort algorithm or two on the 1-D array...  Will touch back, later.

Dave
0

Author Commented:
Jan - my original code did try to brute force, but I tried to redim on every element.  I think mine was even more brute force!  redim'ing once for the entire operation is the way to go.  Good tip, that.

Dave
0

Commented:
Yes - my results are similar on quite a fast laptop.  I hasten to say the recordset approach is not my chosen one - I just gave you and example of how to do it!  My experience of using adodb with Excel as the data source is that it generally works quite well for simple selects, but is quirky and buggy for more complex work.

I'm also interested in the question about sorting large volumes of data in Excel.  I saw the question you're looking at, which creates a 2 million row array and wants it sorted.  I'd be interested to know what will be done with the 2 million rows once they ARE sorted, as I assume he'll have to do something like what you have done to split the array up again and dump it onto a worksheet in chunks of 1 million rows.  In this case it may be better to have the array 2-d in the first place to allow it to be more easily put back into excel with one assignment statement.  If that's an option, I was looking at adapting the quicksort algorithm to sort a 2-d array where the data was organised column-wise in columns of say 1M rows each.  This is an effort at that, which I think is OK but has had minimal testing:

``````Sub ss()
Dim a As Variant
Debug.Print "Getting data"
a = [a1:c1000000].Value
Debug.Print "Sorting...."
Call QSort2D(a, 1, 3000000)

Debug.Print "Replacing"
[e1:g1000000].Value = a

End Sub

Sub QSort2D(sortArray As Variant, ByVal leftIndex As Variant, ByVal rightIndex As Variant)
Dim compValue As Variant
Dim i As Long, j As Long
Dim i1 As Long, i2 As Long
Dim j1 As Long, j2 As Long
Dim tempVar As Variant
Dim upper As Long
Dim comp1 As Long, comp2 As Long

upper = UBound(sortArray, 1)

i = leftIndex
j = rightIndex

i2 = ((i - 1) \ upper) + 1
i1 = ((i - 1) Mod upper) + 1
j2 = ((j - 1) \ upper) + 1
j1 = ((j - 1) Mod upper) + 1
comp2 = ((((i + j) \ 2) - 1) \ upper) + 1
comp1 = ((((i + j) \ 2) - 1) Mod upper) + 1

compValue = sortArray(comp1, comp2)

Do
Do While (sortArray(i1, i2) < compValue And i < rightIndex)
i = i + 1
i2 = ((i - 1) \ upper) + 1
i1 = ((i - 1) Mod upper) + 1
Loop
Do While (compValue < sortArray(j1, j2) And j > leftIndex)
j = j - 1
j2 = ((j - 1) \ upper) + 1
j1 = ((j - 1) Mod upper) + 1
Loop
If i <= j Then

tempVar = sortArray(i1, i2)
sortArray(i1, i2) = sortArray(j1, j2)
sortArray(j1, j2) = tempVar

i = i + 1
j = j - 1
i2 = ((i - 1) \ upper) + 1
i1 = ((i - 1) Mod upper) + 1
j2 = ((j - 1) \ upper) + 1
j1 = ((j - 1) Mod upper) + 1
End If
Loop While i <= j

If leftIndex < j Then QSort2D sortArray, leftIndex, j
If i < rightIndex Then QSort2D sortArray, i, rightIndex
End Sub
``````

It is exactly the same, but converts the i, j and comp values into 2-dimensional references ( so that for example i=1,000,001 converts to row 1, col 2 of the array, etc
0

Commented:
Bizarre.  I wonder why andrewssd's approach took 7s sorted for you and 45s unsorted for me on similar machines.  I tried it with sorting and it increases to 52s.

I was using Micsoft ActiveX Data Objects 6.0 Library.  Is that the same as what you are using, Dave?

Did you use andrewssd's exact code, or did you modify it in any respect?
0

Author Commented:
Andrewssd - I'd suggest a mergesort.  I'm working up 3 sorts now for testing - QuickSort, HeapSort, and MergeSort - the last may lend itself to our needs, if not the OP's :)  My demonstration will have output in the worksheet, so full cycle, lol

Jan - I posted my code - timed it to completion of a = rs.getrows

I used the MS ADO 6.0 library.

My times were off - sorry about that.  I had only 65K rows - needed to save and bring back as XL 2007.

Andrewssd's load time was:  97 secs and through sorted dump back in the worksheet:  (still can't get that to work).

Jan's load time was: 3.39 secs and through unsorted dump back in the worksheet:  10.7 Mins

Here's my worksheet.  What am I doing wrong with Andrewssd's code?  I'm getting EOF after only 2500 records

Cheers,

Dave

Cheers,

Dave
sortLargeArray-r1.xlsm
0

Author Commented:
Here's my latest - with timings.  I have as yet to do HeapSort and MergeSort, but will share this (and this link) back in the other question.

from time of start (Seconds) - random numbers in 3 columns (not unique numbers, so results may vary, of course!)

ADO Method LOAD:  126, SORT 126, OUTPUT, 744 <- intriguing, especially if unique is needed
BruteForce/QuickSort:  Load 3, SORT 44, OUTPUT 671 <- fastest with single array
Assign+Qsort2D: Load 3, SORT 80, OUTPUT 82 <- blazing with 2-d array

Surprisingly, the BruteForce approach was faster than the ADO method (V8 moment for me - I just KNEW that that would take too long!  and perhaps so, as opposed to variant assignments from ranges potentially being an option).

At any rate, I give rorya credit for the idea of using ADO (and benefits of UNIQUE), Andrewssd3 credit for giving me some ADO code to work with, plus the 2-D array with Qsort 2D is Kick-A!.  Finally, Jan delivered exactly what I asked for - a 1d array.

Andrewssd3 asked a good question - so, what are you going to do with it, once you have it?  A good question.  Let's see how this plays out in the other thread.  I've credited you in this thread for the assistance provided, thus far.

At this point, I'd award rorya 100 points, Andrewssd3 200, and Jan24 200 points.  Great collaboration.

Sound OK?

I just "wish" you could map an array structure on top of a 2-d structure with direct memory mapping... E.g.,

myArray (single dimension) = A2D_Array(1) where A2D_Array is 2-d and you just have to lock one dimension down for the assignment.

And appending arrays would also be nice, now that I'm "wishing" :)

Dave

sortLargeArray-r1.xlsm
0

Commented:
I've had a go at a hybrid sort of the column data, using excel internal sort to each column first, then merging them.  I think think this could be made more elegant, but it works, and is the fastest one I've tried:

Sorting 2.7M rows, in two cols of 1M and one of 0.7M:

QuickSort2D - ave. 40 secs
MergeCols    - ave. 15 secs sort-multi.xlsm

For some reason that I haven't been able to track down, Quicksort consistently runs in about 27 secs first time, but subsequently settles down to the 40 secs.  I don't think the VBA code leaks memory, but maybe it is something to do with internal storage, or Undo buffers.

My test file is attached (data cut down as it was 35MB).  This has the code for the three things I've submitted here.

My Microsoft wish list for arrays would be as you say - appending arrays, splitting arrays and being able to extend the first dimension in a 2D array.  Chances of MS doing this in VBA....
0

Commented:
0.000000000001 rounded down to the nearest integer...
0

Author Commented:
I chose my solution as it compiles everyone's ideas into one, distributing points as I best could.
0

Commented:
hi everyone,

I know it's case closed on this question but it stirred memories of something I glanced at a while ago - at the time most of it went straight over my head!

There is some benefit to having existing solutions and as there is a wish list, and as, *chuckle* Rorya suggests, the chances of MS incorporating the items into vba aare slim to nil. So, to save re-inventing the wheel, here are a couple of references that give some ready made code for...

- concatenating arrays,
- converting 1d to 2d or vice versa
- heaps of other stuff listed at the top of the page or the stated in attached file.

1) A page on Chip Pearson's site which may be of interest... although I think everyone in this thread can probably already code similar functions. Hopefully it is useful for people who come across this thread while searching.

http://www.cpearson.com/excel/vbaarrays.htm
For example it has code to "ConcatenateArrays".
After glancing at the code again, it seems to be built defensively or more for robustness than for pure speed. So either it wouldn't be useful, or it would need to be stripped down by removing some of the testing - provided of course that the inputs are controlled somewhere else.

2)
Alan Beban created an "array functions.xls" file a while ago ("copyright 2000") &  the only link that I can find with a quick google is to a blog file (which I didn't open). I've uploaded the version from my computer which I downloaded 3-4 years ago. These functions are also built defensively.

Rob ArrayFunctions.xls
0

Author Commented:
Thanks, Rob - handy :)
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

## Featured Post

• 10
• 5
• 4
• +2
Tackle projects and never again get stuck behind a technical roadblock.