Solved

Why is filling cells in excel so slow when using Strings

Posted on 2010-09-02
18
653 Views
Last Modified: 2012-05-10
I use the following small simplified VBA macro to fill alot of cells with data.

The following line is the interesting one:

mArray(j, i) = i & "a" & j

if I assign a unique string as above the runtime is 5 seconds for 10000 rows and 20 seconds for 20000 rows
if I assign a constant string such as "a"  the runtime is 0.2 seconds for 10000 rows and 0.4 seconds for 20000 rows
if I assign a number (i *j) the runtime is similar to the one for constant strings


So,

1) why does it take so much longer to insert unique Strings
2) why is there an exponetial growth when intering unique Strings
3) Is there any work around possible for this problem?

Thanks in advance,

Finaris



Source Code:

Public Sub test1()
Dim t1 As Currency
Dim t2 As Currency
Dim mArray As Variant
Const rows As Long = 20000

t1 = Timer

 Application.ScreenUpdating = False
 mArray = ActiveSheet.Range("A1:L" & rows)
 'Fill your array
 For i = 1 To 12
     For j = 1 To rows
        mArray(j, i) = i & "a" & j
     Next j
 Next i
 ActiveSheet.Range("A1:L" & rows).FormulaR1C1 = mArray
 t2 = Timer
 Debug.Print "Execution completed in " & (t2 - t1) & "sec"
 
 Application.ScreenUpdating = True
End Sub
0
Comment
Question by:finaris
  • 9
  • 5
  • 3
  • +1
18 Comments
 
LVL 18

Expert Comment

by:WarCrimes
Comment Utility
Perhaps it's the concatenate that is causing the increase in time, although I don't see why.  Have you tired to concatenate static strings and test how long that takes?

myArray(j,i) = "A" & "B" & "C"
myArray(j,i) = 1 & 3
0
 

Author Comment

by:finaris
Comment Utility
The original algorithm on which the example is based reads different strings without concatenation from a file.

So the concatenation is not the problem.

The time is consumed when filling the values into the cells, which can also be seen using debug outputs using Timer
0
 
LVL 18

Expert Comment

by:WarCrimes
Comment Utility
I will add there is no reason to use a loop here, which is the most inefficient part of your code.  A range object in Excel is essentially an array, so you can simply write to it directly using your formula.

I have edited your test code below.  Run as is it takes just over 1 second.  If you uncomment the .Copy/PasteSpecial, this jumps to between 50-60 seconds.  My guess is that the PasteSpecial is using a loop to perform the replacements.

WC
Public Sub test1()

Dim t1 As Currency

Dim t2 As Currency

Dim mArray As Variant

Dim mRng As Range

Const rows As Long = 20000



t1 = Timer



 Application.ScreenUpdating = False

 Set mRng = ActiveSheet.Range("A1:L" & rows)

 mRng.Formula = "=COLUMN(" & Cells(mRng.Row, mRng.Column).Address(False, False) & ") & ""a"" & ROW(" & Cells(mRng.Row, mRng.Column).Address(False, False) & ")"

 'mRng.Copy

 'mRng.PasteSpecial xlPasteValues

t2 = Timer

 Debug.Print "Execution completed in " & (t2 - t1) & "sec"

 

 Application.ScreenUpdating = True

End Sub

Open in new window

0
 
LVL 18

Expert Comment

by:WarCrimes
Comment Utility
If you read your list of strings in as a 2-dimensional array, you should be able to write it out as well without using a loop.

Is your file a .csv?  Can you upload an example of the file being read?
0
 
LVL 18

Expert Comment

by:WarCrimes
Comment Utility
Here's a better question for you.

When you read in this file of strings, how are you storing it?
0
 

Author Comment

by:finaris
Comment Utility
@ WarCrimes: My problem is not to write coordinates to cells. I have insert strings into cells, which i cannot calculate with formulas. Code above is a simplification to show the problem.

My file is not CSV, its a proprietary format:

[FIle Header]
[section header 1]
a0[TAB]b0[TAB]c0[TAB]d0
a1[TAB]b1[TAB]c1[TAB]d1
a2[TAB]b2[TAB]c2[TAB]d2
[section header 1]
a0[TAB]b0[TAB]c0[TAB]d0
a1[TAB]b1[TAB]c1[TAB]d1
a2[TAB]b2[TAB]c2[TAB]d2
...

0
 
LVL 18

Assisted Solution

by:WarCrimes
WarCrimes earned 400 total points
Comment Utility
That is why I wasn't concerned with the filetype but how you are storing it once you read it in.  I am guessing you have read it in succesfully, right?  How are you storing it?  In a worksheet?  If so, it's already in a range object.  There should be no need to loop.

I noticed something I've never investigated before though.

Using your test code I tested writing the following values:
 mRng = "=""A"""    - This took 0.5 seconds
 mRng = "A"           ' This took 75 seconds

So it seems it doesn't have anything to do with whether the string is constant or not, but how you are writing it to the Range object.

According to this link, your test loop shouldn't really be causing large delays either, as this states it can do 100,000 records in about 0.1 seconds.

http://www.dailydoseofexcel.com/archives/2006/12/04/writing-to-a-range-using-vba/


Using your exact code, I tested the following
        mArray(j, i) = "=""a"""   - 1 second
        mArray(j, i) = "a"           - 75 seconds
        mArray(j, i) = i & "a" & j - 58 seconds
0
 
LVL 18

Expert Comment

by:WarCrimes
Comment Utility
I restarted my machine thinking there was some memory leakage going on.  Good news is that

mRng = "A" is now performing faster than
mRng = "=""A"""

as I would have expected.  Both are taking less than 0.2 seconds.

Your code is still slow for any value I try, even the

mArray(j,i) = "=""a""" which was really fast before.

Since you are going to be reading in a dataset anyways, I think it's best to leave the benchmarking for another question.  You are really here to figure out how to read your data faster, right?

If you are reading in the data successfully, then you need to read it into VBA as a Variant array.  If it's stored in a worksheet, this is easy, simply copy the Range.  When you're writing, write to a Range object.  This should reduce the overhead as much as possible.  Here is a link I think might help.  I have used this site before when researching Excel's calculation times.

http://www.decisionmodels.com/VersionCompare.htm
0
 
LVL 7

Expert Comment

by:philip m o'brien
Comment Utility
May also be worth looking at this page for ideas: http://stackoverflow.com/questions/1489533/process-large-textfiles-quickly-w-vba if you're importing the data in the first place.
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 18

Expert Comment

by:WarCrimes
Comment Utility
I've done some further testing and the loop is not the issue.  It's how you are writing the data.

In your code you have

 mArray = ActiveSheet.Range("A1:L" & rows)

This is setting the variable mArray as a Range object (which can be treated as a 2-D array).  But when you do

mArray(j,i) = <whatever value>

it actually is performing a write from VBA to Excel, which is a lot more overhead than if you were just writing to an array in VBA.

The code below is pretty efficient for constants.  It is still slow for cases like

mArray(j,i) = i & "a" & j

but I believe this is because of the overhead involved with reading the iterators.  That is all I can come up with.

To speed up your procedure, make sure you are storing the strings in a VBA array, not in the Range object.  Once you have your array populated, then write to the Range.

WC
Public Sub test2()

Dim t1 As Currency

Dim t2 As Currency

Dim t3 As Currency

Const rows As Long = 20000

Dim mArray As Variant: ReDim mArray(1 To rows, 1 To 12)



t1 = Timer



 Application.ScreenUpdating = False

 'Fill your array

 For i = 1 To 12

     For j = 1 To rows

        'mArray(j, i) = i & "A" & j

        mArray(j, i) = "A"

     Next j

 Next i

 t2 = Timer

 ActiveSheet.Range("A1:L" & rows) = mArray

 t3 = Timer

 Debug.Print "Loop completed in " & (t2 - t1) & "sec"

 Debug.Print "Write completed in " & (t3 - t2) & "sec"

 Debug.Print "Execution completed in " & (t3 - t1) & "sec"

 

 Application.ScreenUpdating = True

 'Set mArray = Nothing

End Sub

Open in new window

0
 
LVL 85

Expert Comment

by:Rory Archibald
Comment Utility
mArray is not a Range, it's a Variant, which ends up as a 2D array. Writing to the array does not write to the sheet.

@Finaris,
Which version of Excel are you using?
0
 

Author Comment

by:finaris
Comment Utility
@rorya I use Excel 2003 SP3

@ WarCrimes I think you still do not understand the problem.

-> Writing constant String values into the sheet is already FAST with my Sub
-> I read from a proprietary file format, written by another application. No workbook!
-> Reading the data is not the problem, here is NO exponetial growth, and the time spent is little in comparison to intertin the data into te sheet.


=> if I assign a unique string (i & "A" & j will give a unique string for each cell), then the runtime is 5 seconds for 10000 rows and 20 seconds for 20000 rows

The amount of cells increases by 2, the amount of required time increases by 4 ! => Exponential growth!

I need:

- A workaround for this, OR a fix of the problem
0
 
LVL 85

Accepted Solution

by:
Rory Archibald earned 100 total points
Comment Utility
I don't believe there is a workaround or fix for this, as I think it is due to the way the BIFF (the Excel file format) works. If you put the same string into multiple cells, you effectively create a shared formula record, whereby each cell simply holds a pointer to the actual formula and the formula itself is only stored once. If you enter different strings then obviously you have to store each formula in the record for each cell, which is slower.
Regards,
Rory
0
 

Author Comment

by:finaris
Comment Utility
I think i figured it out now.

The following 2 lines nearly do exactly the same, the only thing is that the formula notation is much faster, 20 times for my 20k rows example

        mArray(j, i) = i & "a" & j
        mArray(j, i) = "=" & Chr(34) & i & "a" & j & Chr(34)
0
 

Author Closing Comment

by:finaris
Comment Utility
see my last comment, figured out myself with the most rewarded hint
0
 
LVL 18

Expert Comment

by:WarCrimes
Comment Utility
Actually Rory, yes it's a Variant, but as soon as it is set equal to a Range, it's a Range, and when you assign an element of that range a value, it's written to the sheet.  This can be seen by simply removing the Application.ScreenUpdating = False.   You're point about shared formulas is right on though.  Thanks for adding that.

It's pretty obvious from my test that writing to an array in VBA is much faster than writing to a Range object which VBA has to write back to Excel.  

Anyways, I'm not really going to waste any more time on this.

Glad you are satisfied with a workable solution.

Cheers,
WC
0
 
LVL 85

Expert Comment

by:Rory Archibald
Comment Utility
But it's not 'Set' - that's the point. It's:
mArray = ActiveSheet.Range("A1:L" & rows)
and NOT:
Set mArray = ActiveSheet.Range("A1:L" & rows)


The former is equivalent to:
mArray = ActiveSheet.Range("A1:L" & rows).Value


and mArray is not a Range object.
0
 
LVL 18

Expert Comment

by:WarCrimes
Comment Utility
You are correct.  My apologies.

It's still about 3x or more slower using that line in the code, and not just the loop is slower either, but the write as well.

        mArray(j, i) = "A"  'with the line setting mArray = range
Loop completed in 0.957sec
Write completed in 0.75sec
Execution completed in 1.707sec

        mArray(j, i) = "A"  'without the line setting mArray = range
Loop completed in 0.2187sec
Write completed in 0.3086sec
Execution completed in 0.5273sec

        mArray(j, i) = i & "A" & j  'with line setting mArray = range
Loop completed in 2.9531sec
Write completed in 2.6719sec
Execution completed in 5.625sec

        mArray(j, i) = i & "A" & j 'without line setting mArray = range
Loop completed in 0.5938sec
Write completed in 0.8906sec
Execution completed in 1.4844sec

Granted, my sample size is only about 5 runs each, but all were similar.
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Introduction This Article briefly covers methods of calculating the NPV and IRR variants in Excel as well as the limitations in calculating and interpreting IRR results. Paraphrasing Richard Shockley, author of my favourite finance reference tex…
If you need to start windows update installation remotely or as a scheduled task you will find this very helpful.
The viewer will learn how to use the =DISCRINV command to create a discrete random variable, use this command to model a set of probabilities and outcomes in a Monte Carlo simulation, and learn how to find the standard deviation of a set of probabil…
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now