One of my clients needed to import, massage, and push some data into an Excel workbook. He was comfortable in the VB classic environment and felt he could accomplish the massaging of the data once it was in an array, but he wasn't sure how to import it. Our main production application has a few data import routines that would do the trick, but this seemed simple enough to warrant a quickie application for him: I would handle the I/O and create an event where he could write his data-massage code. Multiple files would need to be imported, in a single pass or in multiple passes, before the next two steps happened. I suggested that we package the code inside an Excel environment, but he was more comfortable with the stand-alone VB environment.
This article walks through the code development process. I also share the discoveries I made that greatly improved the performance of pushing the data into the workbook.
Seeing the data made me realize the importing process would be relatively straightforward. The first line in each file was an integer with the number of actual data lines that followed. The data lines were two tab-delimited number columns (single-precision data). Example:
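The original sample files aren't reproduced here, but a made-up file in the described format (count line first, then tab-delimited single-precision pairs) would look like this:

```
3
1.25	3.50
2.50	4.75
3.75	6.00
```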
This was almost trivially easy. For the user interface (UI), I created a form with DriveListBox, DirListBox, and FileListBox controls. I added the usual event code for the drive and directory controls' Change events to cascade each change to the next control down the logical hierarchy. I attached the data-related actions to three command buttons and added a label to display the progress of the event code.
Option Explicit

Dim sngData() As Single     'note: must be allocated before the first import,
                            'e.g. ReDim sngData(1 To 2, 1 To 1) in Form_Load

Private Sub cmdImport_Click()
    Dim intFN As Integer
    Dim lngFiles As Long
    Dim lngCount As Long
    Dim lngStart As Long
    Dim lngLines As Long

    Me.MousePointer = vbHourglass
    Me.Enabled = False
    For lngFiles = 0 To File1.ListCount - 1
        If File1.Selected(lngFiles) Then
            intFN = FreeFile
            Open File1.Path & "\" & File1.List(lngFiles) For Input As intFN
            Input #intFN, lngLines
            lblStatus.Caption = "Reading " & File1.List(lngFiles)
            lngStart = UBound(sngData, 2)
            If lngStart = 1 Then lngStart = 0
            ReDim Preserve sngData(1 To 2, 1 To (lngStart + lngLines))
            For lngCount = (lngStart + 1) To (lngStart + lngLines)
                Input #intFN, sngData(1, lngCount), sngData(2, lngCount)
            Next
            Close intFN
        End If
    Next
    Me.MousePointer = vbDefault
    Me.Enabled = True
    lblStatus.Caption = "Finished Data Import. " & UBound(sngData, 2) & " rows in memory."
End Sub
The FileListBox control is set to allow multiple file selections.
This code is written for the expected two columns. It can be easily changed to import a different number of columns in the same manner that the Export routine does.
Remember that when using the Preserve keyword, you can resize only the last dimension of an array.
Since the array data structure is defined at the form level, multiple imports will cause the array to grow with each pass's data.
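As a quick illustration of the Preserve restriction (hypothetical array, not from the project):

```vb
Dim sngData() As Single
ReDim sngData(1 To 2, 1 To 100)             'initial allocation
ReDim Preserve sngData(1 To 2, 1 To 200)    'OK: only the last dimension changes
'ReDim Preserve sngData(1 To 3, 1 To 200)   'would raise a run-time error
```

This is why the import code is written with the columns as the first dimension and the growing row count as the last dimension.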
With the two sample files, this import process has sub-second performance. I only see the completed message.
3 Exporting to Excel
The client originally planned to save the results of the multiple imports to a text file and then open that file with Excel. This seemed like a big inconvenience for the user (him). Besides, I had wanted to play with this process since I'd used it as a bonus example in one of my earlier articles.
A Note about Early Binding and Late Binding
When developing applications that use MS Office automation objects that are not present in the development environment, you can define such variables with the type library's object types (early binding) or as the general Object data type (late binding). In this case, I'm developing an application in the VB (classic) IDE, not as an Excel macro, so I needed to define Excel objects (workbooks, worksheets, ranges, etc.). The easiest and fastest development happens with early binding, which requires adding a reference to the Microsoft Excel v.r Object Library (where v.r are the version and release numbers of the library). Doing this enables the IDE's Intellisense feature for these variables and adds a set of named constants to the development environment -- really nice behavior for optimal programmer efficiency.
However, the code examples you will see in this article use late binding. I prefer the late binding advantages for my production applications.
Since the Excel automation object isn't instantiated until the export routine runs, the program starts more quickly.
The CreateObject("Excel.Application") statement will work with any installed version of Excel. If my end user has a different version of Excel than mine, an early-bound build could fail because it references a differently versioned type library.
Distribution of the application is much simpler. I don't have to add instructions to my user on how to add a reference to the Excel library on their PC.
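To make the trade-off concrete, here is a sketch of the two declaration styles (the early-bound lines assume a project reference to the Microsoft Excel Object Library and are shown commented out):

```vb
'Early binding -- requires a reference to the Microsoft Excel v.r Object Library
'Dim xlObj As Excel.Application
'Set xlObj = New Excel.Application

'Late binding -- no reference needed; the ProgID is resolved at run time
Dim xlObj As Object
Set xlObj = CreateObject("Excel.Application")
```

Either way, the rest of the automation code (Workbooks.Add, Range assignments, and so on) is written identically; only the declarations and the availability of Intellisense and named constants differ.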
Brute Force is Sloooooow
Despite my best efforts, I couldn't get very good performance out of cell-at-a-time value assignments, as shown in this (now removed) code:
'Note: the following is a much slower way to export the data to Excel
For lngCount = 1 To UBound(sngData, 2)
    rngOut.Offset(lngCount - 1, 0).Value = sngData(1, lngCount)
    rngOut.Offset(lngCount - 1, 1).Value = sngData(2, lngCount)
    rngOut.Offset(lngCount - 1, 2).Formula = "=A" & lngCount & "/B" & lngCount
    If (lngCount Mod 100) = 0 Then
        lblStatus.Caption = "Now at Excel row " & lngCount & " of " & UBound(sngData, 2) & " rows"
        DoEvents
    End If
Next
My next attempt was to transfer the entire array to the worksheet with a single assignment, using Excel's Transpose worksheet function. During my tests, this statement caused a trappable error 13 (Type mismatch). After a change of variable data type (to Variant) failed to correct the problem, I did a Google search for possible reasons. It turns out that you can only transfer 5461 cell items with a single Transpose function call.
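The failing statement looked something like the following sketch (the exact original line isn't shown above; the variable names match the later listings):

```vb
'transfer the whole array in one assignment -- raises error 13
'once the array exceeds 5461 cells
rngOut.Resize(UBound(sngData, 2), 2).Value = _
    xlObj.WorksheetFunction.Transpose(sngData)
```

The Transpose call is needed here because the imported array is (columns, rows) while a worksheet range expects (rows, columns).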
My first version of the BulkLoad routine used the Transpose function (above), limiting the size of the array to meet the 5461 cell limit. I populated an array with chunks of data from the imported array.
'following statement is in a nested loop, one for each dimension
DataForRange(lngCol, lngLoop) = sngData(lngCol, lngLoop + (lngChunk - 1))

'following statement transfers the chunk of data
rngChunk.Value = parmxlObj.WorksheetFunction.Transpose(DataForRange)
This reduced the run time from two minutes to under five seconds, which was great. But I wondered if there might be some more performance tuning tweaks.
A little investigation into the details of the Transpose function revealed that it did little more than swap the first and second dimensions of a two-dimension array. In this case, my (1 To 2, 1 To 2500) chunk array data was copied into a (1 To 2500, 1 To 2) array. Armed with that knowledge, I swapped the dimensions of my chunk array variable (DataForRange) and the order of the indexes used when populating it with my imported data.
By swapping the dimensions with my own code, I was able to eliminate the Transpose function. This improved the performance again:
Removed an unnecessary function call.
Removed an Excel object reference required for the transpose method.
Removed the need to pass the Excel automation object (parmxlObj) to the BulkLoad routine.
Private Sub cmdExport_Click()
    Dim xlObj As Object     'New Excel.Application -- only used with Excel reference
    Dim wkbOut As Object    'Excel.Workbook
    Dim wksOut As Object    'Excel.Worksheet
    Dim rngOut As Object    'Excel.Range
    Dim sngStart As Single  'for performance measurement

    'output to Excel workbook
    lblStatus.Caption = "Begin Excel Data Export"
    Set xlObj = CreateObject("Excel.Application")
    Set wkbOut = xlObj.Workbooks.Add
    Set wksOut = wkbOut.Worksheets("Sheet1")    'can skip this step
    Set rngOut = wksOut.Range("A1")             'by replacing with wkbOut.Worksheets("Sheet1").Range("A1")

    Me.MousePointer = vbHourglass
    Me.Enabled = False
    xlObj.ScreenUpdating = False
    xlObj.Calculation = -4135                   '=xlCalculationManual

    sngStart = Timer
    BulkLoad rngOut, sngData
    lblStatus.Caption = "Finished Excel Data Export. (" & Timer - sngStart & " seconds)"

    xlObj.Calculation = -4105                   '=xlCalculationAutomatic
    xlObj.ScreenUpdating = True
    xlObj.Visible = True

    Set rngOut = Nothing
    Set wksOut = Nothing
    Set wkbOut = Nothing
    Set xlObj = Nothing
    Me.MousePointer = vbDefault
    Me.Enabled = True
End Sub
'================================
'================================
'Note: this version of BulkLoad does not use the Transpose function,
'      but it still loads the data in chunks
Private Sub BulkLoad(parmRange As Object, parmData() As Single)
    Dim DataForRange() As Single    'may need Variant in future
    Dim lngLoop As Long
    Dim lngChunk As Long
    Dim rngChunk As Object
    Dim lngCol As Long
    Dim lngChunkSize As Long
    Dim lngRemainder As Long

    lngChunkSize = 5000 \ UBound(parmData, 1)
    lngRemainder = UBound(parmData, 2) Mod lngChunkSize
    If UBound(parmData, 2) > lngChunkSize Then
        ReDim DataForRange(1 To lngChunkSize, LBound(parmData, 1) To UBound(parmData, 1))
        For lngChunk = 1 To (UBound(parmData, 2) - lngRemainder) Step lngChunkSize
            For lngLoop = 1 To lngChunkSize
                For lngCol = LBound(parmData, 1) To UBound(parmData, 1)
                    DataForRange(lngLoop, lngCol) = parmData(lngCol, lngLoop + (lngChunk - 1))
                Next
            Next
            'target range is exactly lngChunkSize rows by two columns
            Set rngChunk = parmRange.Range(parmRange.Offset(lngChunk - 1, 0), _
                                           parmRange.Offset(lngChunk - 2 + lngChunkSize, 1))
            rngChunk.Value = DataForRange
            lblStatus.Caption = "Now at Excel row " & lngChunk & " of " & UBound(parmData, 2) & " rows"
        Next
        'last little chunk, if the row count isn't an exact multiple of the chunk size
        If lngRemainder > 0 Then
            ReDim DataForRange(1 To lngRemainder, LBound(parmData, 1) To UBound(parmData, 1))
            For lngLoop = 1 To lngRemainder
                For lngCol = LBound(parmData, 1) To UBound(parmData, 1)
                    DataForRange(lngLoop, lngCol) = parmData(lngCol, lngLoop + (lngChunk - 1))
                Next
            Next
            Set rngChunk = parmRange.Range(parmRange.Cells(lngChunk, 1), _
                                           parmRange.Cells(lngChunk - 1 + lngRemainder, 2))
            rngChunk.Value = DataForRange
        End If
    Else
        'only little chunk
        lngChunk = 1
        ReDim DataForRange(1 To UBound(parmData, 2), LBound(parmData, 1) To UBound(parmData, 1))
        For lngLoop = 1 To UBound(parmData, 2)
            For lngCol = LBound(parmData, 1) To UBound(parmData, 1)
                DataForRange(lngLoop, lngCol) = parmData(lngCol, lngLoop + (lngChunk - 1))
            Next
        Next
        Set rngChunk = parmRange.Range(parmRange.Cells(lngChunk, 1), _
                                       parmRange.Cells(lngChunk - 1 + UBound(parmData, 2), 2))
        rngChunk.Value = DataForRange
    End If
End Sub
For a better UI, prevent the user from doing anything with the form while code is running.
Present the user with progress feedback.
Some of the comments, such as " 'Excel.Workbook" and " '=xlCalculationManual" are remnants of the development process, which used early binding.
With late binding, you don't have access to the constants.
Instead of 5461, I used 5000 as the maximum transfer size.
Data transfer (chunk size) is calculated based on the number of columns in the array.
Since the customer might add some columns, I wrote the export routine with a loop instead of a hard-coded number of columns like I did with the import routine.
The client can easily pass a different array to the BulkLoad routine.
4 Code Refinements
I had trimmed about a half second off the run time by eliminating the Transpose function. Then it hit me...with the worksheet function eliminated, I was no longer bound by its 5461 cell limit. I no longer needed to transfer the data in chunks. The BulkLoadFast version of the routine performs considerably faster than both the original and the tuned versions of BulkLoad.
Private Sub BulkLoadFast(parmRange As Object, parmData() As Single)
    Dim DataForRange() As Single
    Dim lngLoop As Long
    Dim rngChunk As Object
    Dim lngCol As Long
    Dim lngChunkSize As Long

    lngChunkSize = UBound(parmData, 2)
    ReDim DataForRange(1 To UBound(parmData, 2), LBound(parmData, 1) To UBound(parmData, 1))
    For lngLoop = 1 To lngChunkSize
        For lngCol = LBound(parmData, 1) To UBound(parmData, 1)
            DataForRange(lngLoop, lngCol) = parmData(lngCol, lngLoop)
        Next
    Next
    'one bulk assignment of the entire (rows x 2) array
    Set rngChunk = parmRange.Range(parmRange.Offset(0, 0), parmRange.Offset(lngChunkSize - 1, 1))
    rngChunk.Value = DataForRange
End Sub
Sometimes such 'throw-away' projects can prove to be good learning experiences. Don't shy away from them. Although the Transpose worksheet function may be necessary in some cases, it would probably be worth your time to code around it.
Running several tests for each export method, the average performance times to push 9002 rows into an Excel workbook from the two 4501 record sample files are:
Cell-at-a-time   118.72 secs -- OMG! this is terribly sloooow
Transpose()        4.52 secs -- 5000 cell chunks
BulkLoad           4.07 secs -- direct array transfer chunks
BulkLoadFast       2.81 secs -- a single bulk data transfer
Given these figures, the performance times relative to BulkLoadFast are:
30.96% elapsed time reduction from direct array chunks
37.83% elapsed time reduction from Transpose function chunks
97.63% elapsed time reduction from cell-at-a-time transfer (over 40 times faster)
Even if you have a one-dimensional array, it would be worth your time to copy the data into a two-dimensional array and transfer the data directly instead of using the Transpose function. By default, a one-dimensional array is treated as a single (Excel) row of data. You need to add a dimension to align with the rows and columns in the target worksheet.
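For example, writing a 1-D array down a worksheet column could look like this (the array and worksheet names here are hypothetical; wksOut is assumed to be an open worksheet object as in the export routine):

```vb
Dim sngVector() As Single
Dim ColData() As Single
Dim lngLoop As Long

ReDim sngVector(1 To 1000)
'...populate sngVector...

ReDim ColData(1 To UBound(sngVector), 1 To 1)   'rows x 1 column
For lngLoop = 1 To UBound(sngVector)
    ColData(lngLoop, 1) = sngVector(lngLoop)
Next
wksOut.Range("A1").Resize(UBound(sngVector), 1).Value = ColData
```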
When doing office automation, your application takes a performance 'hit' every time it does something with one of the automation object variables. It would behoove you to minimize their use when performance is a concern.
If your data source is accessed through a recordset object (database table or ISAM data driver) you should consider using the CopyFromRecordset range method to transfer the data to Excel.
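A minimal sketch of that approach, assuming an already-opened ADO recordset named rst and the wksOut worksheet object from the export routine:

```vb
'transfers all rows of the recordset in one call, starting at A1
wksOut.Range("A1").CopyFromRecordset rst
```

Note that CopyFromRecordset transfers data only; it doesn't bring over field names, so write the header row yourself if you need one.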
I like the article :-)
It's well written and reminded me of a blog on another site which is where I initially learned some of the techniques for writing from memory to a range and also got some clarification relating to potential issues when using Transpose. I've decided to post the links here as it may be useful for someone when they are troubleshooting...
John Walkenbach's initial blog is http://www.dailydoseofexcel.com/archives/2006/12/04/writing-to-a-range-using-vba/ and the comments following the blog include:
- Erik Eckhardt provides a link to the same KB article #177991 as you.
- Patrick O'Beirne goes into a little more detail about a limitation of the array method.
- and Fiaz Idris states that the limitation and Excel's resulting behaviour have changed in Excel 2010.