<

Beginner's Guide to Improving the Speed and reducing the Size of Excel files.

Published on
15,987 Points
4,787 Views
7 Endorsements
Last Modified:
Awarded
Community Pick
Fixing Bloated Excel Files

Often times, excel files grow to a size much larger than is necessary considering the quantity of data stored in them. Sometimes this is due to an excess of pictures, formatting, or drawing objects within the workbook, or hidden sheets that contain a lot of data, but 9 times out of 10, the problem is with dirty cells. By dirty cells, I am referring to cells that once contained data, but have since been cleared. Excel needs them to be cleared in a specific way before it will free up the associated memory.

If your files seem bloated, a very common cause is Excel thinking that some data is present in some place quite distant from the actual end of data. If your scrollbar goes way past your data (lots of scrollbar travel left, even though you are at the bottom of the data), then you need to clear the blank rows and columns beyond the range of your data.

Go to the bottom row of data, then select the entire next row by clicking on the row number. Then press the keys “CTRL” + “Shift” + “Down Arrow” to select all the remaining rows in the worksheet. Right click within your selection, and choose “Delete”. Now when you save the workbook, the scroll bar should instantly expand, showing that there is less scrolling range. You can do the same if your horizontal scroll bar is going too far. Select the first blank column, “CTRL” + “Shift”+ “Right Arrow” will take you to the last column, and then delete the selection. If you can identify several sheets with excess scrolling space such as this, and delete the unused rows and columns, you should find the file size greatly reduced.

Warning: You cannot simply select these cells and hit the "Delete" key on your keyboard. This key actually tells Excel to "ClearContents" of the selected cells. The cells may still remain dirty because of individual formatting settings, especially if the original content was copy-pasted in.


If you are familiar with Macros, you can create a macro that will automatically clear excess space in all the sheets in a workbook. All you have to do is open the VBA project manager (Alt+F11) and create a new module, then paste in the following code:
'Credit: byundt in Q_22889991
Sub ExcelDiet()
    Dim ws As Worksheet
    Dim LastRow As Long, LastCol As Long
    Application.ScreenUpdating = False
    On Error Resume Next
    For Each ws In ActiveWorkbook.Worksheets
        With ws
            LastRow = .Cells.Find(What:="*", After:=.Range("A1"), LookIn:=xlValues, _
                LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
            LastCol = .Cells.Find(What:="*", After:=.Range("A1"), LookIn:=xlValues, _
                LookAt:=xlPart, SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
            Range(.Cells(1, LastCol + 1), .Cells(ws.Rows.Count, ws.Columns.Count)).Delete
            Range(.Cells(LastRow + 1, 1), .Cells(ws.Rows.Count, ws.Columns.Count)).Delete
            LastRow = .UsedRange.Rows.Count 'Force excel to re-evaluate used range
        End With
    Next ws
    Application.ScreenUpdating = True
End Sub

Open in new window

Running this code and then saving the file should instantly shrink an excessively large file.


If your workbook is large because it has many rows of a repeated or iterated formula, you may want to consider conceiving of a way to delete all but a couple rows of the repeated computation when you go to save the workbook, and restore the large number or repeated calculations only when they are needed upon running the workbook.

If your workbook is large because you are storing hundreds of thousands of rows of data (i.e. using Excel as a database), you should consider at some point switching to a more appropriate infrastructure, such as using SQL Server or Microsoft Access to maintain your database.


Making Excel Run Faster

There are many, many reasons why excel might be slow to update. Most times, it is due to having a huge array of complicated formulae running each time an update is made. If you just want to speed up normal operation of your spreadsheet, you should consider the following:

You can disable automatic updating of all calculated values until you have made all your changes. In either the Office Ribbon or in Excel Options, go to the “Formulas” tab and look for the “Calculation” section. Here you can change calculation options from “Automatic” to “Manual”.  Remember to turn it back to Automatic later, because this setting will not reset automatically and applies to all Excel files, not the current one.
Certain types of formulas are processor hogs, sometimes you can find much faster alternate functions, but you should consult an excel expert for this. Using a lot of VLOOKUP, MATCH, and D-Functions are among the worst culprits for slowing down your application. Sometimes there is no other solution, but you should aim to replace them or optimize them. Make sure your lookup ranges aren't bigger than they have to be, and if possible, sort the lookup range by the value you're looking up. Also check for array forumulas that operate on very large ranges. You should also try to avoid using a lot of custom (VBA) functions within the spreadsheet, as these are usually slow to run as well and do not take advantage of parallel computing.
Unload other applications and worksheets from memory while running calculating this file. Because excel files all open in the same instance by default, a slow workbook will perform much better if there aren’t other workbooks open at the same time, taking a share of its resources.
If you are using VBA functions and/or macros it may be possible to write more efficient versions (see below)
If you have physical memory restrictions that don't allow an entire workbook to be held in memory all at once, make sure you at least have free hard disk space for paging. Writing to disk is much slower than working with a workbook in memory, so keep in mind that the a larger workbook will be directly related to the speed at which it operates.


Making Excel Macros Run Faster

You can turn off screen updating with Application.ScreenUpdating = False. This speeds up code by orders of magnitude by removing the need for excel to update the user interface with each new action that is automatically performed. This also prevents the unsightly flickering that recorded macros suffer when run. Make sure to re-enable screen updating at the end of the code.
You should disable automatic calculation for the duration of the code, and only force computation on the sheets that require it when your code needs excel to produce a result. Often times, you can disable calculation for the entire macro and just re-enable it again at the end. To do this, use Application.Calculation = xlCalculationManual. Restore it by setting to xlCalculationAutomatic. You can force calculation when it’s needed using Application.Calculate for the whole workbook, or Sheet1.Calculate if you only need to recalculate a certain sheet.
Use VBA functions instead of Excel.WorksheetFunctions. There are almost always VBA equivalents to basic excel worksheet functions and you should use them in code, as they are usually faster.
Do not use “Select” or “Selection” in your code. Always refer to objects directly. Instead of using “Range(“A1”).Select” followed by “Selection.Value = 12”, use “Range(“A1”).Value = 12”. The former is what code looks like when it’s first recorded, but it should always be adjusted. This not only improves the speed, but the reliability and readability of the macro.
This has all been general advice that applies to pretty much every Macro in excel, but there are many more specific tips available if you take the time to research them. You should search online for ways to speed up specific actions you are trying to perform, such as fast string concatenation, more efficient log-normal distributions, or fast data importing from external sources.


If, in spite of your efforts, your file is still suspiciously large or your code is running too slowly to be practical - Hey! That's what experts-exchange is for. Post a question and I'm sure we have tons of suggestions for further improvement.

--
Alain Bryden

References:
Byundt on Shrinking Excel files
ShaneDevenshire on speeding up Excel calculation

Sample VBA speed-up articles:
Optimize string handling in VB6
Microsoft Q72622: Optimizing Worksheets for Fastest Calculation
MSDN: Improving Performance in Excel 2007
Using Regular Expressions in Visual Basic for Applications
Fast Data Push to Excel
How to Split a String with Multiple Delimiters in VBA
A VBA Progress Bar for Excel
7
Comment
Author:alainbryden
  • 4
  • 3
  • 2
  • +1
10 Comments
 

Administrative Comment

by:Rory Archibald
Couple of additional things:

1. Any article about improving Excel's calculation times really ought to have a link to Charles Williams' site: www.decisionmodels.com !
2. I largely disagree with this comment:
"Using a lot of VLOOKUP, MATCH, and D-Functions are among the worst"

If you use lookups (of any sort) with sorted lists, they are in fact very fast. VLOOKUP is generally inefficient but that's because it usually involves references to far more columns of data than are relevant, and quite often involves performing the same lookup repeatedly to return data from different columns, in which case using INDEX and MATCH is much, much faster. MATCH itself is not a particularly slow function, and the D-functions are actually very fast at what they do - faster than any alternative, but usually difficult to use since you need an actual criteria range.

3. VBA functions are not *always* faster than Worksheetfunction (especially if you are actually using ranges), though they generally are.

4. Similar to 3, you cannot *always* avoid selecting things, but it is a very good rule of thumb.

3 and 4 are a bit nitpicky, to be fair.

FWIW,
Rory
0
 

Administrative Comment

by:Rory Archibald
@aikimark,

Given the context of this article, I'm not sure Dictionaries (or indeed VBA) are a practical option for the sort of person it's aimed at. I'd also be very surprised if using VBA in any form were faster than well setup native functions! :)

Rory
0
 

Administrative Comment

by:Rory Archibald
@aikimark,
We seem to be going a little off-piste here, but what the hell...
I am not talking about the Application.Worksheetfunction object - I am talking about worksheet functions. (In either case, Transpose is limited, but that's by the by.)
Also, this statement:
LastRow = .UsedRange.Rows.Count

is definitely *not* superfluous. It is a long-established trick to force Excel to re-evaluate the actual usedrange once you have finished your clean-up. Brad does not generally include pointless lines in his code! ;)

@Alain,
Since you do state that using Appliction.Worksheetfunction is slower than VBA, it is probably worth mentioning that using the straight Application equivalent (e.g. Application.Transpose vs Application.Worksheetfunction.Transpose) is generally about 20% slower, although you do get the benefit of it returning an error value rather than a RTE.

Regards,
Rory

0
Cloud Class® Course: Microsoft Exchange Server

The MCTS: Microsoft Exchange Server 2010 certification validates your skills in supporting the maintenance and administration of the Exchange servers in an enterprise environment. Learn everything you need to know with this course.

LVL 42

Expert Comment

by:dlmille
@alainbryden  - I was excited to read this article and the ExcelDiet() routine - so much so, I created an add-in to do the existing workbook or all workbooks in a folder.  To my dismay my very large models (150MB shrunk to 30-40 MB) did not function correctly afterward.  I'm digging into exactly why, now, but some of my formulas end up with !REF# as a result, so something more than needs to be deleted is being deleted.

Dave
0
LVL 42

Expert Comment

by:dlmille
I've figured it out.  If you have a formula in a cell, like SUMPRODUCT where the user has included extra columns in two different sheets - supporting potential future data to be added - then if the other sheet has columns with data beyond that range, its possible the sheets could get out of sync.  So, use of ExcelDiet should not be done blindly (unfortunately).

Great article, and I voted YES above.

Dave
0
LVL 21

Author Comment

by:alainbryden
That's a really good point. If you have formulas that refer to a really long range on another sheet for future use (A1:A99999), and the dirty cells on the target sheet are deleted to cut down the footprint of the workbook, that formula will be cut down by the number of cells deleted. I guess the workaround for this would be to:
1. Freeze all formulas by first converting them into values (programatically insert a ' at the front of every cell before the operation, and then remove it).
OR
2. Instead use full column references (A:A) - for which most formulas optimally use only non-blank cells in the target range for calculations - and these references won't get cut down by deleting excess rows.
0
LVL 42

Expert Comment

by:dlmille
Hey - I'll give that a shot!
0
LVL 47

Expert Comment

by:aikimark
I think that formula conversion might be a nice function as a stand-alone add-in, converting references to column references.  It would be nice to allow it to tolerate starting rows other than 1, since list/table data starts after the header row.
0
LVL 42

Expert Comment

by:dlmille
@aimimark - would list/table data headers be formula based?  If one converted those formulas then converted back, what would be the unintended consequence?

@alainbryden - there are a couple other unintended consequences of the code which must be handled, as well:  If there exists outlining, or active data filters that created hidden rows/columns that would have otherwise created the maximum LastRow or Last Column.

As a result, I've modified that which handles all of my "current" issues with the code, below (note:  I use x= as opposed to '= to avoid resurrecting any formulas that might be commented out for some reason:
 
Sub ExcelDiet(Optional control As IRibbonControl)
    Dim ws As Worksheet
    Dim LastRow As Long, LastCol As Long

    Application.ScreenUpdating = False
    On Error Resume Next
    'first transform all formulas so they don't get invalidated
    
    For Each ws In ActiveWorkbook.Worksheets
        ws.Cells.Replace What:="=", Replacement:="x=", LookAt:=xlPart, SearchOrder _
            :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False
    Next ws
    
    For Each ws In ActiveWorkbook.Worksheets
    
        'clear any autofilters, but leave them in place
        'ws.AutoFilterMode = False
        ws.ShowAllData
        
        'unhide any rows/columns - this handles outlining/grouping as well
        ws.Cells.EntireRow.Hidden = False
        
        Application.StatusBar = "Processing File: " & ActiveWorkbook.Name & "-> Tab: " & ws.Name
        With ws
            LastRow = .Cells.Find(What:="*", after:=.Range("A1"), LookIn:=xlValues, _
                LookAt:=xlPart, SearchOrder:=xlByRows, searchdirection:=xlPrevious).Row
            LastCol = .Cells.Find(What:="*", after:=.Range("A1"), LookIn:=xlValues, _
                LookAt:=xlPart, SearchOrder:=xlByColumns, searchdirection:=xlPrevious).Column
            Range(.Cells(1, LastCol + 1), .Cells(ws.Rows.Count, ws.Columns.Count)).Delete
            Range(.Cells(LastRow + 1, 1), .Cells(ws.Rows.Count, ws.Columns.Count)).Delete
            LastRow = .UsedRange.Rows.Count 'Force excel to re-evaluate used range
        End With

    Next ws

    For Each ws In ActiveWorkbook.Worksheets
    'now, revert formulas back
        ws.Cells.Replace What:="x=", Replacement:="=", LookAt:=xlPart, SearchOrder _
            :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False
            
        'and, if there was outline/grouping, reset that to 1
        ws.Outline.ShowLevels rowlevels:=1
        ws.Outline.ShowLevels columnlevels:=1
        
    Next ws

    Application.ScreenUpdating = True
End Sub

Open in new window


In my add-in, I have the option to do one sheet, the active workbook, or all files in a given folder.  On the finally successful (with the mods, above), I was able to convert 1.4GB of models to .7 GB - a reduction of about half which is great, considering some of the models were bloated to about 150 MB!

Cheers,

Dave
0
LVL 47

Expert Comment

by:aikimark
@Dave

>>what would be the unintended consequence?

I don't know.  If you have a list reference, it acts like a named range and will grow dynamically as the list grows.  Since we are looking at some replacement for use in any general formula, I don't know what would work.  Perhaps, if you had a function that returns a range, you could implement something like we do with VBA (dynamic) range calculations.
0

Featured Post

Cloud Class® Course: Certified Penetration Testing

This CPTE Certified Penetration Testing Engineer course covers everything you need to know about becoming a Certified Penetration Testing Engineer. Career Path: Professional roles include Ethical Hackers, Security Consultants, System Administrators, and Chief Security Officers.

Join & Write a Comment

Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial will demonstrate on a Mac how to change the sort order for chart legend values and decrpyt the intimidating chart menu.

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month