The 'OpenText' Command
This article describes how to use VBA to load a text file into Excel. You can reproduce much of the code for this task by recording a macro while you open a text file through Excel's Open dialogue and step through the import wizard. The default behaviour isn't necessarily desirable, though, and here I explain how to improve on it.
Pros and Cons of this Function
This function takes care of nearly all your importing needs. It lets the user specify any of the standard delimiters, or a custom one, and it is full of special considerations such as trailing minus signs, starting on a row other than 1 (to skip header information), and even setting the platform origin of the file so that encoding and line breaks are properly interpreted.
The major failing of this function, for me, has always been the way it guesses at the formatting of your columns. The default is for every column to be formatted as 'General', which often loses information. For instance, if a column contains codes that resemble dates, Excel converts them to actual dates: the underlying values become Longs that merely display as dates when formatted, and the original text is irreversibly lost. If a column contains alphanumeric codes but some entries happen to contain no letters, those entries are automatically treated as numbers. And if a column holds numbers with many digits after the decimal point, they may be converted such that that precision is lost.
For Example:
Type: "12:50" into Excel. (commonly used to denote ratios or mesh codes)
Result: 0.5347222
Why? Excel interprets it as a time, and changes the value to 770/1440 [ (12*60 + 50) / (24*60), i.e. the minutes elapsed divided by the number of minutes in a day ]
If you import data that looks like this, the same thing will happen.
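You can check the arithmetic yourself in the VBA Immediate window (Ctrl+G in the VBA editor); both expressions evaluate to the same value, approximately 0.5347222:

```vba
? (12 * 60 + 50) / (24 * 60)    'the fraction 770/1440 worked out by hand
? CDbl(TimeValue("12:50"))      'the serial number Excel actually stores for 12:50
```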
Now imagine you're a drumming team and you're importing a list of moves for each of your routines (marches)
Type: "March 1" into Excel.
Result: "1-Mar"
Why? Excel thought you meant the date March 1st. If you try to format the cell back as Text or General, the value is now 39873 (the integer representing March 1st, 2009). Excel has actually changed the underlying value, and if this happened on import, you can't get it back.
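As a sanity check, the Immediate window confirms the 39873 serial number (VBA and Excel agree on date serials for any date after February 1900):

```vba
? CLng(DateSerial(2009, 3, 1))    'prints 39873, the serial for March 1st, 2009
```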
Now imagine a great big sheet of data, from a database or something, and you don't know what the data in it looks like. All sorts of examples like this might crop up.
Convinced?
The only way to ensure, in a generic way, that no information is lost is to format every column as Text on import. That way every cell contains the text exactly as it appears in the file. If your intended use is calculation, any formulas referencing the cells will simply convert the text to a Double before computing anyway.
Convert all columns to text
If you've recorded a macro while importing a file, you will see a potentially large set of values after FieldInfo. This is an array of arrays Excel uses to decide what the format of each column should be. Generally, (x, 1) means format column x as General (the default), (x, 2) means format column x as Text (the desired format if you don't want to lose any information on import), and (x, 3) means convert it as a Date (risky business). Other numbers correspond to more specific date formats.
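For a file where the column count is known and small, the recorded-macro style of FieldInfo looks something like this. This is a minimal sketch for a three-column, comma-delimited file; sFileName is a hypothetical variable holding the path to your text file:

```vba
'Minimal sketch: import a three-column file, forcing every column to Text (type 2)
Workbooks.OpenText Filename:=sFileName, DataType:=xlDelimited, _
    TextQualifier:=xlDoubleQuote, Comma:=True, _
    FieldInfo:=Array(Array(1, 2), Array(2, 2), Array(3, 2))
```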
The problem is that when you are importing a file, sometimes you don't know exactly how many columns it will have, or there are so many that writing out a huge literal array of these values, Array(Array(1,2), Array(2,2), Array(3,2), ...), would be ridiculous. Looking for a way around this, I discovered that Excel doesn't seem to care if you specify more column formats than there are columns. So there's a very simple solution that takes up very little code space and works on variable numbers of columns: dynamically build a very large array specifying all columns as Text, and pass it to the OpenText method:
Dim varArray(1 To 16384) As Variant
Dim i As Long

For i = 1 To 16384    'Max number of columns as of Excel 2007
    varArray(i) = Array(i, 2)
Next i

Workbooks.OpenText Filename:=sFileName, _
    DataType:=xlDelimited, TextQualifier:=xlDoubleQuote, Comma:=True, _
    FieldInfo:=varArray
The array size of 16384 can easily be whatever other number you feel is reasonable; the time to create this array is negligible compared to how long Excel takes to open the file. Excel won't complain if the array is bigger than the number of columns in the text file. As long as the size of the array eclipses the number of columns, they will all be formatted as Text, and you no longer have to worry about your data being butchered on import.
CSV Gotcha! This works except with CSV files. A .csv file and a .txt file might be identical except for the extension, but when Excel's import function is used on a file ending in '.csv', it ignores all the OpenText parameters and simply converts the data however it wants (because apparently it knows what you want better than you do). To get around this, you can rename the file from .csv to .txt, either manually or programmatically, for the import operation, and then rename it back if you choose. The VBA statement to rename a file is <Name filename As newFilename>. You can detect whether the extension is csv using <LCase(Right$(filename, 4)) = ".csv">
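Putting those pieces together, a sketch of the rename workaround might look like this. sFileName is assumed to hold the full path to the file; the OpenText call itself is elided:

```vba
Dim sTxtName As String
If LCase$(Right$(sFileName, 4)) = ".csv" Then
    sTxtName = Left$(sFileName, Len(sFileName) - 4) & ".txt"
    Name sFileName As sTxtName    'rename so OpenText's parameters are honoured
    '... open sTxtName with Workbooks.OpenText here ...
    Name sTxtName As sFileName    'optionally rename back (after closing the imported workbook)
End If
```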
Sample Implementation
Here's a small workbook containing a function I use all the time. It contains a button for loading a file and a label where the loaded file's name appears once the file is loaded. When the user clicks 'Load file', it imports the file and copies its contents into a designated sheet in the same workbook. (A variation of this could create a new sheet for each file.) This is handy when you are building tools in Excel that process external files, without requiring the user to open the file themselves and copy/paste the contents into a designated sheet.
You can hook the 'Load' function up to any button; the same code is used, but you can pass parameters specifying the destination sheet for the loaded file's contents. You can also add your own parameters to customize the load function, change the GetOpenFilename parameters to include your own file formats, or perform special operations on files after you load them. For instance, validate the file, ensuring that certain columns exist. There's much room for customization.
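As one example of such a post-load check, here is a hypothetical sketch that verifies required column headers exist after import. The sheet name "Data" and the header names are assumptions for illustration, not part of the sample workbook:

```vba
Dim vRequired As Variant, vHdr As Variant
Dim rngFound As Range

vRequired = Array("ID", "Name", "Amount")    'hypothetical required headers
For Each vHdr In vRequired
    'look for each header in row 1 of the destination sheet
    Set rngFound = Worksheets("Data").Rows(1).Find(What:=vHdr, LookAt:=xlWhole)
    If rngFound Is Nothing Then
        MsgBox "Missing required column: " & vHdr, vbExclamation
        Exit For
    End If
Next vHdr
```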
Example-Auto-Import-Files.zip
Here's to getting around more of Excel's undesirable constructs!
Sub OpenCSVAllText()
    Dim FilesToOpen As Variant
    Dim FileCounter As Long
    Dim wb As Workbook
    Dim FilePath As String
    Dim NumColumns As Long
    Dim ColumnInfo() As Variant
    Dim ColumnCounter As Long
    Dim fso As Object
    Dim FilePathTxt As String

    FilesToOpen = Application.GetOpenFilename("CSV files (*.csv), *.csv", , "Select files to open", , True)
    If Not IsArray(FilesToOpen) Then
        MsgBox "No files selected, aborting", vbCritical, "No soup for you!"
        Exit Sub
    End If

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    Set fso = CreateObject("Scripting.FileSystemObject")

    For FileCounter = LBound(FilesToOpen) To UBound(FilesToOpen)
        FilePath = FilesToOpen(FileCounter)

        'Open the file once just to count the columns in its first row
        Set wb = Workbooks.Open(FilePath)
        NumColumns = Cells(1, Columns.Count).End(xlToLeft).Column
        wb.Close False

        'Build a FieldInfo array marking every column as Text (type 2)
        ReDim ColumnInfo(0 To NumColumns - 1) As Variant
        For ColumnCounter = 1 To NumColumns
            ColumnInfo(ColumnCounter - 1) = Array(ColumnCounter, 2)
        Next

        'Rename .csv to .txt so OpenText's parameters are honoured
        FilePathTxt = Left(FilePath, Len(FilePath) - 3) & "txt"
        If Dir(FilePathTxt) <> "" Then Kill FilePathTxt
        fso.MoveFile FilePath, FilePathTxt

        Workbooks.OpenText Filename:=FilePathTxt, Origin:=xlWindows, StartRow:=1, DataType:=xlDelimited, _
            TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, Semicolon:=False, _
            Comma:=True, Space:=False, Other:=False, FieldInfo:=ColumnInfo, TrailingMinusNumbers:=True

        'Save back under the original .csv name and remove the temporary .txt
        ActiveWorkbook.SaveAs FilePath, xlCSV
        Kill FilePathTxt
    Next

    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With

    Set fso = Nothing
    MsgBox "Done"
End Sub
Comments (2)
Commented:
Hats off to you!! It saved not just my day but my week!!
Best regards
Commented:
Nice article :) BTW, you might be interested in a current question about getting Excel to open CSV files without changing the contents.
The ugly but serviceable code I came up with is the OpenCSVAllText sub shown above. Basically, I open each file once just to count the columns in its first row; that first step, to get the column count, allows me to define how big the array with the column definitions needs to be. I then rename the file to .txt, re-open it with OpenText (forcing every column to Text), save it back as .csv, and clean up.
Cheers, and looking forward to your next article,
Patrick