Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

Get Excel OpenText to Import Exact File Contents

Published:
The 'OpenText' Command
This article describes the process of using VBA to load a file into Excel. By recording a macro, much of the code associated with this task can be reproduced by using Excel's open dialogue, opening a text file, and using the import wizard. The default behaviour isn't necessarily desirable though, and here I explain how to improve on it.

The Worbook.OpenText function is used to programatically import a fixed width or delimited file into excel. The 2003 and 2007 msdn reference articles on this function are here:
http://msdn.microsoft.com/en-us/library/aa195814%28office.11%29.aspx
http://msdn.microsoft.com/en-us/library/bb223513.aspx


Pro's and Cons of this Function
This function pretty much takes care of all your importing needs, allowing the user to specify any of the standard delimiters, or a custom one, and is full of special considerations such as trailing minuses, starting on a row other than 1 (to skip header information), and even altering the platform origin of the file to make sure encoding and line breaks are properly interpreted.

The major failing of this function for me has always been the way it guesses at the formatting of your columns. The default is for all columns to be formatted as 'General'. This often results in lost information. For instance, if you have a code that resembles a date, and it gets formatted as a date, all the numbers in that column will get irreversibly transformed to a Long which appear to be the date when formatted. If a column contains alphanumeric codes, but some contain no letters, they will be treated as numbers automatically. If some numbers are in the column with a lot of precision after the decimal, they may be converted such that that precision is lost.

For Example:
Type: "12:50" into excel. (used commonly used to denote ratios or mesh codes)
Result: 0.5347222
Why? Excel interprets it as a time, and changes the value to 770/1440 [ (12*60+50)/(24*60) - the number of minutes in a day)
If you import data that looks like this, the same thing will happen.

Now imagine you're a drumming team and you're importing a list of moves for each of your routines (marches)
Type: "March 1" into excel.
Result: "1-Mar"
Why? Excel thought you meant the date March 1st. If you try to format the cell back as text, or general, the value is now 39873 (the integer representing March 1st, 2009) Excel has actually changed the underlying value and you can't get it back if this was done on import.

Now imagine a great big sheet of data, from a database or something, and you don't know what the data in it looks like. All sorts of examples like this might crop up.

Convinced?

The only way to ensure, in a generic way, that no information is lost, is to format every column as Text on import. This way, every cell contains the text, as is, from the file. Any calculations done with the cells will simply convert the text to a double before making the computation anyways, if this is your intended use.


Convert all columns to text
If you've recorded a macro while importing a file, you will see a potentially large set of values after FieldInfo. This is an array of arrays excel uses to identify what the format of each column should be. Generally (x,1) means format column x as General (the default), (x, 2) means format column x as Text (the desired format if you don't want to lose any information on import), and (x,3) means convert it as a Date (risky business). There are other numbers corresponding to more complex date formats.

The problem is, when you are importing a file, sometimes you don't know exactly how many columns it will have, or you know there are so many that it would be ridiculous writing out a huge array of these values Array(Array(1,2), Array(2,2), Array(3,2), .......). Looking for a way to get around this, I discovered that excel doesn't seem to care if you specify more column formats than there are columns, and so there's a very simple solution that takes up very little code space and works on variable numbers of columns - dynamically build a very large array specifying all columns as text, and pass them to the OpenText method:
Dim varArray(1 To 16384) As Variant 
                      For i = 1 To 16384 ''Max number of columns as of excel 2007
                         varArray(i) = Array(i, 2)
                      Next i 
                      Workbooks.OpenText Filename:=sFileName, _
                         DataType:=xlDelimited, TextQualifier:=xlDoubleQuote, Comma:=True, _
                         FieldInfo:=varArray

Open in new window

The Array size of 16384 can easily be whatever other number you feel is reasonable, the time to create this array is negligible compared to how long it takes excel to open the file. Excel won't complain if the array is bigger than the number of columns in the text file. As long as the size of the array eclipses the number of columns, they will all be formatted as text, and you no longer have to worry about your data being butchered on import.


CVS Gotcha!
This works except with CSV files. A .csv file and .txt file might be identical, except for the extension, but when Excel's import function is used on a file ending in '.csv', it will ignore all the OpenText parameters and simply convert it as it wants (because apparently it knows what you want better than you do.) To get around this, you can just rename the file from .csv to .txt, either manually or programatically, for the import operation, and then name it back if you choose. The code to do so programatically in VBA <Name filename as newFilename>. You can detect whether the extension is csv using <LCase(Right$(filename, 4)) = ".csv">


Sample Implementation
Interface to Load Files DynamicallyHere's a small workbook containing a function I use all the time. It contains buttons for loading a file, and a label where the loaded file name appears once the file is loaded. When the user clicks 'load file' it will import the file and copy the file contents into a designated sheet in the same workbook. (A variation of this could create a new sheet for the file.) This is handy when you are building tools in excel that process external files without requiring the user to open the file themselves and copy/paste the contents into a designated sheet.

You can hook the 'Load' function up to any button, the same code is used, but you can pass parameters specifying the destination sheet of the loaded file contents. You can also make up your own parameters for customizing the load function. You can also change GetOpenFilename parameters to include your own specified file formats, perform special operations on your files after you load it. For instance, validate the file, ensuring that certain columns exist. There's much room for customization. Example-Auto-Import-Files.zip

Here's to getting around more of Excel's undesirable constructs!

--
Alain
5
10,892 Views

Comments (2)

Richie_SimonettiIT Operations
CERTIFIED EXPERT

Commented:
I must say:
hats off for you!!  it saved not my day but my week!!
Best regards
CERTIFIED EXPERT
Top Expert 2010

Commented:
Alain,

Nice article :)  BTW, you might be interested in a current question about getting Excel to open CSV files without changing the contents.

The ugly but serviceable code I came up with:

Sub OpenCSVAllText()
    
    Dim FilesToOpen As Variant
    Dim FileCounter As Long
    Dim wb As Workbook
    Dim FilePath As String
    Dim NumColumns As Long
    Dim ColumnInfo() As Variant
    Dim ColumnCounter As Long
    Dim fso As Object
    Dim FilePathTxt As String
    
    FilesToOpen = Application.GetOpenFilename("CSV files (*.csv), *.csv", , "Select files to open", , True)
    If Not IsArray(FilesToOpen) Then
        MsgBox "No files selected, aborting", vbCritical, "No soup for you!"
        Exit Sub
    End If
    
    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With
    
    Set fso = CreateObject("Scripting.FileSystemObject")
    
    For FileCounter = LBound(FilesToOpen) To UBound(FilesToOpen)
        FilePath = FilesToOpen(FileCounter)
        Set wb = Workbooks.Open(FilePath)
        NumColumns = Cells(1, Columns.Count).End(xlToLeft).Column
        wb.Close False
        ReDim ColumnInfo(0 To NumColumns - 1) As Variant
        For ColumnCounter = 1 To NumColumns
            ColumnInfo(ColumnCounter - 1) = Array(ColumnCounter, 2)
        Next
        FilePathTxt = Left(FilePath, Len(FilePath) - 3) & "txt"
        If Dir(FilePathTxt) <> "" Then Kill FilePathTxt
        fso.MoveFile FilePath, FilePathTxt
        Workbooks.OpenText Filename:=FilePathTxt, Origin:=xlWindows, StartRow:=1, DataType:=xlDelimited, _
            TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, Semicolon:=False, _
            Comma:=True, Space:=False, Other:=False, FieldInfo:=ColumnInfo, TrailingMinusNumbers:=True
        ActiveWorkbook.SaveAs FilePath, xlCSV
        Kill FilePathTxt
    Next
    
    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
    
    Set fso = Nothing
    
    MsgBox "Done"
    
End Sub

Open in new window


Basically, I:
Open the CSV file as is just to get a column count
Create a new file with a TXT extension
Open the text file
Save the TXT file as a CSV, over the original file

That first step, to get the column count, allows me to define how big the array with the column definitions needs to be:

        NumColumns = Cells(1, Columns.Count).End(xlToLeft).Column
        wb.Close False
        ReDim ColumnInfo(0 To NumColumns - 1) As Variant
        For ColumnCounter = 1 To NumColumns
            ColumnInfo(ColumnCounter - 1) = Array(ColumnCounter, 2)
        Next

Open in new window


Cheers, and looking forward to your next article,

Patrick

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.