Link to home
Start Free TrialLog in
Avatar of Bryce Bassett
Bryce BassettFlag for United States of America

asked on

Using VBA to remove BOM from UTF-8 CSV file, so special characters can be read into Word using ADODB

I've written a pretty complex Word global template (.dotm) using VBA.  One function auto-creates a Word document based on a CSV file output from another system.  The developers of that system are saving the CSV using "UTF-8 with BOM" encoding. I use this  ADODB method to open the CSV so I can import it into an array.  

Set objConnection = New ADODB.Connection

objConnection.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & CSVfolder & ";Extended Properties='text;HDR=NO;FMT=Delimited'"

objConnection.Open

Set objRecSet = New ADODB.Recordset

objRecSet.ActiveConnection = objConnection

objRecSet.source = "SELECT * FROM [" & CSVfile & "];" 

objRecSet.Open , , adOpenStatic

Open in new window

This works great.  However, the presence of the Byte Order Marker at the beginning of the file is causing international special characters to be read in/displayed incorrectly when used in string variables.  If I first use Excel to open and then resave the CSV file as a regular comma delimited CSV file, not UTF-8, then the special characters import and display correctly. But I'm trying to avoid that step.

Is there something I can add to my connection string, or a VBA method I can use to remove the BOM from the file before I import it?
Avatar of aikimark
aikimark
Flag of United States of America image

Your VBA code can read and write files, using the scripting.filesystemobject.  No need to involve Excel I/O for this.
Avatar of Bryce Bassett

ASKER

Thanks. Can you please point me to some examples of how I would read from this file using fso instead of adodb?
You would still use ADODB to read the file as as CSV.

I'm talking about using FSO to preprocess the file to remove the BOM bytes.

Have you never used FSO to read or write a file?
Thanks for clarifying.  I've only used FSO to list, copy and move files and folder, never to read or write a file.

I'm guessing I would use the FSO.OpenTextFile method, but I'm not clear on how to strip out the BOM and re-write the file without it.  Can you suggest a web page, or provide a VBA code snippet to accomplish this?

Thanks.
ASKER CERTIFIED SOLUTION
Avatar of aikimark
aikimark
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks.  I appreciate your help!
aikimark,

The code you suggested works well for stripping out the BOM.  However, special characters are still coming in incorrectly. So I guess the BOM was not the problem.  

There appear to be some control characters before each special character.  I can work around the problem by launching a hidden instance of Excel, opening the file, Saving As xlcsv format to a temporary location, then using the temp file for my import.  But is there a cleaner solution?

Thanks.
What other characters?

Please post a representative sample file.
Here's an example of what's in a test file. (shown in Hex in Notepad++)

The name should be M. Andrée, but the é is being read as two characters

User generated image
The UTF-8 encoding of LATIN SMALL LETTER E WITH ACUTE is &hc3a9

https://www.fileformat.info/info/unicode/char/00e9/index.htm

This is what you said you needed.